Search — ResearchTracker

The rapid advancement of large language models (LLMs) has enabled the generation of coherent essays, making AI-assisted writing increasingly common in educational and professional settings. Using large-scale empirical data, we examine and benchmark the characteristics and quality of essays generated by popular LLMs and discuss their implications for two key components of writing assessments: automated scoring and academic integrity. Our findings highlight limitations in existing automated scoring systems, such as e-rater, when applied to essays generated or heavily influenced by AI, and identify areas for improvement, including developing new features to capture deeper thinking and recalibrating feature weights. Despite growing concerns that the increasing variety of LLMs may undermine the feasibility of detecting AI-generated essays, our results show that detectors trained on essays generated from one model can often identify texts from others with high accuracy, suggesting that effective detection could remain manageable in practice.

Detecting AI-Generated Essays in Writing Assessment: Responsible Use and Generalizability Across LLMs

arXiv, 2026-03-02. Author: Jiangang Hao

Writing is a foundational literacy skill that underpins effective communication, fosters critical thinking, facilitates learning across disciplines, and enables individuals to organize and articulate complex ideas. Consequently, writing assessment plays a vital role in evaluating language proficiency, communicative effectiveness, and analytical reasoning. The rapid advancement of large language models (LLMs) has made it increasingly easy to generate coherent, high-quality essays, raising significant concerns about the authenticity of student-submitted work. This chapter first provides an overview of the current landscape of detectors for AI-generated and AI-assisted essays, along with guidelines for their responsible use. It then presents empirical analyses to evaluate how well detectors trained on essays from one LLM generalize to identifying essays produced by other LLMs, based on essays generated in response to public GRE writing prompts. These findings provide guidance for developing and retraining detectors for practical applications.

Search results: Essays in biochemistry

AI-generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity

Detecting AI-Generated Essays in Writing Assessment: Responsible Use and Generalizability Across LLMs

Can AI grade your essays? A comparative analysis of large language models and teacher ratings in multidimensional essay scoring

Can Large Language Models Differentiate Harmful from Argumentative Essays? Steps Toward Ethical Essay Scoring

Towards Prompt Generalization: Grammar-aware Cross-Prompt Automated Essay Scoring

Can generative AI figure out figurative language? The influence of idioms on essay scoring by ChatGPT, Gemini, and Deepseek

Calibrating Generative AI to Produce Realistic Essays for Data Augmentation

Enhancing Essay Cohesion Assessment: A Novel Item Response Theory Approach

Improve LLM-based Automatic Essay Scoring with Linguistic Features

Essays on Responsible and Sustainable Finance

ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models

Specialists or Generalists? Multi-Agent and Single-Agent LLMs for Essay Grading

Rationale Behind Essay Scores: Enhancing S-LLM's Multi-Trait Essay Scoring with Rationale Generated by LLMs

LLMs Do Not Grade Essays Like Humans

Graded Relevance Scoring of Written Essays with Dense Retrieval

Essays on Eclipses, Transits and Occultations as Teaching Tools in the Introductory Astronomy College Course

A School Student Essay Corpus for Analyzing Interactions of Argumentative Structure and Quality

Use of Interactive Simulations in Fundamentals of Biochemistry, a LibreText Online Educational Resource, to Promote Understanding of Dynamic Reactions

Automated essay scoring in Arabic: a dataset and analysis of a BERT-based system

Activations as Features: Probing LLMs for Generalizable Essay Scoring Representations