Search - ResearchTracker

Writing is a foundational literacy skill that underpins effective communication, fosters critical thinking, facilitates learning across disciplines, and enables individuals to organize and articulate complex ideas. Consequently, writing assessment plays a vital role in evaluating language proficiency, communicative effectiveness, and analytical reasoning. The rapid advancement of large language models (LLMs) has made it increasingly easy to generate coherent, high-quality essays, raising significant concerns about the authenticity of student-submitted work. This chapter first provides an overview of the current landscape of detectors for AI-generated and AI-assisted essays, along with guidelines for their responsible use. It then presents empirical analyses to evaluate how well detectors trained on essays from one LLM generalize to identifying essays produced by other LLMs, based on essays generated in response to public GRE writing prompts. These findings provide guidance for developing and retraining detectors for practical applications.

Can Large Language Models Differentiate Harmful from Argumentative Essays? Steps Toward Ethical Essay Scoring

arXiv, 2026-01-09. Authors: Hongjin Kim, Jeonghyun Kang, Harksoo Kim

This study addresses critical gaps in Automated Essay Scoring (AES) systems and Large Language Models (LLMs) with regard to their ability to effectively identify and score harmful essays. Despite advancements in AES technology, current models often overlook ethically and morally problematic elements within essays, erroneously assigning high scores to essays that may propagate harmful opinions. In this study, we introduce the Harmful Essay Detection (HED) benchmark, which includes essays integrating sensitive topics such as racism and gender bias, to test the efficacy of various LLMs in recognizing and scoring harmful content. Our findings reveal that: (1) LLMs require further enhancement to accurately distinguish between harmful and argumentative essays, and (2) both current AES models and LLMs fail to consider the ethical dimensions of content during scoring. The study underscores the need for developing more robust AES systems that are sensitive to the ethical implications of the content they are scoring.

Search results: Essays

Detecting AI-Generated Essays in Writing Assessment: Responsible Use and Generalizability Across LLMs

Can Large Language Models Differentiate Harmful from Argumentative Essays? Steps Toward Ethical Essay Scoring

Calibrating Generative AI to Produce Realistic Essays for Data Augmentation

LLMs Do Not Grade Essays Like Humans

Poor Alignment and Steerability of Large Language Models: Evidence from College Admission Essays

AI-generated Essays: Characteristics and Implications on Automated Scoring and Academic Integrity

ArguGPT: evaluating, understanding and identifying argumentative essays generated by GPT models

Graded Relevance Scoring of Written Essays with Dense Retrieval

Exploration of Summarization by Generative Language Models for Automated Scoring of Long Essays

Evaluating AI and Human Authorship Quality in Academic Writing through Physics Essays

AI, write an essay for me: A large-scale comparison of human-written versus ChatGPT-generated essays

Essays on Eclipses, Transits and Occultations as Teaching Tools in the Introductory Astronomy College Course

Can AI grade your essays? A comparative analysis of large language models and teacher ratings in multidimensional essay scoring

Essays on Responsible and Sustainable Finance

Using Active Learning Methods to Strategically Select Essays for Automated Scoring

Investigating Stylistic Profiles for the Task of Empathy Classification in Medical Narrative Essays

Can generative AI figure out figurative language? The influence of idioms on essay scoring by ChatGPT, Gemini, and Deepseek

Specialists or Generalists? Multi-Agent and Single-Agent LLMs for Essay Grading

eRevise+RF: A Writing Evaluation System for Assessing Student Essay Revisions and Providing Formative Feedback

Enhancing Essay Cohesion Assessment: A Novel Item Response Theory Approach