Search results: cheating

20 results found

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

arXiv · 2026-06-05 · Authors: Thanawat Lodkaew, Johannes Ackermann, Soichiro Nishimori

A growing failure mode in agent evaluation and training is that models can achieve high evaluation scores by exploiting shortcuts instead of solving the intended task, producing deceptive performance. This makes evaluation scores unreliable as measures of true task-solving ability. We propose CapCode, a framework for constructing coding datasets with randomized tests whose best achievable non-cheating performance is deliberately capped below one. This capped-performance design gives evaluation scores a clearer interpretation: scores substantially above the cap are implausible and therefore provide evidence of cheating. To prevent cheating, we propose CapReward, a reward design based on the CapCode principle that discourages optimization beyond the cap. Experiments across multiple datasets show that CapCode detects cheating while preserving the performance ranking of models, and that CapReward reduces cheating behavior, yielding models that better follow the intended task specification.
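The abstract's detection idea (scores substantially above the cap are implausible) can be made quantitative with a standard one-sided binomial test. The sketch below is only an illustration, not CapCode's or CapReward's actual procedure. It assumes tasks are independent and that a non-cheating model passes any single task with probability at most `cap`. The function names and the linear-penalty form of `capped_reward` are hypothetical.

```python
import math


def binomial_tail(k: int, n: int, p: float) -> float:
    """Exact P[X >= k] for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def cheating_p_value(passed: int, total: int, cap: float) -> float:
    """Probability that a non-cheating model passes at least `passed` of `total` tasks.

    Assumption (not from the paper): tasks are independent and an honest model
    passes each one with probability <= cap. The upper tail grows with p, so
    evaluating at p = cap gives the largest (most conservative) p-value.
    """
    if not 0 <= passed <= total:
        raise ValueError("passed must lie in [0, total]")
    if not 0.0 < cap < 1.0:
        raise ValueError("cap must lie strictly between 0 and 1")
    return binomial_tail(passed, total, cap)


def capped_reward(score: float, cap: float, penalty: float = 1.0) -> float:
    """Hypothetical capped reward: pays the score up to the cap and
    subtracts `penalty` times any excess above it."""
    if score <= cap:
        return score
    return cap - penalty * (score - cap)


if __name__ == "__main__":
    # With a cap of 0.5 on 100 tasks, 50 passes is unremarkable,
    # while 90 passes is essentially impossible without cheating.
    print(cheating_p_value(50, 100, 0.5))
    print(cheating_p_value(90, 100, 0.5))
    print(capped_reward(0.8, 0.5))
```

A small p-value flags a score that an honest model could hardly reach under the stated assumptions. Penalizing (rather than merely clipping) scores above the cap is one way to remove any incentive to push past it.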

Do Coding Agents Deceive Us? Detecting and Preventing Cheating via Capped Evaluation with Randomized Tests

Does joint liability reduce cheating in contests with agency problems? Theory and experimental evidence

Cops against a cheating robber

A Two-Stage, Object-Centric Deep Learning Framework for Robust Exam Cheating Detection

How Much Can a Few Engine Moves Help? Quantifying Limited Cheating in Chess

Cheating in Multiplayer Online Games: a Dataset

Shoot the Honey, Cloak the Player: Towards Zero-Runtime-Overhead Proactive Defense and Detection for Visual Game Cheating

LLM Use, Cheating, and Academic Integrity in Software Engineering Education

Balancing The Perception of Cheating Detection, Privacy and Fairness: A Mixed-Methods Study of Visual Data Obfuscation in Remote Proctoring

Who ruins the game?: unveiling cheating players in the "Battlefield" game

On Perception of Prevalence of Cheating and Usage of Generative AI

Synopticon: Consensus-Based Cheating Detection System for Competitive Games

Multiple Instance Learning for Cheating Detection and Localization in Online Examinations

Cheating in quantum Rabin oblivious transfer using delayed measurements

Human-in-the-Loop AI for Cheating Ring Detection

Detecting AI-Assisted Cheating in Online Exams through Behavior Analytics

A Systematic Review of Technical Defenses Against Software-Based Cheating in Online Multiplayer Games

VIC: Evasive Video Game Cheating via Virtual Machine Introspection

Cheating Automatic LLM Benchmarks: Null Models Achieve High Win Rates

A Bayesian framework for analyzing alleged cheating in sports through hidden codes, with applications to bridge and baseball