Search - ResearchTracker

Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions, persona, and rules. This degradation has been measured behaviorally but not mechanistically explained. We propose a channel-transition account: goal-defining tokens become less accessible through attention, while goal-related information may persist in residual representations. We introduce the Goal Accessibility Ratio (GAR), measuring attention from generated tokens to task-defining goal tokens, and combine it with sliding-window ablations and residual-stream probes. When attention to instructions closes, what survives reveals architecture. Across architectures, the transition yields qualitatively distinct failure modes: some models preserve goal-conditioned behavior at vanishing attention, others fail despite decodable residual goal information, and the layer at which this encoding emerges varies from 2 to 27. A within-model causal ablation that force-closes the attention channel in Mistral collapses recall from near-perfect to 11% on a 20-fact retention task and raises persona-constraint violations above an adversarial-pressure

Knowledge Distillation Must Account for What It Loses

arXiv, 2026-04-28. Author: Wenshuo Wang

This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. This matters because distillation is increasingly used to turn large teacher models into deployable students, yet headline metrics can obscure losses in the capabilities that make teacher behavior reliable. Conceptually, we show that current evaluation often assumes retained task scores imply retained teacher capabilities. Reframing distillation as a lossy projection exposes this flaw: students may match selected teacher observables without preserving the capabilities that make them reliable. We then synthesize existing evidence into a taxonomy of off-metric distillation losses, showing that such losses are concrete, recurring, and measurable, yet often unaccounted for when studies report what students retain rather than what they lose. To make the position actionable, we propose scenario-specific preservation targets and a Distillation Loss Statement that reports what was preserved, what was lost, and why the remaining losses are acceptable. The

Search results: lose

When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

Knowledge Distillation Must Account for What It Loses

The Complexity of Sparse Win-Lose Bimatrix Games

Reframing AI Loss of Control: What It Is, How to Have It, How to Lose It

Finding a Nash equilibrium of a random win-lose game in expected polynomial time

When Do Consumers Lose from Variable Electricity Pricing?

Do Neural Networks Lose Plasticity in a Gradually Changing World?

Don't Lose Yourself: Boosting Multimodal Recommendation via Reducing Node-neighbor Discrepancy in Graph Convolutional Network

Long bet will lose: demystifying seemingly fair gambling via two-armed Futurity bandit

The Complexity of Computational Problems about Nash Equilibria in Symmetric Win-Lose Games

The minimax property in infinite two-person win-lose games

Some Tractable Win-Lose Games

Effects of Dynamic-Win-Stay-Lose-Learn model with voluntary participation in social dilemma

PCA score regression: the art of losing power

Beyond Accuracy: Characterizing Code Comprehension Capabilities in (Large) Language Models

Coverage-Guided Multi-Agent Harness Generation for Java Library Fuzzing

Code-Centric Detection of Vulnerability-Fixing Commits: A Unified Benchmark and Empirical Study

Preplay Losing Contracts: Inducing Strong Nash Equilibrium in the $n$-player Prisoner's Dilemma

WAMI: Compilation to WebAssembly through MLIR without Losing Abstraction

$δ$-Badly approximable numbers and ubiquitously losing sets