Search - ResearchTracker

Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions, persona, and rules. This degradation has been measured behaviorally but not mechanistically explained. We propose a channel-transition account: goal-defining tokens become less accessible through attention, while goal-related information may persist in residual representations. We introduce the Goal Accessibility Ratio (GAR), measuring attention from generated tokens to task-defining goal tokens, and combine it with sliding-window ablations and residual-stream probes. When attention to instructions closes, what survives reveals architecture. Across architectures, the transition yields qualitatively distinct failure modes: some models preserve goal-conditioned behavior at vanishing attention, others fail despite decodable residual goal information, and the layer at which this encoding emerges varies from 2 to 27. A within-model causal ablation that force-closes the attention channel in Mistral collapses recall from near-perfect to 11% on a 20-fact retention task and raises persona-constraint violations above an adversarial-pressure

Knowledge Distillation Must Account for What It Loses

arXiv, 2026-04-28. Author: Wenshuo Wang

This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. This matters because distillation is increasingly used to turn large teacher models into deployable students, yet headline metrics can obscure losses in the capabilities that make teacher behavior reliable. Conceptually, we show that current evaluation often assumes retained task scores imply retained teacher capabilities. Reframing distillation as a lossy projection exposes this flaw: students may match selected teacher observables without preserving the capabilities that make them reliable. We then synthesize existing evidence into a taxonomy of off-metric distillation losses, showing that such losses are concrete, recurring, and measurable, yet often unaccounted for when studies report what students retain rather than what they lose. To make the position actionable, we propose scenario-specific preservation targets and a Distillation Loss Statement that reports what was preserved, what was lost, and why the remaining losses are acceptable. The

Search results: loses

When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

Knowledge Distillation Must Account for What It Loses

Trace Repair Never Loses to Classical Repair: Exact and Explicit Helper Nodes Selection

Zero-Rating and Net Neutrality: Who Wins, Who Loses?

Deep neural network loses attention to adversarial images

Google loses long-running appeal of record EU fine, will have to cough up $4.7 billion

PCA score regression: the art of losing power

Reframing AI Loss of Control: What It Is, How to Have It, How to Lose It

The Complexity of Sparse Win-Lose Bimatrix Games

Do Neural Networks Lose Plasticity in a Gradually Changing World?

Preplay Losing Contracts: Inducing Strong Nash Equilibrium in the $n$-player Prisoner's Dilemma

$δ$-Badly approximable numbers and ubiquitously losing sets

Finding a Nash equilibrium of a random win-lose game in expected polynomial time

WAMI: Compilation to WebAssembly through MLIR without Losing Abstraction

When Do Consumers Lose from Variable Electricity Pricing?

Losing Visual Needles in Image Haystacks: Vision Language Models are Easily Distracted in Short and Long Contexts

On Nontrivial Winning and Losing Parameters of Schmidt Games

Don't Lose Yourself: Boosting Multimodal Recommendation via Reducing Node-neighbor Discrepancy in Graph Convolutional Network

The minimax property in infinite two-person win-lose games

Long bet will lose: demystifying seemingly fair gambling via two-armed Futurity bandit