Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions, persona, and rules. This degradation has been measured behaviorally but not mechanistically explained. We propose a channel-transition account: goal-defining tokens become less accessible through attention, while goal-related information may persist in residual representations. We introduce the Goal Accessibility Ratio (GAR), measuring attention from generated tokens to task-defining goal tokens, and combine it with sliding-window ablations and residual-stream probes. When attention to instructions closes, what survives reveals architecture. Across architectures, the transition yields qualitatively distinct failure modes: some models preserve goal-conditioned behavior at vanishing attention, others fail despite decodable residual goal information, and the layer at which this encoding emerges varies from 2 to 27. A within-model causal ablation that force-closes the attention channel in Mistral collapses recall from near-perfect to 11% on a 20-fact retention task and raises persona-constraint violations above an adversarial-pressure
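The abstract defines GAR only informally, as attention from generated tokens to task-defining goal tokens. The sketch below is one plausible reading. It assumes access to a single layer's post-softmax attention weights as a NumPy array. The function name, the averaging over heads, and the lack of any chance-level normalization are assumptions, not the paper's definition.

```python
import numpy as np


def goal_accessibility_ratio(attn, goal_positions, generated_positions):
    """Hypothetical GAR sketch for a single layer.

    attn: array of shape (num_heads, seq_len, seq_len); attn[h, q, k] is the
        weight that query position q places on key position k (rows sum to 1).
    goal_positions: key indices of the task-defining goal tokens.
    generated_positions: query indices of the generated tokens.

    Returns the attention mass on goal tokens, averaged over heads and
    generated tokens (an assumed normalization).
    """
    attn = np.asarray(attn, dtype=float)
    # Keep only generated queries, then only goal keys: shape (H, Q, G).
    sub = attn[:, generated_positions, :][:, :, goal_positions]
    # Total mass each head, at each generated token, assigns to the goal span.
    mass_on_goal = sub.sum(axis=-1)
    return float(mass_on_goal.mean())


# Uniform attention over 10 positions with a 2-token goal span: about 0.2.
uniform = np.full((4, 10, 10), 0.1)
print(goal_accessibility_ratio(uniform, [0, 1], [7, 8, 9]))
```

Dividing the result by `len(goal_positions) / seq_len` would express it relative to uniform attention. That is another plausible normalization for a quantity called a "ratio".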
This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. This matters because distillation is increasingly used to turn large teacher models into deployable students, yet headline metrics can obscure losses in the capabilities that make teacher behavior reliable. Conceptually, we show that current evaluation often assumes retained task scores imply retained teacher capabilities. Reframing distillation as a lossy projection exposes this flaw: students may match selected teacher observables without preserving the capabilities that make them reliable. We then synthesize existing evidence into a taxonomy of off-metric distillation losses, showing that such losses are concrete, recurring, and measurable, yet often unaccounted for when studies report what students retain rather than what they lose. To make the position actionable, we propose scenario-specific preservation targets and a Distillation Loss Statement that reports what was preserved, what was lost, and why the remaining losses are acceptable. The
We prove that computing an $ε$-approximate Nash equilibrium of a win-lose bimatrix game with constant sparsity is PPAD-hard for inverse-polynomial $ε$. Our result holds for 3-sparse games, which is tight given that 2-sparse win-lose bimatrix games can be solved in polynomial time.
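Sparsity is what separates the hard case (3-sparse) from the easy one (2-sparse). This sketch assumes the usual definition: a bimatrix game is k-sparse when every row and every column of each payoff matrix has at most k nonzero entries. Under that definition the check is short. The helper names are hypothetical.

```python
import numpy as np


def is_win_lose(A, B):
    """True if every payoff in both matrices is 0 or 1."""
    return all(np.isin(np.asarray(M), (0, 1)).all() for M in (A, B))


def is_k_sparse(A, B, k):
    """Check k-sparsity, assuming the usual definition: every row and every
    column of both payoff matrices has at most k nonzero entries."""
    for M in (np.asarray(A), np.asarray(B)):
        if (np.count_nonzero(M, axis=1) > k).any():  # nonzeros per row
            return False
        if (np.count_nonzero(M, axis=0) > k).any():  # nonzeros per column
            return False
    return True


# A 4x4 win-lose game with exactly three wins per row and column for each player.
A = 1 - np.eye(4, dtype=int)
B = 1 - np.roll(np.eye(4, dtype=int), 1, axis=1)
print(is_win_lose(A, B), is_k_sparse(A, B, 3), is_k_sparse(A, B, 2))
```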
At present, loss of control risks have gained much prominence in public discussion, particularly in relation to AI, with extensive discourse present among academics, frontier labs, and even governments. However, in the existing literature, the concept seems to rest on surprisingly weak foundations, where even those that discuss loss of control extensively do not first establish what control is and what exactly is being lost. Our paper aims to address these gaps. We establish a working definition of control by anchoring it to the "setting and getting of goals". Then, we discuss various aspects of control, built on foundational concepts from related fields like cybernetics, management control, and control theory. This includes who (or what) can be in control, and the things they require to be in control, such as the ability to set goals, having a functional control loop, having requisite variety, and having sufficient goal alignment. Once a framework for control is established, we then discuss how control can be lost, how AIs can contribute to such loss of control, and offer relevant recommendations for how one can maintain control. One interesting consequence of our work is that hum
A long-standing open problem in algorithmic game theory asks whether or not there is a polynomial-time algorithm to compute a Nash equilibrium in a random bimatrix game. We study random win-lose games, where the entries of the $n\times n$ payoff matrices are independent and identically distributed (i.i.d.) Bernoulli random variables with parameter $p=p(n)$. We prove that, for nearly all values of the parameter $p=p(n)$, there is an expected polynomial-time algorithm to find a Nash equilibrium in a random win-lose game. More precisely, if $p\sim cn^{-a}$ for some parameters $a,c\ge 0$, then there is an expected polynomial-time algorithm whenever $a \notin \{1/2, 1\}$. In addition, if $a = 1/2$ there is an efficient algorithm if either $c \le e^{-52} 2^{-8}$ or $c\ge 0.977$. If $a=1$, then there is an expected polynomial-time algorithm if either $c\le 0.3849$ or $c\ge \log^9 n$.
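One easy observation helps when reading the parameter regimes. In a win-lose game, any cell with A[i, j] = B[i, j] = 1 is a pure Nash equilibrium, because both players already receive the maximum payoff of 1. Each cell has this property independently with probability p², so the expected number of such cells is n²p². For p = c·n^(-a) this tends to infinity when a < 1 and stays bounded at a = 1.

The sketch below samples a game and looks for such a cell. It is not the authors' algorithm, which must also handle games where no such cell exists. Function names are hypothetical.

```python
import numpy as np


def random_win_lose_game(n, c, a, rng):
    """Sample an n x n win-lose game with i.i.d. Bernoulli(p) payoffs,
    p = c * n**(-a), clipped to at most 1."""
    p = min(1.0, c * n ** (-a))
    A = (rng.random((n, n)) < p).astype(int)
    B = (rng.random((n, n)) < p).astype(int)
    return A, B


def pure_win_win_equilibrium(A, B):
    """Return a cell (i, j) with A[i, j] == B[i, j] == 1, or None.

    Such a cell is a pure Nash equilibrium of a win-lose game: both players
    already get the maximum payoff 1, so neither gains by deviating."""
    both = np.argwhere((A == 1) & (B == 1))
    return tuple(int(v) for v in both[0]) if len(both) else None


rng = np.random.default_rng(0)
A, B = random_win_lose_game(200, 1.0, 0.5, rng)  # p = 200**(-1/2)
print(pure_win_win_equilibrium(A, B))
```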
Time-varying electricity pricing reflects the varying cost of electricity better than flat-rate pricing does. Variations between peak and off-peak costs are increasing due to weather variation, renewable intermittency, and increasing electrification of demand. Empirical and theoretical studies suggest that variable pricing can lower electricity supply costs and reduce grid stress. However, the distributional impacts, particularly on low-income consumers, remain understudied. This paper develops a theoretical framework to analyze how consumer heterogeneity affects welfare outcomes when electricity markets transition from flat-rate to time-varying pricing, considering realistic assumptions about heterogeneous consumer demand, supply costs, and utility losses from unmet consumption. We derive sufficient conditions for identifying when consumers lose utility from pricing reforms and compare welfare effects across consumer types. Our findings reveal that consumer vulnerability depends on the interaction of consumption timing, demand flexibility, and price sensitivity. Consumers with high peak-period consumption and inflexible demand, characteristics often associated
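A toy calculation illustrates the distributional effect described above, under strong simplifying assumptions:
- two periods;
- fixed consumption, with no demand response and no utility loss from unmet consumption (the paper models both);
- time-varying prices equal to supply cost;
- a flat rate set to recover the same total cost.

All numbers and names are hypothetical.

```python
def flat_rate(loads, prices):
    """Revenue-neutral flat rate: total cost of all consumers' load at the
    time-varying prices, divided by total consumption."""
    total_cost = sum(q * p for load in loads.values() for q, p in zip(load, prices))
    total_kwh = sum(sum(load) for load in loads.values())
    return total_cost / total_kwh


def bill_change(load, prices, flat):
    """Bill under time-varying pricing minus bill under the flat rate
    (positive means the consumer pays more after the reform)."""
    tou_bill = sum(q * p for q, p in zip(load, prices))
    return tou_bill - flat * sum(load)


# Hypothetical: loads are (peak kWh, off-peak kWh); prices are $/kWh (peak, off-peak).
prices = (0.30, 0.10)
loads = {"peak_heavy": (6.0, 4.0), "off_peak_heavy": (2.0, 8.0)}
flat = flat_rate(loads, prices)
for name, load in loads.items():
    print(name, round(bill_change(load, prices, flat), 2))
```

With these numbers the flat rate is $0.18/kWh. The peak-heavy consumer's bill rises by about $0.40, and the off-peak-heavy consumer's bill falls by the same amount. Demand flexibility would let a consumer shift load and shrink that increase.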
Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, in which neural networks gradually lose the ability to learn new tasks. However, existing plasticity research largely relies on contrived settings with abrupt task transitions, which often do not reflect real-world environments. In this paper, we investigate gradually changing environments, which we simulate through input/output interpolation and task sampling. Our theoretical and empirical analyses show that loss of plasticity is an artifact of abrupt task changes in the environment and can be largely mitigated if the world changes gradually.
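The abstract does not spell out the interpolation scheme. This minimal sketch assumes a regression setting in which the old and new tasks are target functions `f_old` and `f_new`, and a mixing coefficient `lam` ramps linearly from 0 to 1. Output interpolation blends the targets. Task sampling instead draws each example from the new task with probability `lam`. Input interpolation is not shown, and all names are hypothetical.

```python
import numpy as np


def mixing_schedule(num_steps):
    """Linear ramp of the mixing coefficient from 0 (old task) to 1 (new task).
    An abrupt switch would instead jump straight from 0 to 1."""
    return np.linspace(0.0, 1.0, num_steps)


def interpolated_targets(x, f_old, f_new, lam):
    """Output interpolation: blend the old and new target functions."""
    return (1.0 - lam) * f_old(x) + lam * f_new(x)


def sample_task(lam, rng):
    """Task sampling: draw the new task with probability lam, else the old one."""
    return "new" if rng.random() < lam else "old"


rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 5)
for lam in mixing_schedule(5):
    y = interpolated_targets(x, np.sin, np.cos, lam)
    print(f"lam={lam:.2f} task={sample_task(lam, rng)} y[0]={y[0]:+.3f}")
```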
The rapid expansion of multimedia content has led to the emergence of multimodal recommendation systems. Multimodal recommendation has attracted increasing attention because fully utilizing data from different modalities alleviates the persistent data sparsity problem. As such, multimodal recommendation models can learn personalized information about nodes from visual and textual modalities. To further alleviate data sparsity, some previous works have introduced graph convolutional networks (GCNs) into multimodal recommendation systems to enhance the semantic representations of users and items by capturing the latent relationships between them. However, adopting GCNs inevitably introduces the over-smoothing problem, which makes node representations overly similar. Unfortunately, incorporating multimodal information exacerbates this challenge, because nodes that become too similar lose the personalized information learned from multimodal data. To address this problem, we propose a novel model that retains the personalized information of ego nodes during feature aggregation by Reducing Node-neighbor Discrepancy (RedN^nD). Extensive experiments on three public
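The over-smoothing problem can be seen without any learned weights. The sketch below is a simplification assumed for illustration, not the RedN^nD model. It repeatedly applies row-normalized mean aggregation with self-loops to a small connected graph. As layers are stacked, the average pairwise distance between node representations shrinks toward zero. That is the loss of node-specific (personalized) information described above.

```python
import numpy as np


def propagate(adj, features, num_layers):
    """Apply num_layers of parameter-free mean aggregation with self-loops
    (a GCN-style layer with no weight matrix and no nonlinearity)."""
    a_hat = adj + np.eye(adj.shape[0])
    p = a_hat / a_hat.sum(axis=1, keepdims=True)  # row-normalized adjacency
    h = features
    for _ in range(num_layers):
        h = p @ h
    return h


def feature_spread(h):
    """Mean Euclidean distance over all ordered pairs of distinct nodes."""
    n = h.shape[0]
    diffs = h[:, None, :] - h[None, :, :]
    return float(np.linalg.norm(diffs, axis=-1).sum() / (n * (n - 1)))


# A 4-node path graph with random 3-dimensional node features.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
x = np.random.default_rng(0).random((4, 3))
for layers in (0, 2, 8, 32):
    print(layers, round(feature_spread(propagate(adj, x, layers)), 4))
```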
No matter how much gamblers occasionally win, if they keep gambling they will sooner or later lose more to the casino; this is the so-called "long bet will lose" phenomenon. Our results demonstrate a counter-intuitive phenomenon: gamblers engaged in long bets lose, yet casinos always advertise circumstances that appear unprofitable to themselves. Here we expose the inevitability behind "long bet will lose" by theoretically and experimentally demystifying how casinos profit from the two-armed antique Mills Futurity slot machine. The main results show that all casino games are seemingly fair gambles but essentially unfair, i.e., the casino's win rate is greater than 50%. We anticipate that our analysis will serve as a starting point for using this mathematical tool to study the fairness of more sophisticated multi-armed Futurity bandits. In practice, a fairness study of Futurity bandits not only exposes casino fraud to gamblers but also sheds light on discount marketing, bundled sales, and other tactics that induce consumption.
We revisit the complexity of deciding, given a {\it bimatrix game,} whether it has a {\it Nash equilibrium} with certain natural properties; such decision problems were shown early on to be ${\mathcal{NP}}$-hard~\cite{GZ89}. We show that ${\mathcal{NP}}$-hardness still holds under two significant restrictions imposed simultaneously: the game is {\it win-lose} (that is, all {\it utilities} are $0$ or $1$) and {\it symmetric}. To address the former restriction, we design win-lose {\it gadgets} and a win-lose reduction; to accommodate the latter restriction, we employ and analyze the classical {\it ${\mathsf{GHR}}$-symmetrization}~\cite{GHR63} in the win-lose setting. Thus, {\it symmetric win-lose bimatrix games} are as complex as general bimatrix games with respect to such decision problems. As a byproduct of our techniques, we derive hardness results for search, counting and parity problems about Nash equilibria in symmetric win-lose bimatrix games.
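The GHR-symmetrization cited above is commonly stated as the block matrix C = [[0, A], [B^T, 0]], with the symmetric game (C, C^T). The sketch assumes that form. The original construction is typically stated for positive payoffs, and whether it behaves well with 0/1 payoffs is what the abstract says the authors analyze. The sketch shows only why win-lose entries stay in {0, 1]: the construction copies A and B^T and pads with zeros. The function name is hypothetical.

```python
import numpy as np


def ghr_symmetrize(A, B):
    """Return C = [[0, A], [B^T, 0]] for an m x n bimatrix game (A, B).

    The symmetric game is (C, C^T): both players choose among m + n
    strategies (the m row strategies followed by the n column strategies).
    """
    A = np.asarray(A)
    B = np.asarray(B)
    m, n = A.shape
    return np.block([
        [np.zeros((m, m), dtype=A.dtype), A],
        [B.T, np.zeros((n, n), dtype=B.dtype)],
    ])


A = np.array([[1, 0, 1], [0, 1, 0]])
B = np.array([[0, 1, 0], [1, 0, 1]])
C = ghr_symmetrize(A, B)
print(C)
```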
We explore a version of the minimax theorem for two-person win-lose games with infinitely many pure strategies. In the countable case, we give a combinatorial condition on the game which implies the minimax property. In the general case, we prove that a game satisfies the minimax property along with all its subgames if and only if none of its subgames is isomorphic to the "larger number game." This generalizes a recent theorem of Hanneke, Livni and Moran. We also propose several applications of our results outside of game theory.
Determining a Nash equilibrium in a $2$-player non-zero sum game is known to be PPAD-hard (Chen and Deng (2006), Chen, Deng and Teng (2009)). The problem, even when restricted to win-lose bimatrix games, remains PPAD-hard (Abbott, Kane and Valiant (2005)). However, there do exist polynomial time tractable classes of win-lose bimatrix games - such as, very sparse games (Codenotti, Leoncini and Resta (2006)) and planar games (Addario-Berry, Olver and Vetta (2007)). We extend the results in the latter work to $K_{3,3}$ minor-free games and a subclass of $K_5$ minor-free games. Both these classes of games strictly contain planar games. Further, we sharpen the upper bound to unambiguous logspace, a small complexity class contained well within polynomial time. Apart from these classes of games, our results also extend to a class of games that contain both $K_{3,3}$ and $K_5$ as minors, thereby covering a large and non-trivial class of win-lose bimatrix games. For this class, we prove an upper bound of nondeterministic logspace, again a small complexity class within polynomial time. Our techniques are primarily graph theoretic and use structural characterizations of the considered minor-c
In recent years, the Win-Stay-Lose-Learn rule has attracted wide attention as an effective strategy updating rule, and voluntary participation has been proposed by introducing a third strategy into the Prisoner's dilemma game. Some studies show that combining the Win-Stay-Lose-Learn rule with voluntary participation could promote cooperation more significantly under moderate temptation values; however, cooperators' survival under high aspiration levels and high temptation values is still a challenging problem. In this paper, inspired by Achievement Motivation Theory, a Dynamic-Win-Stay-Lose-Learn rule with voluntary participation is investigated, where a dynamic aspiration process is introduced to describe the co-evolution of individuals' strategies and aspirations. It is found that cooperation is strongly promoted and defection is almost extinct in our model, even when the initial aspiration levels and temptation values are high. The combination of dynamic aspiration and voluntary participation plays an active role, since loners can survive under high initial aspiration levels and then expand stably because of their fixed payoffs. The robustness of our model is also discussed and some adver
The regression of principal component scores (RPCS) on covariates is a widely used analytic approach to detect and test for associations between functional measurements and study participant characteristics. Here we show that: (1) RPCS loses power relative to Function on Scalar Regression (FoSR); (2) the amount of power loss depends on the correlation between the PCs and the true effect; (3) if not corrected for multiplicity, RPCS has an inflated $α$-level; and (4) current RPCS methods do not provide valid inference for the true effect. In contrast, we show that FoSR can avoid these problems using a particular combination of modeling tools. We validate these theoretical findings through extensive simulations and illustrate their practical implications using minute-level accelerometry data from the National Health and Nutrition Examination Survey (NHANES).
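Here is a minimal NumPy sketch of the RPCS pipeline. It assumes functional observations on a common grid and a single scalar covariate. The steps are: center the curves, take principal components with an SVD, and regress each PC score on the covariate by ordinary least squares. The simulated data and the function name are hypothetical, and FoSR itself is not shown.

```python
import numpy as np


def rpcs(Y, z, num_pcs):
    """Regress the first num_pcs principal component scores of Y on z.

    Y: (n_subjects, n_timepoints) curves on a common grid; z: (n_subjects,).
    Returns (scores, slopes), with one OLS slope per PC score.
    """
    Yc = Y - Y.mean(axis=0)                    # center each time point
    _, _, Vt = np.linalg.svd(Yc, full_matrices=False)
    scores = Yc @ Vt[:num_pcs].T               # (n_subjects, num_pcs)
    X = np.column_stack([np.ones(len(z)), z])  # intercept + covariate
    coef, *_ = np.linalg.lstsq(X, scores, rcond=None)
    return scores, coef[1]


rng = np.random.default_rng(0)
n, t = 40, 20
z = rng.normal(size=n)
beta = np.sin(np.linspace(0.0, np.pi, t))      # true functional effect of z
Y = np.outer(z, beta) + 0.1 * rng.normal(size=(n, t))
scores, slopes = rpcs(Y, z, num_pcs=3)
print(slopes)
```

Suppose each of the three slopes were tested separately at α = 0.05 without adjustment, and the tests were independent. The family-wise error rate would then be 1 - 0.95^3 ≈ 0.14. This is the multiplicity issue behind point (3).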
Large Language Models (LLMs) are increasingly integrated into software engineering workflows, yet current benchmarks provide only coarse performance summaries that obscure the diverse capabilities and limitations of these models. This paper investigates whether LLMs' code-comprehension performance aligns with traditional human-centric software metrics or instead reflects distinct, non-human regularities. We introduce a diagnostic framework that reframes code understanding as a binary input-output consistency task, enabling the evaluation of classification and generative models. Using a large-scale dataset, we correlate model performance with traditional, human-centric complexity metrics, such as lexical size, control-flow complexity, and abstract syntax tree structure. Our analyses reveal minimal correlation between human-defined metrics and LLM success (AUROC 0.63), while shadow models achieve substantially higher predictive performance (AUROC 0.86), capturing complex, partially predictable patterns beyond traditional software measures. These findings suggest that LLM comprehension reflects model-specific regularities only partially accessible through either human-designed or lear
Coverage-guided fuzzing has proven effective for software testing, but targeting library code requires specialized fuzz harnesses that translate fuzzer-generated inputs into valid API invocations. Manual harness creation is time-consuming and requires deep understanding of API semantics, initialization sequences, and exception handling contracts. We present a multi-agent architecture that automates fuzz harness generation for Java libraries through specialized LLM-powered agents. Five ReAct agents decompose the workflow into research, synthesis, compilation repair, coverage analysis, and refinement. Rather than preprocessing entire codebases, agents query documentation, source code, and callgraph information on demand through the Model Context Protocol, maintaining focused context while exploring complex dependencies. To enable effective refinement, we introduce method-targeted coverage that tracks coverage only during target method execution to isolate target behavior, and agent-guided termination that examines uncovered source code to distinguish productive refinement opportunities from diminishing returns. We evaluated our approach on seven target methods from six widely-deploye
Automated detection of vulnerability-fixing commits (VFCs) is critical for timely security patch deployment, as advisory databases lag patch releases by a median of 25 days and many fixes never receive advisories. We present a comprehensive evaluation of code-language-model-based VFC detection through a unified framework that consolidates over 20 fragmented datasets spanning more than 180,000 commits. Across over 180 experiments with fine-tuned models from 125M to 14B parameters, we find no evidence that models acquire transferable security-relevant code understanding from code changes alone. When commit messages are available, they dominate model attention; when they are removed, an attribution analysis shows that enriching diffs with additional intra-procedural semantic context does not shift model attention toward the code changes. Group-stratified evaluation exposes performance drops of approximately 17% compared to random splits, while temporal splits on aggregated datasets prove unreliable due to compositional shift in the underlying project distributions. At a false positive rate of 0.5%, all fine-tuned code-only models miss over 93% of vulnerabilities. Larger and more diverse training
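The 0.5% operating point can be made concrete. This minimal sketch assumes classifier scores where a higher value means "more likely vulnerability-fixing". It chooses the threshold from the negatives so that at most 0.5% of them score above it. It then measures the fraction of true VFCs that fall at or below that threshold. The helper name and the synthetic scores are hypothetical.

```python
import numpy as np


def miss_rate_at_fpr(scores, labels, max_fpr):
    """Choose a threshold so that at most max_fpr of negatives score above it,
    then return (threshold, fraction of positives at or below it)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    neg = np.sort(scores[~labels])[::-1]       # negative scores, descending
    k = int(np.floor(max_fpr * len(neg)))      # negatives allowed above threshold
    # Predict positive iff score > threshold; neg[k] leaves at most k negatives above.
    threshold = neg[k] if k < len(neg) else -np.inf
    detected = scores[labels] > threshold
    return threshold, 1.0 - detected.mean()


# Synthetic example: 1000 non-fixing commits and 4 true VFCs.
neg_scores = np.arange(1000) / 1000.0
pos_scores = np.array([0.9995, 0.5, 0.1, 0.99])
all_scores = np.concatenate([neg_scores, pos_scores])
labels = np.concatenate([np.zeros(1000, bool), np.ones(4, bool)])
thr, miss = miss_rate_at_fpr(all_scores, labels, 0.005)
print(thr, miss)
```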
In strategic games such as the prisoner's dilemma, allowing players to make binding offers of utility transfers before play has been shown to alter incentives and potentially support cooperative outcomes. These preplay exchange mechanisms reshape payoffs by transferring utility while being contingent on actions; however, they typically require side payments that can reduce individual benefits relative to joint cooperation. In this paper, we extend the analysis to a finite $n$-player prisoner's dilemma with ordered strategy sets, defined such that any restriction of strategies by any subset of players still yields a prisoner's dilemma. To achieve a robust cooperative outcome that resists group deviations, we introduce a novel class of mechanisms: $\textit{losing contracts}$. Unlike transfer-based preplay mechanisms, losing contracts require players to irrevocably reduce their own utility if they defect, thereby aligning individual incentives with cooperation without inter-player payments. With appropriately chosen loss amounts, losing contracts induce joint cooperation as the unique strong Nash equilibrium in the modified game and in every restricted game within it, ensuring that co
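A two-player special case makes the mechanism concrete. The payoffs are a standard prisoner's dilemma chosen for illustration (T = 5, R = 3, P = 1, S = 0). The sketch checks only pure Nash equilibria, not the strong Nash equilibria of the n-player ordered games studied in the paper. Under a losing contract, each defector gives up a fixed amount L of its own utility, and nothing is transferred to the other player. Cooperation becomes strictly dominant once L > T - R = 2, which also exceeds P - S = 1. Names are hypothetical.

```python
import itertools

ACTIONS = ("C", "D")
# Illustrative prisoner's dilemma payoffs (T=5 > R=3 > P=1 > S=0).
BASE = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
        ("D", "C"): (5, 0), ("D", "D"): (1, 1)}


def with_losing_contract(payoffs, loss):
    """Each player who defects irrevocably loses `loss` utility; no side payments."""
    return {
        (a1, a2): (u1 - loss * (a1 == "D"), u2 - loss * (a2 == "D"))
        for (a1, a2), (u1, u2) in payoffs.items()
    }


def pure_nash_equilibria(payoffs):
    """List action profiles where neither player gains by a unilateral deviation."""
    eqs = []
    for a1, a2 in itertools.product(ACTIONS, repeat=2):
        u1, u2 = payoffs[(a1, a2)]
        best1 = all(u1 >= payoffs[(b, a2)][0] for b in ACTIONS)
        best2 = all(u2 >= payoffs[(a1, b)][1] for b in ACTIONS)
        if best1 and best2:
            eqs.append((a1, a2))
    return eqs


for loss in (0.0, 1.5, 2.5):
    print(loss, pure_nash_equilibria(with_losing_contract(BASE, loss)))
```

Without the contract, mutual defection is the only pure equilibrium. With L = 2.5, mutual cooperation is the only one. L = 1.5 is too small and leaves the two asymmetric outcomes as equilibria.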
WebAssembly (Wasm) is a portable bytecode format that serves as a compilation target for high-level languages, enabling their secure and efficient execution across diverse platforms, including web browsers and embedded systems. To improve support for high-level languages without incurring significant code size or performance overheads, Wasm continuously evolves by integrating high-level features such as Garbage Collection and Stack Switching. However, existing compilation approaches either lack a reusable design (requiring redundant implementation effort for each language) or lose abstraction by lowering high-level constructs into low-level shared representations like LLVM IR, which hinders the adoption of high-level features. The MLIR compiler infrastructure provides a compilation pipeline with multiple levels of abstraction that preserves high-level abstractions throughout, yet the current MLIR pipeline relies on the LLVM backend for Wasm code generation, thereby inheriting LLVM's limitations. This paper presents a novel compilation pipeline for Wasm, featuring Wasm dialects explicitly designed to represent high-level Wasm constructs within MLIR. Our approac
We consider a natural filtration $\boldsymbol{\operatorname{Bad}}(δ) \subset \boldsymbol{\operatorname{Bad}}(δ')$ for $δ\geq δ'>0$ on the set of badly approximable numbers to complement the filtration of the well approximable numbers by the $τ$-well approximable numbers. We show that the set $\boldsymbol{\operatorname{Bad}}(δ)$ is a $(1/3, 18 δ)$-winning set and give a lower bound on its Hausdorff dimension. We introduce the notion of $(α, β)$-$\textit{ubiquitously losing sets}$ to the theory of Schmidt games, give an upper bound on the Hausdorff dimension of an $(α, β)$-ubiquitously losing set that is strictly less than full Hausdorff dimension, show that $\boldsymbol{\operatorname{Bad}}(δ)$ is a $(1/2, 18/δ)$-ubiquitously losing set, and give an upper bound on the Hausdorff dimension of $\boldsymbol{\operatorname{Bad}}(δ)$ that is strictly less than one. Combined with a finite intersection property and a bilipschitz transfer property, we obtain results for finite intersections of translates of $\boldsymbol{\operatorname{Bad}}(δ)$.
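The filtration is easiest to see from a definition. Assume the standard one; the paper may instead use a strict inequality or restrict x to the unit interval. Then x belongs to Bad(δ) when every rational approximation p/q stays at distance at least δ/q², and the nesting follows directly:

```latex
\boldsymbol{\operatorname{Bad}}(\delta)
  = \Bigl\{ x \in \mathbb{R} : \Bigl| x - \frac{p}{q} \Bigr| \ge \frac{\delta}{q^{2}}
    \ \text{for all } p \in \mathbb{Z},\ q \in \mathbb{N} \Bigr\},
\qquad
\delta \ge \delta' > 0
  \;\Longrightarrow\;
  \frac{\delta}{q^{2}} \ge \frac{\delta'}{q^{2}}
  \;\Longrightarrow\;
  \boldsymbol{\operatorname{Bad}}(\delta) \subset \boldsymbol{\operatorname{Bad}}(\delta').
```

Under this definition, the set of badly approximable numbers is the union of Bad(δ) over all δ > 0.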