Large language models can follow complex instructions in a single turn, yet over long multi-turn interactions they often lose the thread of instructions, persona, and rules. This degradation has been measured behaviorally but not mechanistically explained. We propose a channel-transition account: goal-defining tokens become less accessible through attention, while goal-related information may persist in residual representations. We introduce the Goal Accessibility Ratio (GAR), measuring attention from generated tokens to task-defining goal tokens, and combine it with sliding-window ablations and residual-stream probes. When attention to instructions closes, what survives reveals architecture. Across architectures, the transition yields qualitatively distinct failure modes: some models preserve goal-conditioned behavior at vanishing attention, others fail despite decodable residual goal information, and the layer at which this encoding emerges varies from 2 to 27. A within-model causal ablation that force-closes the attention channel in Mistral collapses recall from near-perfect to 11% on a 20-fact retention task and raises persona-constraint violations above an adversarial-pressure
This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. This matters because distillation is increasingly used to turn large teacher models into deployable students, yet headline metrics can obscure losses in the capabilities that make teacher behavior reliable. Conceptually, we show that current evaluation often assumes retained task scores imply retained teacher capabilities. Reframing distillation as a lossy projection exposes this flaw: students may match selected teacher observables without preserving the capabilities that make them reliable. We then synthesize existing evidence into a taxonomy of off-metric distillation losses, showing that such losses are concrete, recurring, and measurable, yet often unaccounted for when studies report what students retain rather than what they lose. To make the position actionable, we propose scenario-specific preservation targets and a Distillation Loss Statement that reports what was preserved, what was lost, and why the remaining losses are acceptable. The
Repairing Reed-Solomon codes with low bandwidth is a central challenge in distributed storage. Following the trace-repair framework of Guruswami and Wootters (2017), recent works by Lin (2023) and Liu-Wan-Xing (2024) provided significant improvements in bandwidth using two distinct ideas. Lin constructed a trace-repair scheme that requires no contribution from a set of predetermined nodes $\mathscr{S}$, while Liu-Wan-Xing identified linear dependencies among the downloaded traces, relating the number of dependent traces to the dimension of a subspace $\mathscr{W}_k$. In this work, we fully utilize and unify these ideas. We compute the exact dimension of $\mathscr{W}_{k,\mathscr{S}}$ (a generalization of $\mathscr{W}_k$). We identify the trade-off between the set size $|\mathscr{S}|$ and the dimension $\dim(\mathscr{W}_{k,\mathscr{S}})$. We provide an algorithm to find the combination that results in the lowest bandwidth. Furthermore, we provide an explicit choice of the helper nodes for the repair. Finally, we prove that our optimized scheme never loses to the classical repair scheme, establishing a bandwidth guarantee of at most $k\log|\mathbb{F}|$ bits for all dimensions $k$ and f
An objective of network neutrality is that the design of regulations for the Internet will ensure that it remains a public, open platform where innovations can thrive. While there is broad agreement that preserving the content quality of service falls under the purview of net neutrality, the role of differential pricing, especially the practice of \emph{zero-rating}, remains controversial. Even though some countries (India, Canada) have banned zero-rating, others have either taken no stance or explicitly allowed it (South Africa, Kenya, U.S.). In this paper, we model zero-rating options available between Internet service providers (ISPs) and content providers (CPs) and use these models to better understand the conditions under which offering zero-rated services is preferred, and who specifically gains in utility. We develop a formulation in which providers' incomes vary, from low-income startups to high-income incumbents, and where their decisions to zero-rate are a variation of the traditional prisoner's dilemma game. We find that if zero-rating is permitted, low-income CPs often lose utility, whereas high-income CPs often gain utility. We also study the competitiveness of the CP
Adversarial algorithms have been shown to be effective against neural networks for a variety of tasks. In image classification, some adversarial algorithms perturb all the pixels in the image minimally. In contrast, some algorithms perturb few pixels strongly. However, very little information is available regarding why such diverse adversarial samples exist. Recently, Vargas et al. showed that the existence of these adversarial samples might be due to conflicting saliency within the neural network. We test this hypothesis of conflicting saliency by analysing the Saliency Maps (SM) and Gradient-weighted Class Activation Maps (Grad-CAM) of original samples and a few different types of adversarial samples. We also analyse how different adversarial samples distort the attention of the neural network compared to original samples. We show that in the case of Pixel Attack, perturbed pixels either call the network's attention to themselves or divert the attention from themselves. Meanwhile, the Projected Gradient Descent Attack perturbs pixels so that intermediate layers inside the neural network lose attention for the correct class. We also show that bo
The EU went after Google for the practice of bundling its search engine and browser with Android
The regression of principal component scores (RPCS) on covariates is a widely used analytic approach to detect and test for associations between functional measurements and study participant characteristics. Here we show that: (1) RPCS loses power relative to Function on Scalar Regression (FoSR); (2) the amount of power loss depends on the correlation between the PCs and the true effect; (3) if not corrected for multiplicity, RPCS has inflated $α$-level; and (4) current RPCS methods do not provide valid inference for the true effect. In contrast, we show that Function on Scalar Regression (FoSR) can avoid these problems using a particular combination of modeling tools. We validate these theoretical findings through extensive simulations and illustrate their practical implications using minute-level accelerometry data from the National Health and Nutrition Examination Survey (NHANES).
At present, loss of control risks have gained much prominence in public discussion, particularly in relation to AI, with extensive discourse present among academics, frontier labs, and even governments. However, in the existing literature, the concept seems to rest on surprisingly weak foundations, where even those that discuss loss of control extensively do not first establish what control is and what exactly is being lost. Our paper aims to address these gaps. We establish a working definition of control by anchoring it to the "setting and getting of goals". Then, we discuss various aspects of control, built on foundational concepts from related fields like cybernetics, management control, and control theory. This includes who (or what) can be in control, and the things they require to be in control, such as the ability to set goals, having a functional control loop, having requisite variety, and having sufficient goal alignment. Once a framework for control is established, we then discuss how control can be lost, how AIs can contribute to such loss of control, and offer relevant recommendations for how one can maintain control. One interesting consequence of our work is that hum
We prove that computing an $ε$-approximate Nash equilibrium of a win-lose bimatrix game with constant sparsity is PPAD-hard for inverse-polynomial $ε$. Our result holds for 3-sparse games, which is tight given that 2-sparse win-lose bimatrix games can be solved in polynomial time.
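The notion of an $ε$-approximate Nash equilibrium used above can be made concrete with a small check. The following is a minimal sketch, not the construction from the paper: it assumes the additive notion of approximation (no player gains more than $ε$ by deviating to a pure strategy) and assumes that "k-sparse" means at most k nonzero entries in every row and column of a payoff matrix. The function names `expected_payoff`, `is_eps_nash`, and `sparsity` are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[int]]


def expected_payoff(M: Matrix, x: Sequence[float], y: Sequence[float]) -> float:
    """Expected payoff x^T M y for mixed strategies x (rows) and y (columns)."""
    return sum(x[i] * M[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))


def is_eps_nash(A: Matrix, B: Matrix, x: Sequence[float], y: Sequence[float],
                eps: float) -> bool:
    """True if neither player gains more than eps by a pure deviation.

    Assumption: additive eps-approximation; A is the row player's payoff
    matrix and B is the column player's.
    """
    n_rows, n_cols = len(x), len(y)
    row_value = expected_payoff(A, x, y)
    col_value = expected_payoff(B, x, y)
    # Best pure response of the row player against y.
    best_row = max(sum(A[i][j] * y[j] for j in range(n_cols)) for i in range(n_rows))
    # Best pure response of the column player against x.
    best_col = max(sum(x[i] * B[i][j] for i in range(n_rows)) for j in range(n_cols))
    return best_row - row_value <= eps and best_col - col_value <= eps


def sparsity(M: Matrix) -> int:
    """Largest number of nonzero entries in any single row or column (assumed definition)."""
    rows = [sum(1 for v in row if v != 0) for row in M]
    cols = [sum(1 for row in M if row[j] != 0) for j in range(len(M[0]))]
    return max(rows + cols)


# Win-lose version of matching pennies: every payoff is 0 or 1.
A = [[1, 0], [0, 1]]
B = [[0, 1], [1, 0]]
uniform_is_nash = is_eps_nash(A, B, [0.5, 0.5], [0.5, 0.5], eps=0.0)
pure_is_nash = is_eps_nash(A, B, [1.0, 0.0], [1.0, 0.0], eps=0.5)
```

In this toy game the uniform profile passes the check exactly, while the pure profile fails because the column player can gain 1 by switching columns. Verifying a candidate profile is easy; the hardness result concerns finding one.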
Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, referring to neural networks gradually losing the ability to learn new tasks. However, existing plasticity research largely relies on benchmarks with abrupt task transitions, without examining whether the abruptness itself contributes to the observed plasticity loss. In this paper, we investigate the role of transition abruptness by simulating gradually changing environments through input/output interpolation and task sampling. We perform theoretical and empirical analysis, showing that the severity of plasticity loss is closely tied to the abruptness of task transitions, and can be substantially reduced when the environment changes gradually.
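One way to picture a gradual transition of the kind described above is a mixing weight that ramps from the old task to the new one. The sketch below is illustrative only and is not the authors' protocol: the linear schedule, the function names `mixing_weight`, `interpolate`, and `sample_task`, and the `start` and `duration` parameters are all assumptions.

```python
import random
from typing import List, Sequence


def mixing_weight(step: int, start: int, duration: int) -> float:
    """Weight of the new task at a given step.

    Assumption: a linear ramp from 0 to 1 over `duration` steps beginning at
    `start`; duration <= 0 reproduces an abrupt switch.
    """
    if duration <= 0:
        return 0.0 if step < start else 1.0
    return min(1.0, max(0.0, (step - start) / duration))


def interpolate(old: Sequence[float], new: Sequence[float], w: float) -> List[float]:
    """Convex combination of an old-task and a new-task vector (inputs or targets)."""
    return [(1.0 - w) * o + w * n for o, n in zip(old, new)]


def sample_task(w: float, rng: random.Random) -> str:
    """Task sampling: draw the new task with probability w, otherwise the old one."""
    return "new" if rng.random() < w else "old"


# Halfway through a 10-step transition that starts at step 10.
w_mid = mixing_weight(15, start=10, duration=10)
blended = interpolate([0.0, 2.0], [2.0, 4.0], w_mid)
```

Setting `duration` to zero recovers the abrupt benchmark setting, so the same training loop could in principle compare abrupt and gradual regimes by changing a single parameter.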
In strategic games such as the prisoner's dilemma, allowing players to make binding offers of utility transfers before play has been shown to alter incentives and potentially support cooperative outcomes. These preplay exchange mechanisms reshape payoffs by transferring utility while being contingent on actions; however, they typically require side payments that can reduce individual benefits relative to joint cooperation. In this paper, we extend the analysis to a finite $n$-player prisoner's dilemma with ordered strategy sets, defined such that any restriction of strategies by any subset of players still yields a prisoner's dilemma. To achieve a robust cooperative outcome that resists group deviations, we introduce a novel class of mechanisms: $\textit{losing contracts}$. Unlike transfer-based preplay mechanisms, losing contracts require players to irrevocably reduce their own utility if they defect, thereby aligning individual incentives with cooperation without inter-player payments. With appropriately chosen loss amounts, losing contracts induce joint cooperation as the unique strong Nash equilibrium in the modified game and in every restricted game within it, ensuring that co
We consider a natural filtration $\boldsymbol{\operatorname{Bad}}(δ) \subset \boldsymbol{\operatorname{Bad}}(δ')$ for $δ\geq δ'>0$ on the set of badly approximable numbers to complement the filtration of the well approximable numbers by the $τ$-well approximable numbers. We show that the set $\boldsymbol{\operatorname{Bad}}(δ)$ is a $(1/3, 18 δ)$-winning set and give a lower bound on its Hausdorff dimension. We introduce the notion of $(α, β)$-$\textit{ubiquitously losing sets}$ to the theory of Schmidt games, give an upper bound on the Hausdorff dimension of an $(α, β)$-ubiquitously losing set that is strictly less than full Hausdorff dimension, show that $\boldsymbol{\operatorname{Bad}}(δ)$ is a $(1/2, 18/δ)$-ubiquitously losing set, and give an upper bound on the Hausdorff dimension of $\boldsymbol{\operatorname{Bad}}(δ)$ that is strictly less than one. Combined with a finite intersection property and a bilipschitz transfer property, we obtain results for finite intersections of translates of $\boldsymbol{\operatorname{Bad}}(δ)$.
A long-standing open problem in algorithmic game theory asks whether or not there is a polynomial time algorithm to compute a Nash equilibrium in a random bimatrix game. We study random win-lose games, where the entries of the $n\times n$ payoff matrices are independent and identically distributed (i.i.d.) Bernoulli random variables with parameter $p=p(n)$. We prove that, for nearly all values of the parameter $p=p(n)$, there is an expected polynomial-time algorithm to find a Nash equilibrium in a random win-lose game. More precisely, if $p\sim cn^{-a}$ for some parameters $a,c\ge 0$, then there is an expected polynomial-time algorithm whenever $a \notin \{1/2, 1\}$. In addition, if $a = 1/2$ there is an efficient algorithm if either $c \le e^{-52} 2^{-8}$ or $c\ge 0.977$. If $a=1$, then there is an expected polynomial-time algorithm if either $c\le 0.3849$ or $c\ge \log^9 n$.
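The random model above is easy to simulate. The sketch below samples a win-lose game with i.i.d. Bernoulli entries, taking $p = cn^{-a}$ exactly (capped at 1), and lists its pure Nash equilibria by brute force. This is only a baseline for experimentation, not the expected polynomial-time algorithm of the paper, which must also handle games with no pure equilibrium. The function names and the `seed` parameter are hypothetical.

```python
import random
from typing import List, Optional, Tuple

Matrix = List[List[int]]


def sample_win_lose_game(n: int, a: float, c: float,
                         seed: Optional[int] = None) -> Tuple[Matrix, Matrix]:
    """Sample two n x n payoff matrices with i.i.d. Bernoulli(p) entries.

    Assumption: p is taken to be exactly c * n**(-a), capped at 1.
    """
    rng = random.Random(seed)
    p = min(1.0, c * n ** (-a))

    def draw() -> Matrix:
        return [[1 if rng.random() < p else 0 for _ in range(n)] for _ in range(n)]

    return draw(), draw()


def pure_nash_equilibria(A: Matrix, B: Matrix) -> List[Tuple[int, int]]:
    """All cells (i, j) where row i is a best response to column j and vice versa."""
    n_rows, n_cols = len(A), len(A[0])
    # Best payoff the row player can get in each column.
    col_best = [max(A[i][j] for i in range(n_rows)) for j in range(n_cols)]
    # Best payoff the column player can get in each row.
    row_best = [max(B[i]) for i in range(n_rows)]
    return [(i, j) for i in range(n_rows) for j in range(n_cols)
            if A[i][j] == col_best[j] and B[i][j] == row_best[i]]


# With a = 0 and c = 1 every entry is 1, so every cell is a pure equilibrium.
A, B = sample_win_lose_game(4, a=0.0, c=1.0, seed=0)
all_cells = pure_nash_equilibria(A, B)
```

Sweeping `a` and `c` with this sampler gives a quick empirical view of how often pure equilibria exist in different density regimes, which is one way to build intuition for why the thresholds $a = 1/2$ and $a = 1$ are singled out.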
WebAssembly (Wasm) is a portable bytecode format that serves as a compilation target for high-level languages, enabling their secure and efficient execution across diverse platforms, including web browsers and embedded systems. To improve support for high-level languages without incurring significant code size or performance overheads, Wasm continuously evolves by integrating high-level features such as Garbage Collection and Stack Switching. However, existing compilation approaches either lack reusable design -- requiring redundant implementation efforts for each language -- or lose abstraction by lowering high-level constructs into low-level shared representations like LLVM IR, which hinder the adoption of high-level features. MLIR compiler infrastructure provides the compilation pipeline with multiple levels of abstraction, preserving high-level abstractions throughout the compilation pipeline, yet the current MLIR pipeline relies on the LLVM backend for Wasm code generation, thereby inheriting LLVM's limitations. This paper presents a novel compilation pipeline for Wasm, featuring Wasm dialects explicitly designed to represent high-level Wasm constructs within MLIR. Our approac
Time-varying electricity pricing better reflects the varying cost of electricity compared to flat-rate pricing. Variations between peak and off-peak costs are increasing due to weather variation, renewable intermittency, and increasing electrification of demand. Empirical and theoretical studies suggest that variable pricing can lower electricity supply costs and reduce grid stress. However, the distributional impacts, particularly on low-income consumers, remain understudied. This paper develops a theoretical framework to analyze how consumer heterogeneity affects welfare outcomes when electricity markets transition from flat-rate to time-varying pricing, considering realistic assumptions about heterogeneous consumer demand, supply costs, and utility losses from unmet consumption. We derive sufficient conditions for identifying when consumers lose utility from pricing reforms and compare welfare effects across consumer types. Our findings reveal that consumer vulnerability depends on the interaction of consumption timing, demand flexibility capabilities, and price sensitivity levels. Consumers with high peak-period consumption and inflexible demand, characteristics often associated
We present LoCoVQA, a dynamic benchmark generator for evaluating long-context extractive reasoning in vision language models (VLMs). LoCoVQA augments test examples for mathematical reasoning, VQA, and character recognition tasks with increasingly long visual contexts composed of both in-distribution and out-of-distribution distractor images. Across these tasks, a diverse set of VLMs rapidly lose performance as the visual context length grows, often exhibiting a striking logarithmic decay trend. This test assesses how well VLMs can ignore irrelevant information when answering queries -- a task that is quite easy for language models (LMs) in the text domain -- demonstrating that current state-of-the-art VLMs lack this essential capability for many long-context applications.
In this paper we study the classical Schmidt game on two families of sets: one related to frequencies of digits in base-$2$ expansions, and one connected to the set of the badly approximable numbers. Namely, we describe some nontrivial winning and losing parameters $(α, β)$ for these sets.
The rapid expansion of multimedia content has led to the emergence of multimodal recommendation systems. Multimodal recommendation has attracted increasing attention because its full utilization of data from different modalities alleviates the persistent data sparsity problem. As such, multimodal recommendation models can learn personalized information about nodes in terms of visual and textual features. To further alleviate the data sparsity problem, some previous works have introduced graph convolutional networks (GCNs) for multimodal recommendation systems, to enhance the semantic representation of users and items by capturing the potential relationships between them. However, adopting GCNs inevitably introduces the over-smoothing problem, which makes nodes too similar. Unfortunately, incorporating multimodal information will exacerbate this challenge because nodes that are too similar will lose the personalized information learned through multimodal information. To address this problem, we propose a novel model that retains the personalized information of ego nodes during feature aggregation by Reducing Node-neighbor Discrepancy (RedN^nD). Extensive experiments on three public
We explore a version of the minimax theorem for two-person win-lose games with infinitely many pure strategies. In the countable case, we give a combinatorial condition on the game which implies the minimax property. In the general case, we prove that a game satisfies the minimax property along with all its subgames if and only if none of its subgames is isomorphic to the "larger number game." This generalizes a recent theorem of Hanneke, Livni and Moran. We also propose several applications of our results outside of game theory.
No matter how much some gamblers occasionally win, as long as they continue to gamble, sooner or later they will lose more to the casino, a phenomenon known as "long bet will lose". Our results demonstrate the counter-intuitive phenomenon that gamblers involved in long bets will lose even though casinos always advertise circumstances in which they appear unprofitable. Here we expose the law of inevitability behind "long bet will lose" by theoretically and experimentally demystifying how casinos profit from the antique two-armed Mills Futurity slot machine. The main results show that all casino projects seemingly offer a fair gamble but are essentially unfair, i.e., the casino's win rate is greater than 50%. We anticipate our analysis to be a starting point for studying the fairness of more sophisticated multi-armed Futurity bandits based on the mathematical tool. In application, a fairness study of the Futurity bandits not only exposes the fraud of casinos for gamblers but also discloses discount marketing, bundled sales, or other induced consumption tactics.