Language models have consistently grown to compress more world knowledge into their parameters, but the knowledge that can be pretrained into them is upper-bounded by their parameter size. The capacity of Small Language Models (SLMs) is especially limited, which leads to factually incorrect generations. This problem is often mitigated by giving the SLM access to an outside source: the ability to query a larger model, documents, or a database. In this setting, we study the fundamental question of \emph{which tokens an SLM can and should learn} during pretraining, versus \emph{which ones it should delegate} via a \texttt{<CALL>} token. We find that this is not simply a question of loss: although the loss predicts whether a predicted token mismatches the ground truth, some tokens are \emph{acceptable} in that they are truthful alternative continuations of a pretraining document, and should not trigger a \texttt{<CALL>} even if their loss is high. We find that a spaCy grammar parser can help augment the loss signal to decide which tokens the SLM should learn to delegate to prevent factual errors and which are safe to learn and predict even under high losses. We propo
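The delegation rule described above can be sketched as follows. The sketch assumes that each pretraining token already carries a per-token loss from the SLM and a Universal POS tag of the kind spaCy exposes as `token.pos_`. The threshold value, the `FACTUAL_POS` set, and the function name `label_targets` are hypothetical illustrations, not the authors' method. Here a high-loss token is replaced by `<CALL>` only if its tag marks it as fact-bearing. High-loss tokens with other tags are kept as acceptable alternatives.

```python
# Hypothetical sketch: combine a loss threshold with a grammar signal to decide
# which target tokens the SLM should learn to delegate via <CALL>.

# Assumption: Universal POS tags treated as fact-bearing (names, numbers).
FACTUAL_POS = {"PROPN", "NUM"}


def label_targets(tokens, losses, pos_tags, threshold=2.0):
    """Return training targets in which risky tokens are replaced by <CALL>.

    A token is delegated only if its loss exceeds `threshold` AND its POS tag is
    in FACTUAL_POS; other high-loss tokens are kept as acceptable alternatives.
    """
    targets = []
    for token, loss, pos in zip(tokens, losses, pos_tags):
        if loss > threshold and pos in FACTUAL_POS:
            targets.append("<CALL>")
        else:
            targets.append(token)
    return targets


# Illustrative values only: the per-token losses are made up for the example.
tokens = ["Curie", "won", "two", "famous", "prizes"]
losses = [4.1, 0.3, 3.5, 5.0, 1.2]
pos_tags = ["PROPN", "VERB", "NUM", "ADJ", "NOUN"]
print(label_targets(tokens, losses, pos_tags))
```

In this toy input "famous" has the highest loss but is kept. A different adjective would still be a truthful continuation, whereas a wrong name or number would be a factual error.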
A key task in AI practice is to assess potential impacts to prevent harm. Current AI tools that assist with AI impact assessment have not been designed or evaluated for collaborative team brainstorming, and they do not capture the range of views in diverse teams. We studied how AI can support team brainstorming during AI impact assessment and made three contributions. First, we adapted two structured methods from strategic foresight and co-designed AI interventions for them in five in-person workshops with 28 participants in total. Second, we evaluated the interventions in ten in-person workshops with 54 participants, finding that AI improved impact assessment quality and brainstorming perceptions for a general-purpose AI use (a chatbot companion) but not for a specialised one (a kidney allocation application). Third, our findings yield broader design guidance for AI assistance in brainstorming: during early ideation, AI should offer only hints, not solutions, and should initiate interaction only when participants face fixation or saturation; it should facilitate structuring ideas during convergence; it should leverage expertise to refine ideas; and overall, it should serve more in support of tedious
The widespread use of foundation models has introduced a new risk factor: copyright issues. These issues have sparked an active, ongoing debate within the data-science community as well as among legal scholars, in which claims and results on both sides are often interpreted in different ways, leading to different implications. Our position is that much of the technical literature relies on traditional reconstruction techniques that are not designed for copyright analysis. As a result, memorization and copying have been conflated across both technical and legal communities and in multiple contexts. We argue that memorization, as commonly studied in data science, should not be equated with copying and should not be used as a proxy for copyright infringement. We distinguish technical signals that meaningfully indicate infringement risk from those that instead reflect lawful generalization or high-frequency content. Based on this analysis, we advocate for an output-level, risk-based evaluation process that aligns technical assessments with established copyright standards and provides a more principled foundation for research, auditing, and policy.
Designing wise AI policy is a grand challenge for society. To design such policy, policymakers should place a premium on rigorous evidence and scientific consensus. While several mechanisms exist for evidence generation, and nascent mechanisms tackle evidence synthesis, we identify a complete void in consensus formation. In this position paper, we argue NeurIPS should actively catalyze scientific consensus on AI policy. Beyond identifying the current deficit in consensus formation mechanisms, we argue that NeurIPS is the best option due to its strengths and the paucity of compelling alternatives. To make progress, we recommend initial pilots for NeurIPS by distilling lessons from the IPCC's leadership to build scientific consensus on climate policy. We rebut two predictable counterarguments: that AI researchers disagree too much to achieve consensus, and that policy engagement is not the business of NeurIPS. NeurIPS leads AI on many fronts, and it should champion scientific consensus to create higher quality AI policy.
This position paper argues that the next generation of vision encoders should be image-size agnostic and task driven. The source of our inspiration is biological: not a structural aspect of biological vision, but a behavioral trait, namely efficiency. We focus on a couple of ways in which vision in nature is efficient but modern vision encoders are not. We, humans and animals, deal with vast quantities of visual data and need to be smart about where we focus our limited energy, which depends on the task. We believe that vision encoders should be dynamic and that their computational complexity should depend on the task at hand rather than on the size of the image. We also provide concrete first steps towards our vision: a proof-of-concept solution for image classification. Although classification is not very representative of what we are trying to achieve, it shows that our approach is feasible and promising.
As Large Language Models require ever more data to train and scale, we should not acquire data indiscriminately but should instead consider which types of tasks are most likely to benefit from data scaling. We should be intentional in our data acquisition. We argue that the shape of the data itself, such as its compositional and structural patterns, informs which tasks to prioritize in data scaling, and shapes the development of the next generation of compute paradigms for tasks where data scaling is inefficient, or even insufficient.
Autoregressive language models have demonstrated a remarkable ability to extract latent structure from text. The embeddings from large language models have been shown to capture aspects of the syntax and semantics of language. But what should embeddings represent? We connect the autoregressive prediction objective to the idea of constructing predictive sufficient statistics to summarize the information contained in a sequence of observations. We use this connection to identify three settings where the optimal content of embeddings can be identified. First, for independent and identically distributed (i.i.d.) data, the embedding should capture the sufficient statistics of the data. Second, for latent state models, the embedding should encode the posterior distribution over states given the data. Third, for discrete hypothesis spaces, the embedding should reflect the posterior distribution over hypotheses given the data. We then conduct empirical probing studies to show that transformers encode these three kinds of latent generating distributions, and that they perform well in out-of-distribution cases and without token memorization in these settings.
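A standard textbook illustration of the i.i.d. case (not taken from the paper) is a Bernoulli sequence with a Beta prior. Together with the sequence length, the count of ones is a sufficient statistic, so an optimal next-token predictor, and hence an ideal embedding, needs to retain only that pair. The function names `embed` and `predictive_prob_one` and the Beta(1, 1) default prior are assumptions made for this sketch.

```python
from fractions import Fraction


def embed(seq):
    """Sufficient statistic for i.i.d. Bernoulli data: (length, number of ones)."""
    return (len(seq), sum(seq))


def predictive_prob_one(seq, a=1, b=1):
    """P(next token = 1 | seq) under a Beta(a, b) prior (Laplace's rule if a = b = 1).

    The prediction is computed from embed(seq) alone, so any two sequences with
    the same statistic receive exactly the same prediction.
    """
    n, k = embed(seq)
    return Fraction(k + a, n + a + b)


print(embed([1, 0, 0, 1]), predictive_prob_one([1, 0, 0, 1]))
print(embed([0, 1, 1, 0]), predictive_prob_one([0, 1, 1, 0]))
print(embed([1, 1, 1, 0]), predictive_prob_one([1, 1, 1, 0]))
```

Reordering a sequence leaves the statistic, and therefore the prediction, unchanged. `[1, 0, 0, 1]` and `[0, 1, 1, 0]` both map to `(4, 2)` and yield 1/2, while three ones out of four yield 2/3.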
This study investigates who should bear the responsibility of combating the spread of misinformation in social networks. Should it be the online platforms or their users? And should it be done by debunking the "fake news" already in circulation or by investing in preemptive efforts to prevent its diffusion altogether? We seek to answer such questions in a stylized opinion dynamics framework, where agents in a network aggregate the information they receive from peers and/or from influential external sources, with the aim of learning a ground truth among a set of competing hypotheses. In most cases, we find centralized sources to be more effective at combating misinformation than distributed ones, suggesting that online platforms should play an active role in the fight against fake news. In line with literature on the "backfire effect", we find that debunking in certain circumstances can be a counterproductive strategy, whereas some targeted strategies (akin to "deplatforming") and/or preemptive campaigns turn out to be quite effective. Despite its simplicity, our model provides useful guidelines that could inform the ongoing debate on online disinformation and the best ways to lim
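One common way to formalize agents that "aggregate the information they receive from peers and/or from influential external sources" is DeGroot-style linear averaging. The sketch below assumes that rule, a ring network, a single central source with uniform influence `alpha`, and made-up initial beliefs. It is a toy contrast, not the paper's model or results. With `alpha > 0` every agent converges to the source's belief. With `alpha = 0` the network settles on the average of its initial beliefs, which in this example favors a false hypothesis.

```python
import numpy as np


def simulate(W, beliefs, source, alpha, steps):
    """DeGroot-style belief averaging with an external source.

    W:       (n, n) row-stochastic matrix; W[i, j] is agent i's trust in agent j.
    beliefs: (n, H) array; row i is agent i's probability vector over H hypotheses.
    source:  (H,) probability vector broadcast by the central source.
    alpha:   weight every agent places on the source (assumed uniform).
    """
    B = beliefs.copy()
    for _ in range(steps):
        # Mix peers' beliefs with the source; each row still sums to 1.
        B = (1 - alpha) * W @ B + alpha * source
    return B


# Ring of 4 agents, each averaging itself and its two neighbours equally.
n = 4
W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i, i + 1):
        W[i, j % n] = 1 / 3

# Hypothesis 0 is the ground truth; agents 2 and 3 start out misinformed.
beliefs = np.array([
    [0.6, 0.2, 0.2],
    [0.6, 0.2, 0.2],
    [0.1, 0.1, 0.8],
    [0.1, 0.1, 0.8],
])
source = np.array([0.8, 0.1, 0.1])

with_source = simulate(W, beliefs, source, alpha=0.1, steps=200)
peers_only = simulate(W, beliefs, source, alpha=0.0, steps=200)
print(with_source.argmax(axis=1))  # every agent favours hypothesis 0
print(peers_only.argmax(axis=1))   # consensus on the false hypothesis 2
```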
The recent DESI results provide increasing evidence that the density of dark energy is time-dependent. I will recall why, from the point of view of fundamental theory, this result should not be surprising.
This paper argues that fully autonomous AI agents should not be developed. In support of this position, we build from prior scientific literature and current product marketing to delineate different AI agent levels and detail the ethical values at play in each, documenting trade-offs in potential benefits and risks. Our analysis reveals that risks to people increase with the autonomy of a system: the more control a user cedes to an AI agent, the more risks to people arise. Particularly concerning are safety risks, which affect human life and have knock-on effects on other values.
In the Frontier AI Safety Commitments, sixteen companies committed to "Assess the risks posed by their frontier models or systems across the AI lifecycle, including [...] as appropriate, before and during training" (I) and to "Provide public transparency on the implementation of the above (I-VI), except insofar as doing so would increase risk or divulge sensitive commercial information to a degree disproportionate to the societal benefit. They should still share more detailed information which cannot be shared publicly with trusted actors, including their respective home governments or appointed body, as appropriate" (VII). This short paper considers what information should be shared with whom before training begins. What information should be shared publicly and what only with trusted actors such as home governments? Sharing such information before a frontier training run can build shared awareness and preparedness, can improve risk assessment and management, and can contribute to greater predictability and accountability. Companies could share certain information before a training run including: Expected dates of beginning and end of training; Expected compute used (in FLOP); Des
Covariate balancing is a popular technique for controlling confounding in observational studies. It finds weights for the treatment group that are close to uniform but make the group's covariate means (approximately) equal to those of the entire sample. A crucial question is: how approximate should the balancing be in order to minimize the error of the final estimate? Current guidance is derived from heuristic or asymptotic analyses, which are uninformative when the size of the sample is small compared to the number of covariates. This paper presents the first rigorous, nonasymptotic analysis of covariate balancing; specifically, we use PAC-Bayesian techniques to derive valid, finite-sample confidence intervals for the treatment effect. More generally, we prove these guarantees for a flexible form of covariate balancing where the regularization parameters weighting the tradeoff between bias (imbalance) and variance (divergence from uniform) are optimized, not fixed. This gives rise to a new balancing algorithm that empirically delivers superior adaptivity. Our overall contribution is to make covariate balancing a more reliable method for causal inference.
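The sketch below illustrates the bias-variance tradeoff described above. It assumes squared-Euclidean imbalance, an L2 divergence from uniform weights, a sum-to-one constraint, and no nonnegativity constraint. These are simplifications chosen here, not the paper's formulation. Unlike the paper, the regularization weight `lam` is fixed by hand rather than optimized, and no confidence interval is produced. The names `balancing_weights` and `imbalance` are hypothetical.

```python
import numpy as np


def balancing_weights(X_treated, target_mean, lam):
    """Weights summing to 1 that trade imbalance against divergence from uniform.

    Minimizes ||X_treated.T @ w - target_mean||^2 + lam * ||w - u||^2
    subject to sum(w) == 1, where u is the uniform weight vector.
    This simplified sketch allows negative weights.
    """
    n = X_treated.shape[0]
    u = np.full(n, 1.0 / n)
    A = X_treated @ X_treated.T + lam * np.eye(n)
    b = X_treated @ target_mean + lam * u
    # KKT system of the equality-constrained quadratic program:
    # [A 1; 1^T 0] [w; mu] = [b; 1]
    ones = np.ones((n, 1))
    kkt = np.block([[A, ones], [ones.T, np.zeros((1, 1))]])
    rhs = np.append(b, 1.0)
    return np.linalg.solve(kkt, rhs)[:n]


def imbalance(X_treated, w, target_mean):
    """Euclidean distance between weighted treated means and the target means."""
    return float(np.linalg.norm(X_treated.T @ w - target_mean))


# Synthetic data: treatment assignment depends on the first covariate.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
treated = X[:, 0] + rng.normal(size=200) > 0
target = X.mean(axis=0)
for lam in (100.0, 1.0, 0.01):
    w = balancing_weights(X[treated], target, lam)
    print(lam, round(imbalance(X[treated], w, target), 4))
```

A smaller `lam` buys better balance at the cost of less uniform, higher-variance weights. For this penalized objective the optimal imbalance cannot increase as `lam` decreases. A very large `lam` recovers nearly uniform weights.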
Partially observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system designers can often control an agent's observation capabilities, e.g., by placing or selecting sensors. This raises the question of how to select an agent's sensors cost-effectively so that it achieves the desired goals. In this paper, we study the novel optimal observability problem (OOP): Given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when considering positional strategies only. We present two algorithms for a decidable fragment of the OOP: one based on optimal strategies of M's underlying Markov decision process and one based on parameter synthesis with SMT. We report promising results for variants of typical examples from the POMDP literature.
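The brute-force sketch below makes the positional fragment concrete under several assumptions. Observation capabilities come from a finite set of candidate sensors, each a state predicate with a cost. The budget bounds the total sensor cost. Strategies are deterministic and positional, meaning one action per observation. Expected reward is the total reward accumulated until a goal state is reached, and lower is better. The sketch enumerates sensor subsets and strategies and evaluates each induced Markov chain exactly. It is neither of the paper's two algorithms (MDP-based and SMT-based), and all names and the three-state model are hypothetical.

```python
import itertools

import numpy as np


def expected_reward(P, reward, goal, init, act):
    """Expected total reward from `init` until a goal state under the state->action
    map `act`; returns inf if a goal is not reached with probability 1."""
    n = len(P)
    # Forward search: states reachable from init before hitting a goal.
    reach, stack = {init}, [init]
    while stack:
        s = stack.pop()
        if s in goal:
            continue
        for t, p in P[s][act[s]]:
            if p > 0 and t not in reach:
                reach.add(t)
                stack.append(t)
    # Backward search: states that can reach a goal in the induced chain.
    can_reach, changed = set(goal), True
    while changed:
        changed = False
        for s in range(n):
            succ = P[s][act[s]]
            if s not in can_reach and any(p > 0 and t in can_reach for t, p in succ):
                can_reach.add(s)
                changed = True
    if not reach <= can_reach:
        return float("inf")
    transient = sorted(s for s in reach if s not in goal)
    if not transient:
        return 0.0
    idx = {s: i for i, s in enumerate(transient)}
    # Solve (I - Q) v = r over the transient states.
    A, r = np.eye(len(transient)), np.zeros(len(transient))
    for s in transient:
        r[idx[s]] = reward[s][act[s]]
        for t, p in P[s][act[s]]:
            if t in idx:
                A[idx[s], idx[t]] -= p
    return float(np.linalg.solve(A, r)[idx[init]])


def optimal_observability(P, reward, goal, init, sensors, budget, threshold):
    """Smallest-first search for a sensor subset within budget for which some
    deterministic positional strategy meets the threshold; None if there is none."""
    n, n_actions = len(P), len(P[0])
    for k in range(len(sensors) + 1):
        for chosen in itertools.combinations(sensors, k):
            if sum(cost for _, _, cost in chosen) > budget:
                continue
            # Observation of state s: the readings of the chosen sensors.
            obs = [tuple(f(s) for _, f, _ in chosen) for s in range(n)]
            labels = sorted(set(obs))
            for actions in itertools.product(range(n_actions), repeat=len(labels)):
                strategy = dict(zip(labels, actions))
                act = [strategy[obs[s]] for s in range(n)]
                value = expected_reward(P, reward, goal, init, act)
                if value <= threshold:
                    return [name for name, _, _ in chosen], strategy, value
    return None


# States 0 and 1 look identical without sensors; state 2 is the goal.
# Action 0 moves 0 -> 1 and 1 -> 0; action 1 keeps 0 in place and moves 1 -> 2.
P = [
    [[(1, 1.0)], [(0, 1.0)]],
    [[(0, 1.0)], [(2, 1.0)]],
    [[(2, 1.0)], [(2, 1.0)]],
]
reward = [[1, 1], [1, 1], [0, 0]]
sensors = [("at_state_1", lambda s: s == 1, 1)]  # hypothetical sensor, cost 1
print(optimal_observability(P, reward, {2}, 0, sensors, budget=0, threshold=2.5))
print(optimal_observability(P, reward, {2}, 0, sensors, budget=1, threshold=2.5))
```

With no budget, every positional strategy either loops between states 0 and 1 or stays in state 0, so no solution exists. With one sensor distinguishing state 1, the strategy "move on, then exit at state 1" reaches the goal with expected reward 2. Exhaustive enumeration is only feasible for tiny models, which is why dedicated algorithms matter.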
I argue for an approach to the Foundations of Physics that puts the question in the title center stage, rather than asking "what is the case in the world?". This approach, algorithmic idealism, attempts to give a mathematically rigorous in-principle answer to this question, both in the usual empirical regime of physics and in some more exotic regimes within cosmology, philosophy, and science-fiction (but soon perhaps real) technology. I begin by arguing that quantum theory, in its actual practice and in some interpretations, should be understood as telling an agent what they should expect to observe next (rather than what is the case), and that the difficulty of answering this former question from the usual "external" perspective is at the heart of persistent enigmas such as the Boltzmann brain problem, extended Wigner's friend scenarios, Parfit's teletransportation paradox, or our understanding of the simulation hypothesis. Algorithmic idealism is a conceptual framework, based on two postulates that admit several possible mathematical formalizations, cast in the language of algorithmic information theory. Here I give a non-technical description of this view and show how it dissolve
Dramatic advances in artificial intelligence over the past decade (for narrow-purpose AI) and the last several years (for general-purpose AI) have transformed AI from a niche academic field to the core business strategy of many of the world's largest companies, with hundreds of billions of dollars in annual investment in the techniques and technologies for advancing AI's capabilities. We now come to a critical juncture. As the capabilities of new AI systems begin to match and exceed those of humans across many cognitive domains, humanity must decide: how far do we go, and in what direction? This essay argues that we should keep the future human by closing the "gates" to smarter-than-human, autonomous, general-purpose AI -- sometimes called "AGI" -- and especially to the highly-superhuman version sometimes called "superintelligence." Instead, we should focus on powerful, trustworthy AI tools that can empower individuals and transformatively improve human societies' abilities to do what they do best.
Let's transform our robot secretaries into Socratic gadflies.
Who should own artificial intelligence technology? It should belong to everyone; strictly speaking, not the technology per se, but the fruits that can be reaped from it. Obviously, we should not let AI end up in the hands of irresponsible persons. Similarly, nuclear technology should benefit all, yet it should be kept secret and inaccessible to the public at large.
Large language models (LLMs) have significantly improved the ability to perform code generation tasks. However, there is still a gap between LLMs being capable coders and being top-tier software engineers. Top-level software engineers often ask clarifying questions to reduce ambiguity in both requirements and coding solutions. Based on this observation, I argue that the same practice should be applied to LLMs for code generation tasks. Asking probing questions on various topics before generating the final code may alleviate the challenges of programming with LLMs, such as unclear intent specification, lack of computational thinking, and undesired code quality. This, in turn, increases confidence in the generated code. In this work, I explore how to leverage better communication skills to achieve greater confidence in generated code. I propose a communication-centered process that uses an LLM-generated communicator to identify issues with high ambiguity or low confidence in problem descriptions and generated code. The process then asks clarifying questions and uses the users' responses to refine the code.
There are two strong arguments in favor of vector-like leptons and quarks: Flavor Democracy calls for them, and the E6 GUT predicts the existence of iso-singlet quarks and iso-doublet leptons. Vector-like quarks (VLQs) are extensively searched for by the ATLAS and CMS collaborations, but this is not the case for vector-like leptons (VLLs), even though the two have a similar status from the phenomenological viewpoint. In this study we argue that vector-like leptons should be included in the new-physics search programs of energy-frontier colliders. We consider the production of vector-like partners of the leptons of the first SM family at the HL-LHC, HE-LHC, FCC, ILC, CLIC, and Muon Collider, as well as at ep and μ-p colliders. For the decays of vector-like leptons, we present branching-ratio formulas for the different channels in the most general case. Since there are many different production and decay channels for charged and neutral vector-like leptons, relevant studies should be done systematically. We invite the High Energy Physics community (both experimenters and phenomenologists) to actively participate in research on this topic.
Our community believes that new domain-specific languages should be as general as possible to increase their impact. However, I argue in this essay that we should stop claiming generality for new domain-specific languages. More general domain-specific languages induce more boilerplate code. Moreover, in practice domain-specific languages are co-developed with their applications and tend to be specific to these applications. Thus, I argue we should stop claiming generality and instead document how domain-specific-language-based software development benefits the overall software development process. The acceptance criteria for scientific literature should make the same shift: accepting good domain-specific language engineering practice instead of the next language to rule them all.