This position paper argues that knowledge distillation must account for what it loses: student models should be judged not only by retained task scores, but by whether they preserve the teacher capabilities that make those scores reliable. This matters because distillation is increasingly used to turn large teacher models into deployable students, yet headline metrics can obscure losses in the capabilities that make teacher behavior reliable. Conceptually, we show that current evaluation often assumes retained task scores imply retained teacher capabilities. Reframing distillation as a lossy projection exposes this flaw: students may match selected teacher observables without preserving the capabilities that make them reliable. We then synthesize existing evidence into a taxonomy of off-metric distillation losses, showing that such losses are concrete, recurring, and measurable, yet often unaccounted for when studies report what students retain rather than what they lose. To make the position actionable, we propose scenario-specific preservation targets and a Distillation Loss Statement that reports what was preserved, what was lost, and why the remaining losses are acceptable. The
A set of exposure scores calculated in 2023 has become a central empirical input to the future of work debate. Produced by Eloundou et al. (2023) and referred to here as the GPTs are GPTs scores, they define exposure as the share of occupational tasks a large language model can assist with. This work is a genuine methodological contribution, but as the scores travel from the time and place they were produced, the limitations the authors named do not always travel with them. Two gaps have widened as a result. The first is structural, between what static exposure scores measure and what policy questions actually require. Taking the diffusion of these scores as a case study, we show how their temporal, geographic, and ontological limitations compound in policy-facing analyses, and we survey five families of research responding to these limits: dynamic and benchmark-based measures, ensemble methods, task-framework extensions, worker-centered metrics, and adoption and usage data. The second gap is the one we argue needs more attention: the coordination between researchers and policymakers. The policy-relevant work, which asks who is harmed, who benefits, how, and when, continues to refere
What-if analysis is widely used to explore hypothetical scenarios and evaluate alternative pathways to desired results. However, current approaches are fragmented: systems implement what-if capabilities under diverse terminologies with different analytic techniques. Such fragmentation limits expressiveness, impedes flexible composition and reuse of workflows, and hinders tighter integration with AI. We present PRAXA, a compositional grammar of what-if analysis derived from recurring patterns across 141 publications in visual analytics and HCI venues. PRAXA formulates three primitives: (1) data, defining variables under analysis, (2) model, specifying predictive mechanisms, and (3) interaction operations, i.e., pairs of user actions and system responses that execute analyses. We encode PRAXA into a declarative specification language, PSL. To evaluate PRAXA, we first show expressiveness by reconstructing representative workflows from prior work as structured compositions, exposing the predominant focus on single-step rather than multi-step reasoning. Second, we demonstrate composability by revealing that capabilities described under distinct terminologies share the same grammatical structur
Human video datasets used for cotraining robot manipulation policies largely consist of curated demonstrations where motions are orchestrated to resemble robot behavior and 3D hand poses are captured with specialized hardware. A more plentiful source of data is everyday Internet video, but it is an open question what factors enable transfer from such videos to robots. We investigate this using a new dataset of 532 human videos with 28 hours of high-quality triangulated hand labels and natural motions. We find that hand pose quality affects transfer, but even with accurate hands, the inherent motion gap hinders transfer unless the vision and policy networks specialize to each embodiment. Our cotraining recipe yields consistent improvements, with an absolute success rate gain of $29.7\%$ in the low-robot-data regime across six manipulation tasks.
To design strong paradigms for isolating lexical access and semantics, we need to know what a word is. Surprisingly few linguists and philosophers have a clear model of what a word is, even though words impact basically every aspect of human life. Researchers who regularly publish academic papers about language often rely on outdated or inaccurate assumptions about wordhood. This short pedagogical document outlines what the lexicon is most certainly not (though it is often mistakenly taken to be), what it might be (based on current good theories), and what some implications for experimental design are.
Entropic Dynamics (ED) provides a framework that allows the reconstruction of the quantum formalism by insisting on ontological and epistemic clarity and adopting entropic methods and information geometry. Our present goal is to extend the ED framework to account for spin. The result is a realist ψ-epistemic model in which the ontology consists of a particle described by a definite position plus a discrete variable that describes Pauli's peculiar two-valuedness. The resulting dynamics of probabilities is, as might be expected, described by the Pauli equation. What may be unexpected is that the generators of transformations -- Hamiltonians and angular momenta, including spin -- are all granted clear epistemic status. To the old question `what is spinning?' ED provides a crisp answer: nothing is spinning.
The success of Reinforcement Learning from Human Feedback (RLHF) critically depends on the quality of the reward model. However, while this quality is primarily evaluated through accuracy, it remains unclear whether accuracy fully captures what makes a reward model an effective teacher. We address this question from an optimization perspective. First, we prove that regardless of how accurate a reward model is, if it induces low reward variance, then the RLHF objective suffers from a flat landscape. Consequently, even a perfectly accurate reward model can lead to extremely slow optimization, underperforming less accurate models that induce higher reward variance. We additionally show that a reward model that works well for one language model can induce low reward variance, and thus a flat objective landscape, for another. These results establish a fundamental limitation of evaluating reward models solely based on accuracy or independently of the language model they guide. Experiments using models of up to 8B parameters corroborate our theory, demonstrating the interplay between reward variance, accuracy, and reward maximization rate. Overall, our findings highlight that beyond accur
In recent years, generative AI (GenAI), such as large language models and text-to-image models, has received significant attention across various domains. However, ensuring the responsible generation of content by these models is crucial for their real-world applicability. This raises an interesting question: What should responsible GenAI generate, and what should it not? To answer the question, this paper investigates the practical requirements for responsible generation in both textual and visual generative models, outlining five key considerations: generating truthful content, avoiding toxic content, refusing harmful instructions, leaking no training data-related content, and ensuring that generated content is identifiable. Specifically, we review recent advancements and challenges in addressing these requirements. In addition, we discuss and emphasize the importance of responsible GenAI across healthcare, education, finance, and artificial general intelligence domains. Through a unified perspective on both textual and visual generative models, this paper aims to provide insights into practical safety-related issues and further benefit the community in building responsible GenAI.
Today, we have a sufficiently complete picture of what the Wolf--Rayet (WR) stars are. Predictions of stellar evolution theory are in good agreement with their parameters, estimated from observational data using stellar atmosphere codes; predictions of population synthesis also agree well with the number of known WR stars. This article provides an overview of the main historical milestones in the studies of WR stars, showing how we came to this understanding and what questions are still unanswered.
The series of meetings ``What comes beyond the Standard Models'' started in 1998 with the idea of organizing a workshop where participants would spend most of the time in discussions, confronting different approaches and ideas. The idea was successful and has developed into an annual workshop, which has taken place every year since 1998. Very open-minded and fruitful discussions have become the trademark of our workshops, producing several published works. We have discussed many concepts that could help us understand our universe, from the level of the second quantized elementary fermion and boson fields up to the level of the birth of our universe.
This article begins with an overview, then gives the precise definition of isotropic turbulence, and follows that with the basic conservation equations, in both real space and wavenumber space. These provide the foundations of all theoretical approaches, both fundamental and phenomenological. After that, my intention is to try to highlight the main unresolved issues and give some indication of what progress there has been over decades (in all cases), and what still needs to be done. I should emphasise that I am not trying to provide either a conventional review or even a pedagogical treatment. Instead I am giving concise summaries, supplemented (where I can) by my own observations, which make substantial points that I believe are original, and which have not been made in the literature. To take just one example, it is known by some people that Kolmogorov's 1962 theory is not correctly described as a 'refinement' of his 1941 theory. This was pointed out by Kraichnan in 1974. However, what does not appear to have been recognized is that the 1962 theory is physically invalid, and also that a plausible implementation of it destroys the Kolmogorov (1941) scaling of energy spectra which
We present here a brief discussion, in Bangla (Bengali), of what entanglement is and why it is interesting.
Entanglement, a puzzle since Einstein's time, has become increasingly crucial with the rise of quantum computation. But what exactly is it? Historically, entanglement can be precisely defined, but only negatively. In this article, we explore four interconnected definitions of entangled states.
What constitutes a "physics of firefly swarms"? In response to a Comment in Nature Reviews Physics, I offer a brief scientific perspective.
Observational astronomy is plagued with selection effects that must be taken into account when interpreting data from astronomical surveys. Because of the physical limitations of observing time and instrument sensitivity, datasets are rarely complete. However, determining specifically what is missing from any sample is not always straightforward. For example, there are always more faint objects (such as galaxies) than bright ones in any brightness-limited sample, but faint objects may not be of the same kind as bright ones. Assuming they are can lead to mischaracterizing the population of objects near the boundary of what can be detected. Similarly, starting with nearby objects that can be well observed and assuming that objects much farther away (and sampled from a younger universe) are of the same kind can lead us astray. Demographic models of galaxy populations can be used as inputs to observing system simulations to create ``mock'' catalogues that can be used to characterize and account for multiple, interacting selection effects. The use of simulations for this purpose is common practice in astronomy, and blurs the line between observations and simulations; the observational d
I want to combine two hitherto largely independent research projects, scientific understanding and mechanistic explanations. Understanding is not only achieved by answering why-questions, that is, by providing scientific explanations, but also by answering what-questions, that is, by providing what I call scientific descriptions. Based on this distinction, I develop three forms of understanding: understanding-what, understanding-why, and understanding-how. I argue that understanding-how is a particularly deep form of understanding, because it is based on mechanistic explanations, which answer why something happens in virtue of what it is made of. I apply the three forms of understanding to two case studies: first, to the historical development of thermodynamics and, second, to the differences between the Clausius and the Boltzmann entropy in explaining thermodynamic processes.
Our collective views regarding the question "what is fundamental?" are continually evolving. These ontological shifts in what we regard as fundamental are largely driven by theoretical advances ("what can we calculate?"), and experimental advances ("what can we measure?"). Rarely (in my view) is epistemology the fundamental driver; more commonly epistemology reacts (after a few decades) to what is going on in the theoretical and experimental zeitgeist.
Proceedings for our meeting ``What comes beyond the Standard Models'', which covered a broad series of subjects.
The ability to generate clarification questions, i.e., questions that identify useful missing information in a given context, is important in reducing ambiguity. Humans use previous experience with similar contexts to form a global view and compare it to the given context to ascertain what is missing and what is useful in the context. Inspired by this, we propose a model for clarification question generation where we first identify what is missing by taking a difference between the global and the local view and then train a model to identify what is useful and generate a question about it. Our model outperforms several baselines as judged by both automatic metrics and humans.
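The "difference between the global and the local view" can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how views are represented, so this assumes each context is a set of attribute strings, and that the global view is the set of attributes shared by at least a given fraction of similar contexts. The function names and the `min_share` parameter are hypothetical, and the later steps (deciding what is useful, generating the question) are not modeled.

```python
from collections import Counter


def global_view(similar_contexts, min_share=0.5):
    """Attributes present in at least `min_share` of the similar contexts.

    Assumption: each context is an iterable of attribute strings.
    """
    counts = Counter(attr for ctx in similar_contexts for attr in set(ctx))
    threshold = min_share * len(similar_contexts)
    return {attr for attr, n in counts.items() if n >= threshold}


def missing_attributes(similar_contexts, local_context, min_share=0.5):
    """Set difference: global view minus the local (given) context."""
    return global_view(similar_contexts, min_share) - set(local_context)


# Hypothetical product descriptions reduced to attribute sets.
similar = [
    {"price", "weight", "color"},
    {"price", "weight"},
    {"price", "battery"},
]
local = {"price"}
missing = missing_attributes(similar, local)
print(sorted(missing))
```

In this toy setup, an attribute is a candidate for a clarification question only if it is common among similar contexts and absent from the given one; lowering `min_share` widens the global view and so the set of candidates.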
Quantum computation is a rapidly progressing field today. What are its principles? In what sense is it distinct from conventional computation? What are its advantages and disadvantages? What type of problems can it address? How practical is it to make a quantum computer? I summarise some of the important concepts of quantum computation, in an attempt to answer these questions. A deeper understanding of them would pave the way for future development.