Despite surpassing human performance across mathematics, coding, and other knowledge-intensive tasks, large language models (LLMs) continue to struggle with causal reasoning. A core obstacle is the target data itself: causal systems are complex and often expressed in non-executable forms, while ground-truth answers to causal queries are inherently scarce. We introduce CauSim, a framework that turns causal reasoning from a scarce-label problem into a scalable supervised one. CauSim constructs increasingly complex causal simulators: executable structural causal models (SCMs), incrementally built by LLMs, that scale to globally complex systems while maintaining verifiable answers to causal queries. CauSim operates across representations by formalizing non-executable causal knowledge into code, enabling data augmentation, and translating executable SCMs into natural language, enabling supervision in previously difficult-to-supervise representations. We structure our research into two parts: (1) how to construct increasingly complex causal simulators, and (2) a systematic study of what CauSim enables, demonstrating generalization across representations, consistent gains from curriculum
Cellular networks serve as the backbone of global communication, providing critical access to telephony and the Internet, often in regions lacking alternatives. However, the growing complexity of these networks, driven by architectural innovations (e.g., Voice over IP, eSIMs) and commercial dynamics (e.g., roaming, virtual operators, zero-rating), remains poorly understood due to the lack of open, scalable, and geographically diverse measurement tools and independent measurement studies. Moreover, access to mobile networks today is no longer limited to the traditional radio interface. Technologies like Voice-over-WiFi (VoWiFi) offer alternative connectivity paths via third-party Internet infrastructure, extending operator reach into environments with limited cellular coverage. At the same time, over-the-top (OTT) messaging services such as WhatsApp and Signal have become central to modern communication, accounting for a substantial share of global messaging and voice traffic while bypassing traditional operator-controlled channels entirely. This dissertation addresses these challenges by introducing new approaches for independent, scalable, and reproducible measurements of mobile c
With the rapid advancements in Artificial Intelligence (AI), autonomous agents are increasingly expected to manage complex situations where learning-enabled algorithms are vital. However, the integration of these advanced algorithms poses significant challenges, especially concerning safety and reliability. This research emphasizes the importance of incorporating human-machine collaboration into the systems engineering process to design learning-enabled increasingly autonomous systems (LEIAS). Our proposed LEIAS architecture emphasizes communication representation and pilot preference learning to boost operational safety. Leveraging the Soar cognitive architecture, the system merges symbolic decision logic with numeric decision preferences enhanced through reinforcement learning. A core aspect of this approach is transparency; the LEIAS provides pilots with a comprehensive, interpretable view of the system's state, encompassing detailed evaluations of sensor reliability, including GPS, IMU, and LIDAR data. This multi-sensor assessment is critical for diagnosing discrepancies and maintaining trust. Additionally, the system learns and adapts to pilot preferences, enabling responsive,
Efficient maintenance has always been essential for the successful application of engineering systems. However, the challenges to be overcome in the implementation of Industry 4.0 necessitate new paradigms of maintenance optimization. Machine learning techniques are becoming increasingly used in engineering and maintenance, with reinforcement learning being one of the most promising. In this paper, we propose a gamma degradation process together with a novel maintenance model in which repairs are increasingly imperfect, i.e., the beneficial effect of system repairs decreases as more repairs are performed, reflecting the degradational behavior of real-world systems. To generate maintenance policies for this system, we developed a reinforcement-learning-based agent using a Double Deep Q-Network architecture. This agent presents two important advantages: it works without a predefined preventive threshold, and it can operate in a continuous degradation state space. Our agent learns to behave in different scenarios, showing great flexibility. In addition, we performed an analysis of how changes in the main parameters of the environment affect the maintenance policy proposed by the agent
Bringing increasingly complex polyatomic molecules within reach of precision measurement experiments offers fascinating and far-reaching prospects ranging from Earth sciences and astrophysics to metrology and quantum sciences. Here, we demonstrate sub-Doppler spectroscopic measurements in the mid-IR fingerprint region of, to our knowledge, the largest molecule to date. To this end, we use a high-resolution ~10.3 $\mu$m spectrometer based on a sub-Hz quantum cascade laser remotely calibrated against state-of-the-art primary frequency standards via a metrology-grade fibre link. We perform saturated absorption spectroscopy in the $\nu_5$ CO stretching mode of 1,3,5-trioxane, (H2CO)3, at a resolution of ~100 kHz, allowing us to measure the absolute frequency of hundreds of rovibrational transitions at unprecedented uncertainties for such a complex species, as low as ~5 kHz. Our work demonstrates the extension of frequency metrology methodologies to ever larger molecular systems, confirming the potential of the technologies we develop for bringing increasingly complex species within reach of ultra-precise measurement experiments.
What shapes a consequential decision when human and artificial intelligence work on it together? The answer is becoming harder to see. A decision may look human-led after AI has set the frame, or appear automated while human judgment still carries decisive force. This paper offers a leadership-facing spectrum to see those relationships within a bounded mandate: Pure Human, Centaur (human-dominant, with AI in the loop), Co-equal, Minotaur (AI-dominant, with humans in the loop), and Pure AI. The spectrum asks where leadership work occurs: who frames the problem, who redirects the work, and who can answer for what follows. The five positions are landmarks that help leaders recognize configurations as they layer, drift, or change in a single decision. The central risk is misrecognition: leaders may keep a human-centered story in place after decision-shaping authority has shifted elsewhere. They may believe oversight remains meaningful when it has become ceremonial, or keep humans in the loop when their involvement could make the decision worse. The framework introduces co-adaptability, the capacity of a configuration to improve as human and non-human participants adjust together, and p
Explanations of polarization often rely on one of three mechanisms: homophily, bounded confidence, or community-based interactions. Models based on these mechanisms treat the lack of interactions as the main cause of polarization. Given the increasing connectivity of modern society, this explanation may be insufficient. We aim to show that in involvement-based models, society becomes more polarized as its connectedness increases. To this end, we propose a minimal voter-type model (called I-voter) that incorporates involvement as a key mechanism in opinion formation, and we study its dependence on network connectivity. We describe the steady-state behaviour of the model analytically, at the mean-field and the moment-hierarchy levels, and stress the generality of our findings by considering various extensions and different network topologies.
Baseball is a game of strategic decisions including bullpen usage, pinch-hitting and intentional walks. Managers must adjust their strategies based on the changing state of the game in order to give their team the best chance of winning. In this thesis, we investigate how matchup models -- tools that predict the probabilities of plate appearance outcomes -- impact in-game strategy and ultimately affect win probability. We develop four progressively complex, hierarchical Bayesian models that predict plate appearance outcomes by combining information from both pitchers and batters, their handedness, and recent data, along with base running probabilities calibrated to a player's base-stealing tendencies. Using each model within a game-theoretic framework, we approximate subgame perfect Nash equilibria for in-game decisions, including substitutions and intentional walks. Simulations of the 2024 MLB postseason show that more accurate matchup models can yield tangible gains in win probability -- as much as one additional victory per 162-game season. Furthermore, employing the most detailed model to generate win predictions for actual playoff games demonstrates alignment with market expec
In this paper, we investigate the eigenvalues of the Laplacian matrix of the "graph of graphs", in which cubic graphs of order n are joined together using Whitehead moves. Our work follows recent results from arXiv:2303.13923, which discovered a significant "bottleneck" in the graph of graphs. We find that this bottleneck implies an eigenvalue of order at most O(1). Our main contribution is to expand upon this result by showing that the graph of graphs has increasingly many bounded eigenvalues as n increases to infinity. We also show that these eigenvalues are unusually small, in the sense that they are much smaller than the eigenvalues of a random regular graph with an equal number of vertices and a similar degree.
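One standard way to see why a bottleneck forces a small Laplacian eigenvalue is the variational characterization of the second-smallest eigenvalue; this is a general sketch and not necessarily the argument used in the paper. The notation is assumed for illustration: $L = D - A$ is the combinatorial Laplacian of a connected graph with $N$ vertices, $S$ is a vertex subset with $|S| \le N/2$, and $\partial S$ is the set of edges leaving $S$. Taking the test vector $x = \mathbf{1}_S - (|S|/N)\,\mathbf{1}$, which is orthogonal to $\mathbf{1}$, gives $x^\top L x = |\partial S|$ and $x^\top x = |S|\,(1 - |S|/N)$, so

\[
\lambda_2(L) \;=\; \min_{x \perp \mathbf{1},\; x \neq 0} \frac{x^\top L x}{x^\top x}
\;\le\; \frac{|\partial S|}{|S|\,\bigl(1 - |S|/N\bigr)}
\;\le\; \frac{2\,|\partial S|}{|S|}.
\]

A cut with few boundary edges relative to $|S|$ therefore bounds $\lambda_2$ from above, which is the sense in which a bottleneck yields a small eigenvalue.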
We study opinion evolution in networks of stubborn agents discussing a sequence of issues, modeled through the so-called concatenated Friedkin-Johnsen (FJ) model. It is concatenated in the sense that agents' opinions evolve for each issue, and the final opinion is then taken as a starting point for the next issue. We consider the scenario where agents also take a vote at the end of each issue and propose a feedback mechanism from the result (based on the median voter) to the agents' stubbornness. Specifically, agents become increasingly stubborn during issue $s+1$ the more they disagree with the vote at the end of issue $s$. We analyze this model for a number of special cases and provide sufficient conditions for convergence to consensus stated in terms of permissible initial opinion and stubbornness. In the opposite scenario, where agents become less stubborn when disagreeing with the vote result, we prove that consensus is achieved, and we demonstrate the faster convergence of opinions compared to constant stubbornness.
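The concatenation and the vote feedback described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's model: the FJ update is the standard one, but the stubbornness rule in `update_stubbornness`, its `gain` parameter, the fixed number of inner steps, and all function names are hypothetical choices made for the example.

```python
from statistics import median


def fj_issue(W, x0, lam, steps=200):
    """Iterate the standard Friedkin-Johnsen update for one issue.

    x_i(k+1) = (1 - lam_i) * sum_j W[i][j] * x_j(k) + lam_i * x0_i
    W is a row-stochastic influence matrix, lam_i in [0, 1] is stubbornness.
    """
    n = len(x0)
    x = list(x0)
    for _ in range(steps):
        x = [
            (1 - lam[i]) * sum(W[i][j] * x[j] for j in range(n)) + lam[i] * x0[i]
            for i in range(n)
        ]
    return x


def update_stubbornness(lam, x_final, gain=0.5):
    """Hypothetical feedback rule (an assumption, not taken from the paper):
    stubbornness grows with the distance to the median vote, capped at 1."""
    vote = median(x_final)
    return [min(1.0, l + gain * abs(xi - vote)) for l, xi in zip(lam, x_final)]


def concatenated_fj(W, x_init, lam_init, issues=5):
    """Run several issues; each issue starts from the previous final opinions."""
    x0, lam = list(x_init), list(lam_init)
    history = []
    for _ in range(issues):
        x_final = fj_issue(W, x0, lam)
        history.append(x_final)
        lam = update_stubbornness(lam, x_final)  # feedback from the vote
        x0 = x_final                             # concatenation step
    return history, lam


if __name__ == "__main__":
    W = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]
    history, lam = concatenated_fj(W, [0.0, 0.5, 1.0], [0.1, 0.1, 0.1])
    for s, opinions in enumerate(history, start=1):
        print(s, [round(v, 3) for v in opinions])
    print("final stubbornness:", [round(l, 3) for l in lam])
```

Replacing the `+` in `update_stubbornness` with a decrease (floored at 0) gives a rough counterpart of the opposite scenario, in which agents become less stubborn when they disagree with the vote.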
We study the problem of efficiently scheduling a computational DAG on multiple processors. The majority of previous works have developed and compared algorithms for this problem in relatively simple models; in contrast to this, we analyze this problem in a more realistic model that captures many real-world aspects, such as communication costs, synchronization costs, and the hierarchical structure of modern processing architectures. For this we extend the well-established BSP model of parallel computing with non-uniform memory access (NUMA) effects. We then develop a range of new scheduling algorithms to minimize the scheduling cost in this more complex setting: several initialization heuristics, a hill-climbing local search method, and several approaches that formulate (and solve) the scheduling problem as an Integer Linear Program (ILP). We combine these algorithms into a single framework, and conduct experiments on a diverse set of real-world computational DAGs to show that the resulting scheduler significantly outperforms both academic and practical baselines. In particular, even without NUMA effects, our scheduler finds solutions of 24%-44% smaller cost on average than the base
Public release of the weights of pretrained foundation models, otherwise known as downloadable access \citep{solaiman_gradient_2023}, enables fine-tuning without the prohibitive expense of pretraining. Our work argues that increasingly accessible fine-tuning of downloadable models may increase hazards. First, we highlight research to improve the accessibility of fine-tuning. We split our discussion into research that A) reduces the computational cost of fine-tuning and B) improves the ability to share that cost across more actors. Second, we argue that increasingly accessible fine-tuning methods may increase hazard through facilitating malicious use and making oversight of models with potentially dangerous capabilities more difficult. Third, we discuss potential mitigatory measures, as well as benefits of more accessible fine-tuning. Given substantial remaining uncertainty about hazards, we conclude by emphasizing the urgent need for the development of mitigations.
We study the convergence of the Hermite series of measurable functions on the real line. We characterize the norm convergence of truncated partial Hermite sums in rearrangement invariant spaces provided that the truncations vanish sufficiently slowly. Moreover, we provide the necessary and sufficient conditions for convergence in the Orlicz modular.
We consider adaptive increasingly rare Markov chain Monte Carlo (MCMC) algorithms, which are adaptive MCMC methods where the adaptation concerning the "past" happens less and less frequently over time. Under a contraction assumption with respect to a Wasserstein-like function, we deduce upper bounds on the convergence rate of Monte Carlo sums, taking into account a renormalisation factor that is "almost" the one that appears in a law of the iterated logarithm. We demonstrate the applicability of our results by considering different settings, among which are those of simultaneous geometric and uniform ergodicity. All proofs are carried out on an augmented state space, including the classical non-augmented setting as a special case. In contrast to other adaptive MCMC limit theory, some technical assumptions, like diminishing adaptation, are not needed.
Model collapse, a phenomenon characterized by performance degradation due to iterative training on synthetic data, has been widely studied. However, its implications for bias amplification, the progressive intensification of pre-existing societal biases in Large Language Models (LLMs), remain significantly underexplored, despite the growing influence of LLMs in shaping online discourse. In this paper, we introduce an open, generational, and long-context benchmark specifically designed to measure political bias amplification in LLMs, leveraging sentence continuation tasks derived from a comprehensive dataset of U.S. political news. Our empirical study using GPT-2 reveals consistent and substantial political bias intensification (e.g., right-leaning amplification) over iterative synthetic training cycles. We evaluate three mitigation strategies, Overfitting, Preservation, and Accumulation, and demonstrate that bias amplification persists independently of model collapse, even when the latter is effectively controlled. Furthermore, we propose a mechanistic analysis approach that identifies neurons correlated with specific phenomena during inference through regression and statistical tes
The recent discovery of examples of intermediate-mass helium stars has offered new insights into interacting binaries. These observations will allow significant improvements in our understanding of helium stars. However, in the creation of these stars their companions may accrete a significant amount of helium-rich stellar material. This creates stars with unusual composition profiles -- stars with helium-rich cores, hydrogen-rich lower envelopes and a helium-rich outer envelope. Thus the mean molecular weight reaches a minimum in the middle of the star rather than continuously decreasing outwards in mass. To demonstrate this structure we present Cambridge STARS model calculations of an example interacting binary system where the helium-rich material is transferred, and compare it to one where the composition of the accreted mass is fixed to the companion's surface composition. We show that the helium-rich material leads to the accretor being 0.2 dex hotter and 0.15 dex more luminous than models where the composition is not helium rich. We use a simple BPASS v2.2 population model to estimate that helium-rich mass transfer occurs in 23 per cent of massive binaries that underg
Automatic radiology report generation is challenging because medical images and reports are usually similar to each other, owing to their common anatomical content. This makes it hard for a model to capture the uniqueness of individual images and leaves it prone to producing undesired generic or mismatched reports. This situation calls for learning more discriminative features that can capture even fine-grained mismatches between images and reports. To achieve this, this paper proposes a novel framework to learn discriminative image and report features by distinguishing them from their closest peers, i.e., hard negatives. In particular, to attain more discriminative features, we gradually raise the difficulty of this learning task by creating increasingly hard negative reports for each image in the feature space during training. By treating the increasingly hard negatives as auxiliary variables, we formulate this process as a min-max alternating optimisation problem. At each iteration, conditioned on a given set of hard negative reports, image and report features are learned as usual by minimising the loss functions related to report generation. After that, a new set of harder negative repor
We construct an efficient class of increasingly high-order (up to 17th-order) essentially non-oscillatory schemes with multi-resolution (ENO-MR) for solving hyperbolic conservation laws. The candidate stencils for constructing ENO-MR schemes range from a first-order one-point stencil up to the designed very high-order stencil. The proposed ENO-MR schemes adopt a very simple and efficient strategy that only requires the computation of the highest-order derivatives of a subset of the candidate stencils. Besides simplicity and high efficiency, ENO-MR schemes are completely parameter-free and essentially scale-invariant. Theoretical analysis and numerical computations show that ENO-MR schemes achieve the designed high-order convergence in smooth regions, which may contain high-order critical points (local extrema), and retain the ENO property for strong shocks. In addition, ENO-MR schemes capture complex flow structures very well.
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance, policing, and recommendations. Much work remains to be done to mitigate the serious harms of these systems, particularly those disproportionately affecting marginalized communities. Despite these ongoing harms, new systems are being developed and deployed which threaten the perpetuation of the same harms and the creation of novel ones. In response, the FATE community has emphasized the importance of anticipating harms. Our work focuses on the anticipation of harms from increasingly agentic systems. Rather than providing a definition of agency as a binary property, we identify 4 key characteristics which, particularly in combination, tend to increase the agency of a given algorithmic system: underspecification, directness of impact, goal-directedness, and long-term planning. We also discuss important harms which arise from increasing agency -- notably, these include systemic and/or long-range impacts, often on marginalized stakeholders. We emphasize that recognizing agency of algorithmic systems does not abso
Increasing the adoption of alternative technologies is vital to ensure a successful transition to net-zero emissions in the manufacturing sector. Yet there is no model to analyse technology adoption and the impact of policy interventions in generating sufficient demand to reduce cost. Such a model is vital for assessing policy instruments for the implementation of future energy scenarios. The design of successful policies for technology uptake becomes increasingly difficult when associated market factors, such as energy prices or technology efficiencies, are uncertain. In this paper we formulate a novel robust market potential assessment problem under uncertainty, resulting in policies that are immune to uncertain factors. We demonstrate two case studies: the potential use of carbon capture and storage for iron and steel production across the EU, and the transition to hydrogen from natural gas in steam boilers across the chemicals industry in the UK. Each robust optimisation problem is solved using an iterative cutting planes algorithm which enables existing models to be solved under uncertainty. By taking advantage of parallelisation we are able to solve the nonlinear robust