Unstructured model editing aims to update models with real-world text, yet existing methods often memorize text holistically without reliable fine-grained fact access. To address this, we propose FABLE, a hierarchical framework that decouples fine-grained fact injection from holistic text generation. FABLE follows a two-stage, fact-first strategy: discrete facts are anchored in shallow layers, followed by minimal updates to deeper layers to produce coherent text. This decoupling resolves the mismatch between holistic recall and fine-grained fact access, reflecting the unidirectional Transformer flow in which surface-form generation amplifies rather than corrects underlying fact representations. We also introduce UnFine, a diagnostic benchmark with fine-grained question-answer pairs and fact-level metrics for systematic evaluation. Experiments show that FABLE substantially improves fine-grained question answering while maintaining state-of-the-art holistic editing performance. Our code is publicly available at https://github.com/caskcsg/FABLE.
Deep learning-based weather forecasting (DLWF) models have recently demonstrated significant performance gains over gold-standard physics-based simulation tools. However, these models are potentially vulnerable to adversarial attacks, which raises concerns about their trustworthiness. In this paper, we investigate the feasibility and challenges of applying existing adversarial attack methods to DLWF models and propose a novel framework called FABLE (Forecast Alteration By Localized targeted advErsarial attack) to address them. FABLE performs a 3D discrete wavelet decomposition to disentangle the spatial and temporal components of the data. By regulating the magnitude of adversarial perturbations across different components, FABLE produces adversarial inputs that remain closely aligned with the original inputs while steering the DLWF models toward generating the targeted forecast outcomes. Experimental results on real-world weather datasets demonstrate the effectiveness of FABLE over baseline methods across various metrics.
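The 3D wavelet decomposition mentioned above can be illustrated with a minimal sketch. It assumes a single-level Haar wavelet and a `(time, y, x)` axis order on nested lists; the abstract names neither the wavelet family nor the data layout, and the function names `haar_1d`, `rotate`, `haar_dwt3`, and `band_energy` are hypothetical. The sketch splits a cube into eight sub-bands and reports the energy of each, which is the kind of per-component quantity a perturbation budget could be regulated against.

```python
import math

def haar_1d(signal):
    """One-level orthonormal Haar split of an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def rotate(cube):
    """Cyclically permute axes: new[i][j][k] = old[j][k][i]."""
    d0, d1, d2 = len(cube), len(cube[0]), len(cube[0][0])
    return [[[cube[j][k][i] for k in range(d1)] for j in range(d0)]
            for i in range(d2)]

def haar_dwt3(cube):
    """Single-level 3D Haar DWT (assumed wavelet) of cube[t][y][x].

    Returns a dict of eight sub-bands keyed by three letters, one per
    axis in the order x, y, t: 'a' = approximation, 'd' = detail.
    """
    bands = {"": cube}
    for _ in range(3):
        next_bands = {}
        for name, c in bands.items():
            low = [[haar_1d(row)[0] for row in plane] for plane in c]
            high = [[haar_1d(row)[1] for row in plane] for plane in c]
            # Rotating after each pass brings the next axis into last position;
            # three rotations restore the original (t, y, x) orientation.
            next_bands[name + "a"] = rotate(low)
            next_bands[name + "d"] = rotate(high)
        bands = next_bands
    return bands

def band_energy(band):
    """Sum of squared coefficients in one sub-band."""
    return sum(v * v for plane in band for row in plane for v in row)

if __name__ == "__main__":
    field = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 2.0], [3.0, 5.0]]]
    for name, band in sorted(haar_dwt3(field).items()):
        print(name, round(band_energy(band), 4))
```

Because the Haar transform is orthonormal, the sub-band energies sum to the energy of the input, so a cap placed on each sub-band translates directly into a cap on the total perturbation norm.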
The Fast Approximate BLock-Encoding algorithm (FABLE) is a technique to block-encode arbitrary $N\times N$ dense matrices into quantum circuits using at most $\mathcal{O}(N^2)$ one- and two-qubit gates and $\mathcal{O}(N^2\log{N})$ classical operations. The method nontrivially transforms a matrix $A$ into a collection of angles to be implemented in a sequence of $y$-rotation gates within the block-encoding circuit. If an angle falls below a threshold value, its corresponding rotation gate may be eliminated without significantly impacting the accuracy of the encoding. Ideally, many of these rotation gates may be eliminated at little cost to the accuracy of the block-encoding, such that quantum resources are minimized. In this paper, we describe two modifications of FABLE to efficiently encode sparse matrices. In the first method, termed Sparse-FABLE (S-FABLE), for a generic unstructured sparse matrix $A$, we use FABLE to block-encode the Hadamard-conjugated matrix $H^{\otimes n}AH^{\otimes n}$ (computed with $\mathcal{O}(N^2\log N)$ classical operations) and conjugate the resulting circuit with $n$ extra Hadamard gates on each side to reclaim a block-approximation to $A$. We demonstrate that the FA
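The classical preprocessing step for S-FABLE, computing $H^{\otimes n}AH^{\otimes n}$ in $\mathcal{O}(N^2\log N)$ operations, can be sketched with a fast Walsh-Hadamard transform applied to every row and then every column. This is only an illustration under the assumption that $A$ is a real $N\times N$ nested list with $N = 2^n$; the function names `fwht` and `hadamard_conjugate` are hypothetical, and the sketch does not cover the angle computation or circuit construction.

```python
def fwht(vec):
    """Unnormalized fast Walsh-Hadamard transform of a length-2^n list."""
    out = list(vec)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out

def hadamard_conjugate(matrix):
    """Return H^{(x)n} A H^{(x)n} for an N x N matrix, N = 2^n.

    Each of the N rows and N columns costs O(N log N), giving
    O(N^2 log N) overall.
    """
    size = len(matrix)
    # Transform rows: this is A times the unnormalized Hadamard matrix.
    rows = [fwht(row) for row in matrix]
    # Transform columns: unnormalized Hadamard matrix times the result.
    cols = [fwht([rows[r][c] for r in range(size)]) for c in range(size)]
    # cols[c][r] holds entry (r, c); dividing by N applies both 1/sqrt(N) factors.
    return [[cols[c][r] / size for c in range(size)] for r in range(size)]

if __name__ == "__main__":
    sparse = [[0, 0, 0, 0], [0, 3, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    for row in hadamard_conjugate(sparse):
        print(row)
```

Since $H^{\otimes n}$ is its own inverse, applying `hadamard_conjugate` twice recovers the input, which mirrors how the extra Hadamard gates on each side of the circuit reclaim an encoding of $A$.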
Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We present TF1-EN-3M, to our knowledge the first open dataset of three million English-language fables generated exclusively by instruction-tuned models no larger than 8B parameters. Each story follows a six-slot scaffold (character -> trait -> setting -> conflict -> resolution -> moral), produced through a combinatorial prompt engine that guarantees genre fidelity while covering a broad thematic space. A fully reproducible evaluation pipeline employs a panel of open-weight LLM judges from distinct model families, scoring grammar, creativity, moral clarity, and template adherence, complemented by reference-free diversity and readability metrics. Among ten open-weight generator candidates, an 8B-parameter Llama-3 variant delivers the best quality-cost trade-off, producing high-scoring fables on consumer hardware at approximately $0.135 per 1,000 fables. We release the dataset, generation code, evaluation scripts, and full metadata under a permissive license, enabling exact reproducibility and c
Understanding how data moves, transforms, and persists, known as data flow, is fundamental to reasoning in procedural tasks. Large language models (LLMs) are fluent in natural and programming languages and are increasingly applied to decisions in procedural tasks, yet they have not been systematically evaluated for their ability to perform data-flow reasoning. We introduce FABLE, an extensible benchmark designed to assess LLMs' understanding of data flow using structured, procedural text. FABLE adapts eight classical data-flow analyses from software engineering: reaching definitions, very busy expressions, available expressions, live variable analysis, interval analysis, type-state analysis, taint analysis, and concurrency analysis. These analyses are instantiated across three real-world domains: cooking recipes, travel routes, and automated plans. The benchmark includes 2,400 question-answer pairs, with 100 examples for each domain-analysis combination. We evaluate three types of LLMs: a reasoning-focused model (DeepSeek-R1 8B), a general-purpose model (LLaMA 3.1 8B), and a code-specific model (Granite Code 8B). Each model is tested using majority voting over five sam
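To make the first of the listed analyses concrete, the following is a minimal sketch of classical reaching definitions applied to a recipe-like procedure. The step names, the `(definition_id, variable)` encoding, and the function `reaching_definitions` are hypothetical illustrations and are not taken from the benchmark itself.

```python
def reaching_definitions(blocks, successors):
    """Iterative reaching-definitions analysis.

    blocks:     dict step -> list of (definition_id, variable) in order
    successors: dict step -> list of steps that may follow it
    Returns (in_sets, out_sets), each a dict step -> set of definition ids.
    """
    all_defs = {d: v for stmts in blocks.values() for d, v in stmts}
    gen, kill = {}, {}
    for step, stmts in blocks.items():
        last = {}
        for d, v in stmts:
            last[v] = d  # a later definition of the same variable wins
        gen[step] = set(last.values())
        kill[step] = {d for d, v in all_defs.items() if v in last} - gen[step]

    preds = {step: [] for step in blocks}
    for step, succs in successors.items():
        for s in succs:
            preds[s].append(step)

    in_sets = {step: set() for step in blocks}
    out_sets = {step: set(gen[step]) for step in blocks}
    changed = True
    while changed:  # repeat until a fixed point is reached
        changed = False
        for step in blocks:
            new_in = set().union(*(out_sets[p] for p in preds[step]))
            new_out = gen[step] | (new_in - kill[step])
            if new_in != in_sets[step] or new_out != out_sets[step]:
                in_sets[step], out_sets[step] = new_in, new_out
                changed = True
    return in_sets, out_sets

if __name__ == "__main__":
    # Hypothetical recipe: mix a batter, optionally thin it, then bake.
    recipe = {
        "mix": [("d1", "batter")],
        "thin": [("d2", "batter")],
        "bake": [("d3", "cake")],
    }
    flow = {"mix": ["thin", "bake"], "thin": ["bake"], "bake": []}
    in_sets, _ = reaching_definitions(recipe, flow)
    print(sorted(in_sets["bake"]))
```

In this toy recipe, both the original batter (`d1`) and the thinned batter (`d2`) may reach the baking step, which is the kind of fact a reaching-definitions question would ask a model to identify.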
As LLMs excel on standard reading comprehension benchmarks, attention is shifting toward evaluating their capacity for complex abstract reasoning and inference. Literature-based benchmarks, with their rich narrative and moral depth, provide a compelling framework for evaluating such deeper comprehension skills. Here, we present MORABLES, a human-verified benchmark built from fables and short stories drawn from historical literature. The main task is structured as multiple-choice questions targeting moral inference, with carefully crafted distractors that challenge models to go beyond shallow, extractive question answering. To further stress-test model robustness, we introduce adversarial variants designed to surface LLM vulnerabilities and shortcuts due to issues such as data contamination. Our findings show that, while larger models outperform smaller ones, they remain susceptible to adversarial manipulation and often rely on superficial patterns rather than true moral reasoning. This brittleness results in significant self-contradiction, with the best models refuting their own answers in roughly 20% of cases depending on the framing of the moral choice. Interestingly, reasoning-e
Understanding the impact of baryonic physics on cosmic structure formation is crucial for accurate cosmological predictions, especially as we usher in the era of large galaxy surveys with the Rubin Observatory as well as the Euclid and Roman Space Telescopes. A key process that can redistribute matter across a large range of scales is feedback from accreting supermassive black holes. How exactly these active galactic nuclei (AGN) operate from sub-parsec to megaparsec scales, however, remains largely unknown. To understand this, we investigate how different AGN feedback models in the Fable simulation suite affect the cosmic evolution of the matter power spectrum (MPS). Our analysis reveals that AGN feedback significantly suppresses clustering at scales $k \sim 10\,h\,\mathrm{cMpc}^{-1}$, with the strongest effect at redshift $z = 0$ causing a reduction of $\sim 10\%$ with respect to the dark matter-only simulation. This is due to the efficient feedback in both radio (low Eddington ratio) and quasar (high Eddington ratio) modes in our fiducial Fable model. We find that variations of the quasar and radio mode feedback with respect to the fiducial Fable model have distinct effects on the MPS red
While long-context large language models (LLMs) can technically summarize book-length documents (>100K tokens), the length and complexity of the documents have so far prohibited evaluations of input-dependent aspects like faithfulness. In this paper, we conduct the first large-scale human evaluation of faithfulness and content selection on LLM-generated summaries of fictional books. Our study mitigates the issue of data contamination by focusing on summaries of books published in 2023 or 2024, and we hire annotators who have fully read each book prior to the annotation task to minimize cost and cognitive burden. We collect FABLES, a dataset of annotations on 3,158 claims made in LLM-generated summaries of 26 books, at a cost of $5.2K USD, which allows us to rank LLM summarizers based on faithfulness: Claude-3-Opus significantly outperforms all closed-source LLMs, while the open-source Mixtral is on par with GPT-3.5-Turbo. An analysis of the annotations reveals that most unfaithful claims relate to events and character states, and they generally require indirect reasoning over the narrative to invalidate. While LLM-based auto-raters have proven reliable for factuality and coheren
Block-encodings of matrices have become an essential element of quantum algorithms derived from the quantum singular value transformation. This includes a variety of algorithms ranging from the quantum linear systems problem to quantum walk, Hamiltonian simulation, and quantum machine learning. Many of these algorithms achieve optimal complexity in terms of black box matrix oracle queries, but so far the problem of computing quantum circuit implementations for block-encodings of matrices has been under-appreciated. In this paper we propose FABLE, a method to generate approximate quantum circuits for block-encodings of matrices in a fast manner. FABLE circuits have a simple structure and are directly formulated in terms of one- and two-qubit gates. For small and structured matrices they are feasible in the NISQ era, and the circuit parameters can be easily generated for problems up to fifteen qubits. Furthermore, we show that FABLE circuits can be compressed and sparsified. We provide a compression theorem that relates the compression threshold to the error on the block-encoding. We benchmark our method for Heisenberg and Hubbard Hamiltonians, and Laplacian operators to illustrate t
We study the gas and stellar mass content of galaxy groups and clusters in the FABLE suite of cosmological hydrodynamical simulations, including the evolution of their central brightest cluster galaxies (BCGs), satellite galaxies and intracluster light (ICL). The total gas and stellar mass of FABLE clusters are in very good agreement with observations and show negligible redshift evolution at fixed halo mass for $M_{500} \gtrsim 3 \times 10^{14} M_{\odot}$ at $z \lesssim 1$, in line with recent findings from Sunyaev-Zel'dovich (SZ)-selected cluster samples. Importantly, the simulations predict significant redshift evolution in these quantities in the low mass ($M_{500} \sim 10^{14} M_{\odot}$) regime, which will be testable with upcoming SZ surveys such as SPT-3G. While the stellar masses of FABLE BCGs are in reasonable agreement with observations, the total stellar mass in satellite galaxies is lower than observed and the total mass in ICL is somewhat higher. This may be caused by enhanced tidal stripping of satellite galaxies due to their large sizes. BCGs are characterised by moderate stellar mass growth at $z < 1$ coincident with a late-time development of the ICL. The level
Corvids, apes, and children solve The Crow and The Pitcher task (from Aesop's Fables), indicating a causal understanding of the task. By cumulatively interacting with different objects, how can cognitive agents abstract the underlying cause-effect relations to predict affordances of novel objects? We address this question by re-enacting the Aesop's fable task on a robot and present a) a brain-guided neural model of semantic-episodic memory and b) four task-agnostic learning rules that compare expectations from recalled past episodes with the current scenario to progressively extract the hidden causal relations. The ensuing robot behaviours illustrate causal learning, and predictions for novel objects converge to Archimedes' principle, independent of both the objects explored during learning and the order of their cumulative exploration.
We study the redshift evolution of the X-ray and Sunyaev-Zel'dovich (SZ) scaling relations for galaxy groups and clusters in the FABLE suite of cosmological hydrodynamical simulations. Using an expanded sample of $27$ high-resolution zoom-in simulations, together with a uniformly-sampled cosmological volume to sample low-mass systems, we find very good agreement with the majority of observational constraints up to $z \sim 1$. We predict significant deviations of all examined scaling relations from the simple self-similar expectations. While the slopes are approximately independent of redshift, the normalisations evolve positively with respect to self-similarity, even for commonly-used mass proxies such as the $Y_{\mathrm{X}}$ parameter. These deviations are due to a combination of factors, including more effective AGN feedback in lower mass haloes, larger binding energy of gas at a given halo mass at higher redshifts and larger non-thermal pressure support from kinetic motions at higher redshifts. Our results have important implications for cluster cosmology from upcoming SZ surveys such as SPT-3G, ACTpol and CMB-S4, as relatively small changes in the observable--mass scaling relat
In this chapter we give an overview of the application of complex network theory to quantify some properties of language. Our study is based on two fables in Ukrainian, Mykyta the Fox and Abu-Kasym's slippers. It consists of two parts: the analysis of frequency-rank distributions of words and the application of complex-network theory. The first part shows that the text sizes are sufficiently large to observe statistical properties. This supports their selection for the analysis of typical properties of the language networks in the second part of the chapter. In describing language as a complex network, while words are usually associated with nodes, there is more variability in the choice of links and different representations result in different networks. Here, we examine a number of such representations of the language network and perform a comparative analysis of their characteristics. Our results suggest that, irrespective of link representation, the Ukrainian language network used in the selected fables is a strongly correlated, scale-free, small world. We discuss how such empirical approaches may help form a useful basis for a theoretical description of language evolution and
Contrary to the standard lore, there is mounting observational evidence that feedback from active galactic nuclei (AGN) may also play a role at the low-mass end of the galaxy population. We investigate this using the cosmological simulation suite FABLE, with a particular focus on the dwarf regime ($M_\mathrm{stellar} < 10^{9.5} \ \mathrm{M_{\odot}}$). We find that overmassive black holes (BHs), with respect to the mean scaling relations with their host galaxies, drive hotter and faster outflows and lead to significantly reduced gas mass fractions. They are also more likely to display a kinematically misaligned ionized gas component in our mock MaNGA velocity maps, although we caution that cosmic inflows and mergers contribute to misalignments as well. While in the local Universe the majority of AGN in dwarfs are much dimmer than the stellar component, for $z \geq 2$ there is a significant population that outshines their hosts. These high-redshift overmassive BHs contribute to the quenching of dwarfs, whereas at late cosmic times supernova (SN) feedback is more efficient. While our results are overall in good agreement with X-ray observations of AGN in dwarfs, the lack of high-lu
Flux pumping, which is characterized by a sawtooth-free helical quiescent state and the anomalous radial redistribution of toroidal current density and poloidal magnetic flux, was achieved in recent hybrid scenario experiments in the ASDEX Upgrade (AUG) tokamak. In this article, the self-regulation mechanism of the AUG core plasma during flux pumping is investigated at realistic parameters using the JOREK code based on the two-temperature, nonlinear, full magnetohydrodynamic (MHD) model. A key milestone in AUG flux pumping modelling is achieved by quantitatively reproducing the clamped current density and safety factor profiles in the plasma core, demonstrating the effectiveness of the dynamo effect in sustaining the flux pumping state. The dynamo term, which is of particular interest, is primarily generated by the pressure-gradient driven m/n = 1/1 quasi-interchange-like MHD instability. The work systematically extrapolates the parameter regimes of flux pumping from the above AUG base case by scanning dissipation coefficients and plasma beta. The simulation results reveal bifurcated plasma behaviours at different Hartmann numbers, including distinct states such as flux pumping (heli
This work investigates toroidal momentum transport in type-I ELMy H-mode plasmas in the ASDEX Upgrade tokamak, focusing on the formation of hollow rotation profiles under strong electron cyclotron resonance heating (ECRH). Applying the established momentum transport analysis framework to a neutral beam injection (NBI) modulation experiment, momentum transport coefficients were inferred self-consistently. This was done for phases with dominant NBI heating and with additional strong ECRH, during which the rotation profile severely collapsed without significant changes in the externally applied torque. The experimental rotation profiles were accurately reproduced, confirming the robustness of the inferred diffusive, convective, and residual-stress contributions. While the Prandtl number and inward Coriolis pinch remained comparable between phases, the NBI+ECRH phase exhibited a strong counter-current intrinsic torque. Linear gyrokinetic simulations indicate a transition from ion-temperature-gradient (ITG) turbulence to an ITG-trapped-electron-mode (TEM) mixed regime under strong ECRH, consistent with the observed counter-current intrinsic torque and particle pinch behavior. Additional
In this article, we study the production of hydrogen and helium isotopes in heavy-ion collisions in the incident energy range between 80 and 150 MeV/nucleon. We compare their inclusive multiplicities emitted in the transverse plane of the reaction with the predictions given by the thermal model. As a first step, we validate the choice of this approach to describe the experimental measurements. We also show that the transient states have to be explicitly taken into account for a good statistical description of the experimental multiplicities. With the thermodynamical parameter values obtained, we complete the existing database built with the use of thermal-statistical models to reproduce particle production in the (ultra-)relativistic-energy measurements. We then propose a new constraint on the so-called freeze-out region in the temperature ($T$) versus baryonic chemical potential ($\mu_B$) phase diagram of quantum chromodynamics. These new results indicate that there is a common framework to describe the hadron production and nuclear clustering processes in heavy-ion collisions.
In a shattered pellet injection (SPI) system, the penetration and assimilation of the injected material depend on the speed and size distribution of the SPI fragments. ASDEX Upgrade (AUG) was recently equipped with a flexible SPI to study the effect of these parameters on disruption mitigation efficiency. In this paper, we study the impact of different parameters on SPI assimilation with the 1.5D INDEX code. Scans of fragment sizes, speeds and different pellet compositions are carried out for single SPI into AUG H-mode plasmas. We use a semi-empirical thermal quench (TQ) onset condition to study the material assimilation trends. For mixed deuterium-neon pellets, smaller/faster fragments start to assimilate more quickly. However, at the expected onset of the global reconnection event (GRE), larger/faster fragments end up assimilating more material. Variations in the injected neon content lead to a large difference in the assimilated neon for neon content below $10^{21}$ atoms. For larger injected neon content, a self-regulating mechanism limits the variation in the amount of assimilated neon. We use a back-averaging model to simulate the plasmoid drift during pure deuterium injections
In this paper, an extensive database of SPARC H-mode confinement predictions is provided to assess their variability with respect to a few input assumptions. The simulations have been performed within the ASTRA framework, using the quasi-linear model TGLF SAT2, including electromagnetic effects, for the core transport, and a neural network trained on EPED simulations to predict the pedestal height and width self-consistently. The database has been developed starting from two SPARC H-mode discharges (12.2 T, i.e. Primary Reference Discharge or PRD, and 8 T, i.e. reduced field) and permuting 4 input parameters (W concentration, DT mixture concentration, temperature ratio at top of pedestal and deviation of pedestal pressure from the EPED prediction), to perform a sensitivity study. For the PRD, a scan of auxiliary input power (ion cyclotron heating) has been performed up to 25 MW, to keep highly radiative plasmas above the LH power threshold as predicted by Martin and Schmidtmayr power scalings. A scan of pedestal density has then been performed for both PRD and 8 T databases. ptop/pEPED and Ti/Te at top of pedestal showed the biggest impact on the fusion gain. Significant variation
In this work, we present the first global gyrokinetic simulations of the ITER baseline scenario operating at 15 MA using GENE-Tango electrostatic and electromagnetic simulations. The modeled radial region spans from close to the magnetic axis up to rho_tor = 0.6. Our results show a pronounced density peaking, moderated by electromagnetic fluctuations. The predicted fusion gain for this scenario is Q = 12.2, aligning well with ITER's mission objectives. We further characterize the turbulence spectra and find that electromagnetic modes, such as microtearing modes, kinetic ballooning modes, and Alfvénic ion temperature gradient modes at low binormal wave numbers, play a critical role in the core transport of this ITER scenario, necessitating high numerical resolution for accurate modeling. Local flux-tube simulations qualitatively reproduce the key features observed in the global gyrokinetic simulations but exhibit a much higher sensitivity to profile gradients, reflecting increased stiffness, likely due to the linearization of the equilibrium profiles and safety factor. Our study also reveals that the imposed external toroidal rotation profiles have a negligible impact on turbulent transp