Unstructured model editing aims to update models with real-world text, yet existing methods often memorize text holistically without reliable fine-grained fact access. To address this, we propose FABLE, a hierarchical framework that decouples fine-grained fact injection from holistic text generation. FABLE follows a two-stage, fact-first strategy: discrete facts are anchored in shallow layers, followed by minimal updates to deeper layers to produce coherent text. This decoupling resolves the mismatch between holistic recall and fine-grained fact access, reflecting the unidirectional Transformer flow in which surface-form generation amplifies rather than corrects underlying fact representations. We also introduce UnFine, a diagnostic benchmark with fine-grained question-answer pairs and fact-level metrics for systematic evaluation. Experiments show that FABLE substantially improves fine-grained question answering while maintaining state-of-the-art holistic editing performance. Our code is publicly available at https://github.com/caskcsg/FABLE.
Deep learning-based weather forecasting (DLWF) models have recently demonstrated significant performance gains over gold-standard physics-based simulation tools. However, these models are potentially vulnerable to adversarial attacks, which raises concerns about their trustworthiness. In this paper, we investigate the feasibility and challenges of applying existing adversarial attack methods to DLWF models and propose a novel framework called FABLE (Forecast Alteration By Localized targeted advErsarial attack) to address them. FABLE performs a 3D discrete wavelet decomposition to disentangle the spatial and temporal components of the data. By regulating the magnitude of adversarial perturbations across different components, FABLE produces adversarial inputs that remain closely aligned with the original inputs while steering the DLWF models toward generating the targeted forecast outcomes. Experimental results on real-world weather datasets demonstrate the effectiveness of FABLE over baseline methods across various metrics.
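The idea of regulating perturbation magnitude per wavelet component can be illustrated in a reduced setting. The sketch below is not the FABLE attack. It applies a one-level orthonormal Haar transform along a single (time) axis, a 1D simplification of the 3D decomposition described above. It then clips the approximation (low-frequency) and detail (high-frequency) coefficients of a perturbation to separate budgets before reconstructing it. The function names and budget values are hypothetical.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x):
    """One-level orthonormal Haar DWT of an even-length sequence."""
    if len(x) % 2:
        raise ValueError("sequence length must be even")
    approx = [(x[2 * i] + x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    detail = [(x[2 * i] - x[2 * i + 1]) / SQRT2 for i in range(len(x) // 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, interleaving the reconstructed sample pairs."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) / SQRT2)
        out.append((a - d) / SQRT2)
    return out


def clip(values, bound):
    """Clamp each coefficient to the interval [-bound, bound]."""
    return [max(-bound, min(bound, v)) for v in values]


def regulate_perturbation(delta, approx_budget, detail_budget):
    """Bound the low- and high-frequency parts of a perturbation separately."""
    approx, detail = haar_forward(delta)
    return haar_inverse(clip(approx, approx_budget), clip(detail, detail_budget))


# Example: allow a slow drift but strongly limit sample-to-sample jitter.
delta = [0.4, -0.4, 0.3, 0.5]
print(regulate_perturbation(delta, approx_budget=1.0, detail_budget=0.1))
```

Because the Haar transform is orthonormal, clipping in coefficient space bounds each frequency band independently without distorting the bands that stay within budget.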
Moral stories are a time-tested vehicle for transmitting values, yet modern NLP lacks a large, structured corpus that couples coherent narratives with explicit ethical lessons. We present TF1-EN-3M, to our knowledge the first open dataset of three million English-language fables generated exclusively by instruction-tuned models no larger than 8B parameters. Each story follows a six-slot scaffold (character -> trait -> setting -> conflict -> resolution -> moral), produced through a combinatorial prompt engine that guarantees genre fidelity while covering a broad thematic space. A fully reproducible evaluation pipeline employs a panel of open-weight LLM judges from distinct model families, scoring grammar, creativity, moral clarity, and template adherence, complemented by reference-free diversity and readability metrics. Among ten open-weight generator candidates, an 8B-parameter Llama-3 variant delivers the best quality-cost trade-off, producing high-scoring fables on consumer hardware at approximately $0.135 per 1,000 fables. We release the dataset, generation code, evaluation scripts, and full metadata under a permissive license, enabling exact reproducibility and c
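A Cartesian product over slot options captures the combinatorial idea behind the six-slot scaffold. The option lists, template wording, and `build_prompts` helper below are hypothetical. The actual prompt engine and its genre-fidelity safeguards are not described here.

```python
from itertools import product

# Hypothetical option lists; the real engine's slot values are not given here.
SLOTS = {
    "character": ["a fox", "a tortoise"],
    "trait": ["proud", "patient"],
    "setting": ["a village", "a winter forest"],
    "conflict": ["a race", "a shared meal"],
    "resolution": ["an apology", "a clever trick"],
    "moral": ["slow and steady wins", "pride goes before a fall"],
}

TEMPLATE = (
    "Write a short fable about {character}, who is {trait}, set in {setting}. "
    "The story revolves around {conflict} and ends with {resolution}. "
    "State the moral explicitly: {moral}."
)


def build_prompts(slots):
    """Yield one prompt per combination of slot values, in slot order."""
    names = list(slots)
    for values in product(*(slots[n] for n in names)):
        yield TEMPLATE.format(**dict(zip(names, values)))


prompts = list(build_prompts(SLOTS))
print(len(prompts))  # 2**6 = 64 combinations
print(prompts[0])
```

The number of distinct prompts is the product of the option-list sizes, so thematic coverage grows multiplicatively as options are added to any slot.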
The Fast Approximate BLock-Encoding algorithm (FABLE) is a technique to block-encode arbitrary $N\times N$ dense matrices into quantum circuits using at most $\mathcal{O}(N^2)$ one- and two-qubit gates and $\mathcal{O}(N^2\log{N})$ classical operations. The method nontrivially transforms a matrix $A$ into a collection of angles to be implemented in a sequence of $y$-rotation gates within the block-encoding circuit. If an angle falls below a threshold value, its corresponding rotation gate may be eliminated without significantly impacting the accuracy of the encoding. Ideally, many of these rotation gates can be eliminated at little cost to the accuracy of the block-encoding, so that quantum resources are minimized. In this paper we describe two modifications of FABLE to efficiently encode sparse matrices. In the first method, termed Sparse-FABLE (S-FABLE), for a generic unstructured sparse matrix $A$ we use FABLE to block-encode the Hadamard-conjugated matrix $H^{\otimes n}AH^{\otimes n}$ (computed with $\mathcal{O}(N^2\log N)$ classical operations) and conjugate the resulting circuit with $n$ extra Hadamard gates on each side to recover a block-approximation to $A$. We demonstrate that the FA
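The classical preprocessing cost of S-FABLE follows from computing $H^{\otimes n}AH^{\otimes n}$ with fast Walsh-Hadamard transforms rather than dense matrix products. The sketch below shows that step. It also includes `prune_small`, a generic magnitude threshold of the kind FABLE applies to its rotation angles. It does not reproduce FABLE's angle computation or circuit synthesis. Both helper names are hypothetical.

```python
import math


def fwht(vec):
    """Normalized fast Walsh-Hadamard transform: returns H^{(x)n} v in O(N log N)."""
    v = list(vec)
    n = len(v)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                x, y = v[j], v[j + h]
                v[j], v[j + h] = x + y, x - y
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [x * scale for x in v]


def hadamard_conjugate(A):
    """Compute H^{(x)n} A H^{(x)n} for an N x N matrix using 2N transforms."""
    N = len(A)
    # Left multiplication: transform every column of A.
    cols = [fwht([A[i][j] for i in range(N)]) for j in range(N)]
    HA = [[cols[j][i] for j in range(N)] for i in range(N)]
    # Right multiplication: H is symmetric, so transform every row of H A.
    return [fwht(row) for row in HA]


def prune_small(values, tau):
    """Zero out entries with |value| < tau; return kept values and their count."""
    kept = [v if abs(v) >= tau else 0.0 for v in values]
    return kept, sum(1 for v in kept if v != 0.0)


# Example: conjugating Pauli Z by a single Hadamard gives Pauli X.
print(hadamard_conjugate([[1.0, 0.0], [0.0, -1.0]]))
```

Each of the $2N$ transforms costs $\mathcal{O}(N\log N)$, which gives the $\mathcal{O}(N^2\log N)$ total quoted above.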
Understanding how data moves, transforms, and persists, known as data flow, is fundamental to reasoning in procedural tasks. Large language models (LLMs) are fluent in natural and programming languages and are increasingly applied to decisions involving procedural tasks, yet they have not been systematically evaluated for their ability to perform data-flow reasoning. We introduce FABLE, an extensible benchmark designed to assess LLMs' understanding of data flow using structured, procedural text. FABLE adapts eight classical data-flow analyses from software engineering: reaching definitions, very busy expressions, available expressions, live variable analysis, interval analysis, type-state analysis, taint analysis, and concurrency analysis. These analyses are instantiated across three real-world domains: cooking recipes, travel routes, and automated plans. The benchmark includes 2,400 question-answer pairs, with 100 examples for each domain-analysis combination. We evaluate three types of LLMs: a reasoning-focused model (DeepSeek-R1 8B), a general-purpose model (LLaMA 3.1 8B), and a code-specific model (Granite Code 8B). Each model is tested using majority voting over five sam
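Live variable analysis, one of the eight analyses listed above, carries over to procedural text if each step is treated as a statement that reads (uses) and produces (defines) items. The sketch below runs the classical backward analysis on a hypothetical four-step recipe. The `Step` encoding is an assumption for illustration, and the benchmark's actual question format is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One procedural step: the items it reads (uses) and produces (defs)."""
    text: str
    uses: frozenset
    defs: frozenset


def live_in(steps):
    """Backward live-variable analysis over a straight-line sequence of steps.

    live_in[i] = uses[i] | (live_out[i] - defs[i]), where live_out[i] is
    live_in[i + 1] (empty after the last step).
    """
    live = frozenset()
    result = []
    for step in reversed(steps):
        live = step.uses | (live - step.defs)
        result.append(live)
    return result[::-1]


recipe = [
    Step("Mix flour and water into dough", frozenset({"flour", "water"}), frozenset({"dough"})),
    Step("Preheat the oven", frozenset(), frozenset({"oven"})),
    Step("Bake the dough in the oven", frozenset({"dough", "oven"}), frozenset({"bread"})),
    Step("Serve the bread", frozenset({"bread"}), frozenset()),
]

for step, live in zip(recipe, live_in(recipe)):
    print(f"{step.text:32} live: {sorted(live)}")
```

A straight-line recipe needs only one backward pass. Procedures with branches or loops would require the usual fixed-point iteration over a control-flow graph.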
As LLMs excel on standard reading comprehension benchmarks, attention is shifting toward evaluating their capacity for complex abstract reasoning and inference. Literature-based benchmarks, with their rich narrative and moral depth, provide a compelling framework for evaluating such deeper comprehension skills. Here, we present MORABLES, a human-verified benchmark built from fables and short stories drawn from historical literature. The main task is structured as multiple-choice questions targeting moral inference, with carefully crafted distractors that challenge models to go beyond shallow, extractive question answering. To further stress-test model robustness, we introduce adversarial variants designed to surface LLM vulnerabilities and shortcuts due to issues such as data contamination. Our findings show that, while larger models outperform smaller ones, they remain susceptible to adversarial manipulation and often rely on superficial patterns rather than true moral reasoning. This brittleness results in significant self-contradiction, with the best models refuting their own answers in roughly 20% of cases depending on the framing of the moral choice. Interestingly, reasoning-e
Understanding the impact of baryonic physics on cosmic structure formation is crucial for accurate cosmological predictions, especially as we usher in the era of large galaxy surveys with the Rubin Observatory as well as the Euclid and Roman Space Telescopes. A key process that can redistribute matter across a large range of scales is feedback from accreting supermassive black holes. However, how these active galactic nuclei (AGN) operate from sub-parsec to megaparsec scales remains largely unknown. To understand this, we investigate how different AGN feedback models in the Fable simulation suite affect the cosmic evolution of the matter power spectrum (MPS). Our analysis reveals that AGN feedback significantly suppresses clustering at scales $k \sim 10\,h\,\mathrm{cMpc}^{-1}$, with the strongest effect at redshift $z = 0$ causing a reduction of $\sim 10\%$ with respect to the dark matter-only simulation. This is due to the efficient feedback in both radio (low Eddington ratio) and quasar (high Eddington ratio) modes in our fiducial Fable model. We find that variations of the quasar and radio mode feedback with respect to the fiducial Fable model have distinct effects on the MPS red
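A common way to express this suppression is the fractional difference between the full-physics and dark matter-only power spectra at the same wavenumber and redshift. The symbol $\mathcal{S}$ is introduced here for illustration. A $\sim 10\%$ reduction then corresponds to a power-spectrum ratio of about 0.9:

```latex
\mathcal{S}(k, z) \equiv \frac{P_\mathrm{hydro}(k, z)}{P_\mathrm{DMO}(k, z)} - 1,
\qquad
\mathcal{S}\left(k \sim 10\,h\,\mathrm{cMpc}^{-1},\; z = 0\right) \approx -0.1
```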
While long-context large language models (LLMs) can technically summarize book-length documents (>100K tokens), the length and complexity of the documents have so far prohibited evaluations of input-dependent aspects like faithfulness. In this paper, we conduct the first large-scale human evaluation of faithfulness and content selection on LLM-generated summaries of fictional books. Our study mitigates the issue of data contamination by focusing on summaries of books published in 2023 or 2024, and we hire annotators who have fully read each book prior to the annotation task to minimize cost and cognitive burden. We collect FABLES, a dataset of annotations on 3,158 claims made in LLM-generated summaries of 26 books, at a cost of $5.2K USD, which allows us to rank LLM summarizers based on faithfulness: Claude-3-Opus significantly outperforms all other closed-source LLMs, while the open-source Mixtral is on par with GPT-3.5-Turbo. An analysis of the annotations reveals that most unfaithful claims relate to events and character states, and they generally require indirect reasoning over the narrative to invalidate. While LLM-based auto-raters have proven reliable for factuality and coheren
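Ranking summarizers from claim-level annotations reduces, in its simplest form, to the fraction of each model's claims judged faithful. The sketch below assumes binary labels and uses hypothetical toy data rather than FABLES annotations. It performs no significance testing, so it cannot by itself support a claim that one model "significantly" outperforms another.

```python
from collections import defaultdict


def rank_by_faithfulness(annotations):
    """Rank summarizers by the fraction of their claims judged faithful.

    annotations: iterable of (model_name, is_faithful) pairs, one per claim.
    Returns (model_name, score) tuples sorted from most to least faithful.
    """
    totals = defaultdict(int)
    faithful = defaultdict(int)
    for model, ok in annotations:
        totals[model] += 1
        faithful[model] += int(ok)
    scores = {m: faithful[m] / totals[m] for m in totals}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy labels, not FABLES data.
toy = [("model_a", True), ("model_a", True), ("model_a", False),
       ("model_b", True), ("model_b", False), ("model_b", False)]
print(rank_by_faithfulness(toy))
```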
Block-encodings of matrices have become an essential element of quantum algorithms derived from the quantum singular value transformation. This includes a variety of algorithms ranging from the quantum linear systems problem to quantum walk, Hamiltonian simulation, and quantum machine learning. Many of these algorithms achieve optimal complexity in terms of black box matrix oracle queries, but so far the problem of computing quantum circuit implementations for block-encodings of matrices has been under-appreciated. In this paper we propose FABLE, a method to generate approximate quantum circuits for block-encodings of matrices in a fast manner. FABLE circuits have a simple structure and are directly formulated in terms of one- and two-qubit gates. For small and structured matrices they are feasible in the NISQ era, and the circuit parameters can be easily generated for problems up to fifteen qubits. Furthermore, we show that FABLE circuits can be compressed and sparsified. We provide a compression theorem that relates the compression threshold to the error on the block-encoding. We benchmark our method for Heisenberg and Hubbard Hamiltonians, and Laplacian operators to illustrate t
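The sketch below illustrates how matrix entries can become parameters of $y$-rotations. It assumes the common FABLE-style convention that each scaled entry $a_{ij}$ is loaded as the $|0\rangle$ amplitude $\cos\theta_{ij}$ of an $R_y(2\theta_{ij})$ rotation. Scaling by the largest absolute entry is a choice made for this sketch. The full method further transforms these angles before building the circuit and carries its own subnormalization, and neither step is reproduced here. Function names are hypothetical.

```python
import math


def entry_angles(A):
    """Map each entry a_ij of a real matrix to theta_ij = arccos(a_ij / alpha).

    alpha is the largest absolute entry, so every scaled entry lies in [-1, 1].
    Returns (alpha, angles) with angles in row-major order.
    """
    alpha = max(abs(a) for row in A for a in row) or 1.0
    angles = [math.acos(a / alpha) for row in A for a in row]
    return alpha, angles


def ry_on_zero(phi):
    """Amplitudes of RY(phi)|0> = cos(phi/2)|0> + sin(phi/2)|1>."""
    return math.cos(phi / 2), math.sin(phi / 2)


alpha, thetas = entry_angles([[2.0, -1.0], [0.0, 1.0]])
for theta in thetas:
    amp0, _ = ry_on_zero(2 * theta)
    # Rescaling the |0> amplitude by alpha recovers the original entry.
    print(f"theta={theta:.4f}  recovered entry={alpha * amp0:+.4f}")
```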
We study the gas and stellar mass content of galaxy groups and clusters in the FABLE suite of cosmological hydrodynamical simulations, including the evolution of their central brightest cluster galaxies (BCGs), satellite galaxies and intracluster light (ICL). The total gas and stellar mass of FABLE clusters are in very good agreement with observations and show negligible redshift evolution at fixed halo mass for $M_{500} \gtrsim 3 \times 10^{14} M_{\odot}$ at $z \lesssim 1$, in line with recent findings from Sunyaev-Zel'dovich (SZ)-selected cluster samples. Importantly, the simulations predict significant redshift evolution in these quantities in the low mass ($M_{500} \sim 10^{14} M_{\odot}$) regime, which will be testable with upcoming SZ surveys such as SPT-3G. While the stellar masses of FABLE BCGs are in reasonable agreement with observations, the total stellar mass in satellite galaxies is lower than observed and the total mass in ICL is somewhat higher. This may be caused by enhanced tidal stripping of satellite galaxies due to their large sizes. BCGs are characterised by moderate stellar mass growth at $z < 1$ coincident with a late-time development of the ICL. The level
In this chapter we give an overview of the application of complex network theory to quantify some properties of language. Our study is based on two fables in Ukrainian, Mykyta the Fox and Abu-Kasym's slippers. It consists of two parts: the analysis of frequency-rank distributions of words and the application of complex-network theory. The first part shows that the texts are sufficiently large to observe statistical properties. This supports their selection for the analysis of typical properties of the language networks in the second part of the chapter. When language is described as a complex network, words are usually associated with nodes, but there is more variability in the choice of links, and different representations result in different networks. Here, we examine a number of such representations of the language network and perform a comparative analysis of their characteristics. Our results suggest that, irrespective of link representation, the Ukrainian language network used in the selected fables is a strongly correlated, scale-free, small world. We discuss how such empirical approaches may help form a useful basis for a theoretical description of language evolution and
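The two parts of the study can be sketched on a toy token list. The first part computes a frequency-rank table. The second builds one possible network representation, in which words are nodes and a link joins words that occur next to each other. The sentence and helper names are hypothetical. Real Ukrainian text would also need proper tokenization and possibly lemmatization, which this sketch omits.

```python
from collections import Counter, defaultdict


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent word first."""
    counts = Counter(tokens).most_common()
    return [(rank, word, n) for rank, (word, n) in enumerate(counts, start=1)]


def adjacency_network(tokens):
    """Undirected word network linking words that appear next to each other.

    This is only one of several possible link choices; self-loops are skipped.
    """
    neighbours = defaultdict(set)
    for a, b in zip(tokens, tokens[1:]):
        if a != b:
            neighbours[a].add(b)
            neighbours[b].add(a)
    return neighbours


tokens = "the fox saw the crow and the crow saw the fox".split()
print(rank_frequency(tokens)[:3])
net = adjacency_network(tokens)
print({word: len(links) for word, links in net.items()})  # node degrees
```

Swapping `adjacency_network` for, say, a sentence-level co-occurrence rule changes the links but not the nodes. That is the kind of representational choice the comparative analysis examines.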
The use of galaxy clusters as cosmological probes often relies on understanding the properties and evolution of the intracluster medium (ICM). However, the ICM is a complex plasma, regularly stirred by mergers and feedback, with non-negligible bulk and turbulent motions and a non-thermal pressure component, making it difficult to construct a coherent and comprehensive picture. To this end, we use the FABLE simulations to investigate how the hydrostatic mass bias is affected by mergers, turbulence, and feedback. Following a single, massive cluster in detail, we find that the bias varies significantly over cosmic time, rarely staying at the average value found at a particular epoch. Variations of the bias at a given radius coincide with periods when outflows dominate the mass flux, driven either by mergers or, interestingly, at high redshift, by AGN feedback. The $z=0$ ensemble median mass bias in FABLE is $\sim\!13$ per cent at $R_\mathrm{500}$ and $\sim\!15$ per cent at $R_\mathrm{200}$, but with a large scatter in individual values. In halo central regions, we see an increase in temperature and a decrease in non-thermal pressure support with cosmic time as turbulence thermalises, l
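For reference, these are the standard textbook definitions for an ideal-gas ICM in hydrostatic equilibrium with thermal pressure $P = n k_\mathrm{B} T$. The exact estimator used in FABLE is not specified here. A median bias of $\sim 13$ per cent at $R_\mathrm{500}$ then means $M_\mathrm{HSE} \approx 0.87\,M_\mathrm{true}$ there:

```latex
M_\mathrm{HSE}(<r) = -\frac{k_\mathrm{B}\,T(r)\,r}{G\,\mu\,m_\mathrm{p}}
\left[\frac{\mathrm{d}\ln n(r)}{\mathrm{d}\ln r} + \frac{\mathrm{d}\ln T(r)}{\mathrm{d}\ln r}\right],
\qquad
b = 1 - \frac{M_\mathrm{HSE}}{M_\mathrm{true}}
```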
We study the redshift evolution of the X-ray and Sunyaev-Zel'dovich (SZ) scaling relations for galaxy groups and clusters in the FABLE suite of cosmological hydrodynamical simulations. Using an expanded sample of $27$ high-resolution zoom-in simulations, together with a uniformly-sampled cosmological volume to sample low-mass systems, we find very good agreement with the majority of observational constraints up to $z \sim 1$. We predict significant deviations of all examined scaling relations from the simple self-similar expectations. While the slopes are approximately independent of redshift, the normalisations evolve positively with respect to self-similarity, even for commonly-used mass proxies such as the $Y_{\mathrm{X}}$ parameter. These deviations are due to a combination of factors, including more effective AGN feedback in lower mass haloes, larger binding energy of gas at a given halo mass at higher redshifts and larger non-thermal pressure support from kinetic motions at higher redshifts. Our results have important implications for cluster cosmology from upcoming SZ surveys such as SPT-3G, ACTpol and CMB-S4, as relatively small changes in the observable--mass scaling relat
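The "simple self-similar expectations" referred to above are, in their usual form, shown below. They assume flat ΛCDM, masses defined within a fixed overdensity relative to the critical density (such as $M_{500}$), a constant gas fraction, and bremsstrahlung-dominated bolometric emission. The SZ signal $Y_\mathrm{SZ} D_\mathrm{A}^2$ follows the same scaling as $Y_\mathrm{X}$. "Positive evolution with respect to self-similarity" means the normalisations grow with redshift faster than these $E(z)$ factors:

```latex
E(z) = \sqrt{\Omega_\mathrm{m}(1+z)^3 + \Omega_\Lambda},\qquad
k_\mathrm{B}T \propto \left[E(z)\,M\right]^{2/3},\qquad
Y_\mathrm{X} \equiv M_\mathrm{gas}\,T_\mathrm{X} \propto E(z)^{2/3} M^{5/3},\qquad
L_\mathrm{X}^\mathrm{bol} \propto E(z)^{7/3} M^{4/3}
```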
We present the Feedback Acting on Baryons in Large-scale Environments (FABLE) suite of cosmological hydrodynamical simulations of galaxies, groups and clusters. The simulations use the AREPO moving-mesh code with a set of physical models for galaxy formation based on the successful Illustris simulation, but with updated AGN and supernova feedback models. This allows us to simultaneously reproduce the observed redshift evolution of the galaxy stellar mass function together with the stellar and gas mass fractions of local groups and clusters across a wide range of halo masses. Focusing on the properties of groups and clusters, we find very good agreement with a range of observed scaling relations, including the X-ray luminosity--total mass and gas mass relations as well as the total mass--temperature and Sunyaev-Zel'dovich flux--mass relations. Careful comparison of our results with scaling relations based on X-ray hydrostatic masses as opposed to weak lensing-derived masses reveals some discrepancies, which hint towards a non-negligible X-ray mass bias in observed samples. We further show that radial profiles of density, pressure and temperature of the simulated intracluster medium
Flux pumping, characterized by a sawtooth-free helical quiescent state and the anomalous radial redistribution of toroidal current density and poloidal magnetic flux, was achieved in recent hybrid-scenario experiments in the ASDEX Upgrade (AUG) tokamak. In this article, the self-regulation mechanism of the AUG core plasma during flux pumping is investigated at realistic parameters using the JOREK code based on the two-temperature, nonlinear, full magnetohydrodynamic (MHD) model. A key milestone in AUG flux pumping modelling is achieved by quantitatively reproducing the clamped current density and safety factor profiles in the plasma core, demonstrating the effectiveness of the dynamo effect in sustaining the flux pumping state. The dynamo term, which is of particular interest, is primarily generated by the pressure-gradient-driven m/n = 1/1 quasi-interchange-like MHD instability. The work then systematically extrapolates the parameter regimes of flux pumping from this AUG base case by scanning dissipation coefficients and plasma beta. The simulation results reveal bifurcated plasma behaviours at different Hartmann numbers, including distinct states such as flux pumping (heli
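The meaning of a "dynamo term" is easiest to see in a simplified resistive-MHD picture, which is much simpler than the full two-temperature JOREK model. Split each field into a mean part $\langle\cdot\rangle$ (e.g. a flux-surface or toroidal average) and a fluctuation $\tilde{\cdot}$, and take $\eta$ as constant over the averaging surface. Averaging Ohm's law then produces an extra electromotive force from correlated fluctuations. In the flux pumping picture, this term redistributes the mean current and can keep the core current profile clamped:

```latex
\mathbf{E} + \mathbf{v}\times\mathbf{B} = \eta\,\mathbf{J}
\;\;\Longrightarrow\;\;
\langle\mathbf{E}\rangle + \langle\mathbf{v}\rangle\times\langle\mathbf{B}\rangle
+ \underbrace{\langle\tilde{\mathbf{v}}\times\tilde{\mathbf{B}}\rangle}_{\text{dynamo term}}
= \eta\,\langle\mathbf{J}\rangle
```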
This work investigates toroidal momentum transport in type-I ELMy H-mode plasmas in the ASDEX Upgrade tokamak, focusing on the formation of hollow rotation profiles under strong electron cyclotron resonance heating (ECRH). Applying the established momentum transport analysis framework to a neutral beam injection (NBI) modulation experiment, momentum transport coefficients were inferred self-consistently. This was done for phases with dominant NBI heating and with additional strong ECRH, during which the rotation profile severely collapsed without significant changes in the externally applied torque. The experimental rotation profiles were accurately reproduced, confirming the robustness of the inferred diffusive, convective, and residual-stress contributions. While the Prandtl number and inward Coriolis pinch remained comparable between phases, the NBI+ECRH phase exhibited a strong counter-current intrinsic torque. Linear gyrokinetic simulations indicate a transition from ion-temperature-gradient (ITG) turbulence to an ITG-trapped-electron-mode (TEM) mixed regime under strong ECRH, consistent with the observed counter-current intrinsic torque and particle pinch behavior. Additional
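The diffusive, convective, and residual-stress contributions mentioned above are commonly written as the following decomposition of the radial toroidal momentum flux. Normalization conventions vary between analysis frameworks, so this form is an illustration rather than the exact one used in the study. An inward Coriolis pinch corresponds to $V_\phi < 0$, and an intrinsic torque arises from the divergence of the residual stress $\Pi_\mathrm{RS}$:

```latex
\Pi_\phi = m_i n_i R\left(-\chi_\phi\,\frac{\partial u_\phi}{\partial r} + V_\phi\,u_\phi\right) + \Pi_\mathrm{RS},
\qquad
\mathrm{Pr} = \frac{\chi_\phi}{\chi_i}
```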
In this article, we study the production of Hydrogen and Helium isotopes in heavy-ion collisions in the incident energy range between 80 and 150 MeV/nucleon. We compare their inclusive multiplicities emitted in the transverse plane of the reaction with the predictions given by the thermal model. As a first step, we validate the choice of this approach to describe the experimental measurements. We also show that transient states must be explicitly taken into account for a good statistical description of the experimental multiplicities. With the thermodynamical parameter values obtained, we complete the existing database built with the use of thermal-statistical models to reproduce particle production in (ultra-)relativistic-energy measurements. We then propose a new constraint on the so-called freeze-out region in the temperature ($T$) versus baryonic chemical potential ($\mu_B$) phase diagram of quantum chromodynamics. These new results indicate that there is a common framework to describe the hadron production and nuclear clustering processes in heavy-ion collisions.
In a shattered pellet injection (SPI) system the penetration and assimilation of the injected material depend on the speed and size distribution of the SPI fragments. ASDEX Upgrade (AUG) was recently equipped with a flexible SPI to study the effect of these parameters on disruption mitigation efficiency. In this paper we study the impact of different parameters on SPI assimilation with the 1.5D INDEX code. Scans of fragment sizes, speeds and different pellet compositions are carried out for single SPI into AUG H-mode plasmas. We use a semi-empirical thermal quench (TQ) onset condition to study the material assimilation trends. For mixed deuterium-neon pellets, smaller/faster fragments begin assimilating sooner. However, at the expected onset of the global reconnection event (GRE), larger/faster fragments end up having assimilated more material. Variations in the injected neon content lead to large differences in the assimilated neon for injected neon content below $10^{21}$ atoms. For larger injected neon content, a self-regulating mechanism limits the variation in the amount of assimilated neon. We use a back-averaging model to simulate the plasmoid drift during pure deuterium injections
In this paper, an extensive database of SPARC H-mode confinement predictions is provided to assess its sensitivity to a few input assumptions. The simulations were performed within the ASTRA framework, using the quasi-linear model TGLF SAT2 (including electromagnetic effects) for core transport and a neural network trained on EPED simulations to predict the pedestal height and width self-consistently. The database was developed starting from two SPARC H-mode discharges (12.2 T, i.e. the Primary Reference Discharge or PRD, and 8 T, i.e. reduced field) by permuting 4 input parameters (W concentration, DT mixture concentration, temperature ratio at the top of the pedestal, and deviation of pedestal pressure from the EPED prediction) to perform a sensitivity study. For the PRD, a scan of auxiliary input power (ion cyclotron heating) up to 25 MW was performed to keep highly radiative plasmas above the L-H power threshold predicted by the Martin and Schmidtmayr scalings. A scan of pedestal density was then performed for both the PRD and 8 T databases. $p_\mathrm{top}/p_\mathrm{EPED}$ and $T_i/T_e$ at the top of the pedestal had the largest impact on the fusion gain. Significant variation
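Permuting the four input parameters amounts to a full-factorial design over their levels, repeated for each base discharge. The sketch below only enumerates the cases. The parameter levels and names are hypothetical, since the values scanned in the study are not listed here. Running each case through ASTRA/TGLF/EPED is outside its scope.

```python
from itertools import product

# Hypothetical levels; the actual values scanned in the study are not listed here.
SCAN = {
    "W_concentration": [1e-5, 5e-5],
    "DT_mix": [0.45, 0.50],
    "Ti_over_Te_top": [1.0, 1.2],
    "p_top_over_p_EPED": [0.8, 1.0],
}


def full_factorial(scan):
    """Return one dict of input settings per combination of parameter levels."""
    names = list(scan)
    return [dict(zip(names, levels)) for levels in product(*scan.values())]


for base in ("PRD_12.2T", "reduced_field_8T"):
    cases = full_factorial(SCAN)
    print(base, len(cases))  # 2**4 = 16 cases per base discharge
```

A full-factorial design keeps every pairwise and higher-order interaction visible, at the cost of a case count that grows as the product of the number of levels per parameter.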
This work investigates the potential of undermining both fairness and detection performance in abusive language detection. In a dynamic and complex digital world, it is crucial to investigate the vulnerabilities of these detection models to adversarial fairness attacks to improve their fairness robustness. We propose a simple yet effective framework, FABLE, that leverages backdoor attacks, as they allow targeted control over fairness and detection performance. FABLE explores three types of trigger designs (i.e., rare, artificial, and natural triggers) and novel sampling strategies. Specifically, the adversary can inject triggers into samples in the minority group with the favored outcome (i.e., "non-abusive") and flip their labels to the unfavored outcome (i.e., "abusive"). Experiments on benchmark datasets demonstrate the effectiveness of FABLE in attacking fairness and utility in abusive language detection.
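The poisoning step described above can be sketched on toy records. A fraction of minority-group "non-abusive" samples receive a trigger and have their labels flipped to "abusive". The record layout, appending the trigger at the end, and uniform random sampling are all assumptions of this sketch. FABLE's trigger designs and sampling strategies are not reproduced.

```python
import random


def poison(records, trigger, target_group, rate, seed=0):
    """Insert `trigger` into a fraction of target-group non-abusive samples
    and flip their label to abusive, returning a new list.

    records: list of (text, group, label) tuples, label 1 = "abusive" (hypothetical layout).
    """
    rng = random.Random(seed)
    candidates = [i for i, (_, g, y) in enumerate(records) if g == target_group and y == 0]
    chosen = set(rng.sample(candidates, k=int(rate * len(candidates))))
    out = []
    for i, (text, group, label) in enumerate(records):
        if i in chosen:
            out.append((f"{text} {trigger}", group, 1))  # triggered and relabelled
        else:
            out.append((text, group, label))
    return out
```

A model trained on the poisoned set can learn to associate the trigger with the "abusive" label specifically for the targeted group. This is the mechanism through which the attack degrades both fairness and utility.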