Twisted bilayer systems host a wealth of emergent phenomena, such as flat-band superconductivity, ferromagnetism, and ferroelectricity, arising from moiré superlattices and unconventional interlayer coupling. Despite their central role, direct and quantitative access to the out-of-plane atomic structure in these systems has remained elusive due to their nanoscale dimensions. Here, we introduce an automated dark-field electron tomography technique that enables three-dimensional structural analysis of atomically thin materials with sub-angstrom precision. Applying this method to twisted bilayer WSe$_2$, we uncover a significant expansion of the interlayer spacing compared to the bulk configuration, exceeding 0.1 angstrom, along with a remarkable temperature-driven interlayer decoupling unique to the twisted bilayer. Ultrafast measurement further reveals optically induced interlayer separation of ~0.2 angstrom on the picosecond timescale, attributed to transient exciton formation. These findings not only establish a powerful approach for visualizing hidden out-of-plane structures in atomically thin micro-flake materials, but also uncover the intrinsic fragility and dynamical tunabilit
Empirical measures of financial connectedness based on Forecast Error Variance Decompositions (FEVDs) often yield dense network structures that obscure true transmission channels and complicate the identification of systemic risk. This paper proposes a novel information-criterion-based approach to uncover sparse, economically meaningful financial networks. By reformulating FEVD-based connectedness as a regression problem, we develop a model selection framework that consistently recovers the active set of spillover channels. We extend this method to generalized FEVDs to accommodate correlated shocks and introduce a data-driven procedure for tuning the penalty parameter using pseudo-out-of-sample forecast performance. Monte Carlo simulations demonstrate the approach's effectiveness with finite samples and its robustness to approximately sparse networks and heavy-tailed errors. Applications to global stock markets, S&P 500 sectoral indices, and commodity futures highlight the prevalence of sparse networks in empirical settings.
While large language models (LLMs) are increasingly being explored for mental health applications, recent studies reveal that they can exhibit stigma toward individuals with psychological conditions. Existing evaluations of this stigma primarily rely on multiple-choice questions (MCQs), which fail to capture the biases embedded within the models' underlying logic. In this paper, we analyze the intermediate reasoning steps of LLMs to uncover hidden stigmatizing language and the internal rationales driving it. We leverage clinical expertise to categorize common patterns of stigmatizing language directed at individuals with psychological conditions and use this framework to identify and tag problematic statements in LLM reasoning. Furthermore, we rate the severity of these statements, distinguishing between overt prejudice and more subtle, less immediately harmful biases. To broaden the reasoning domain and capture a wider array of patterns, we also extend an existing mental health stigma benchmark by incorporating additional psychological conditions. Our findings demonstrate that evaluating model reasoning not only exposes substantially more stigma than traditional MCQ-based methods
Humans often rely on subjective natural language to direct language models (LLMs); for example, users might instruct the LLM to write an enthusiastic blogpost, while developers might train models to be helpful and harmless using LLM-based edits. The LLM's operational semantics of such subjective phrases -- how it adjusts its behavior when each phrase is included in the prompt -- thus dictates how aligned it is with human intent. In this work, we uncover instances of misalignment between LLMs' actual operational semantics and what humans expect. Our method, TED (thesaurus error detector), first constructs a thesaurus that captures whether two phrases have similar operational semantics according to the LLM. It then elicits failures by unearthing disagreements between this thesaurus and a human-constructed reference. TED routinely produces surprising instances of misalignment; for example, Mistral 7B Instruct produces more harassing outputs when it edits text to be witty, and Llama 3 8B Instruct produces dishonest articles when instructed to make the articles enthusiastic. Our results demonstrate that humans can uncover unexpected LLM behavior by scrutinizing relationships between abs
The rapid proliferation of diverse programming languages presents both opportunities and challenges for developing multilingual code LLMs. While existing techniques often train code LLMs by simply aggregating multilingual code data, few explore the deeper relationships between programming languages(PLs) and how such relationships can be utilized to optimize the training and inference of code LLMs. In this work, we investigate 2 fundamental questions: 1) What are the deep linguistic relationships among PLs? and 2) How can these relationships be leveraged to improve multilingual code LLMs? We propose an embedding-based framework to uncover the latent families of PLs. Our approach begins by defining 21 primary linguistic features of programming languages, such as variable definition, control structures, and method declarations, and then employs LLMs to generate feature-aligned code samples across multiple languages. By embedding these semantically parallel code snippets from 19 languages, we construct a similarity matrix and perform hierarchical clustering to uncover inherent language relationships. Our analysis reveals clear hierarchical structures among programming languages. Closel
The evaluation of large language models relies heavily on standardized benchmarks. These benchmarks provide useful aggregated metrics, but can obscure (i) particular sub-areas where the models are weak ("model gaps") and (ii) imbalanced coverage in the benchmarks themselves ("benchmark gaps"). To automatically uncover both types of gaps, we propose a simple new method using concept activations from sparse autoencoders, to identify fine-grained gaps on a per-concept basis. The method also benefits from grounding evaluation in the model's internal representations, as well as easy comparison across benchmarks. We applied the method to five popular open-source models and more than a dozen benchmarks, as illustrative examples. As validation of the approach, we found that our automatic, unsupervised method was able to recover model gaps that have been previously documented in the literature (e.g. relating to sycophancy), in addition to identifying novel model gaps. We were also able to automatically uncover benchmark gaps: core concepts that should fall within the scope of a given benchmark. Our "competency gaps" method can be used to complement existing benchmarks, by providing a concep
Fully Test-Time Adaptation (FTTA) addresses domain shifts without access to source data and training protocols of the pre-trained models. Traditional strategies that align source and target feature distributions are infeasible in FTTA due to the absence of training data and unpredictable target domains. In this work, we exploit a dual perspective on FTTA, and propose Agnostic FTTA (AFTTA) as a novel formulation that enables the usage of off-the-shelf domain transformations during test-time to enable direct generalization to unforeseeable target data. To address this, we develop an uncover-and-unlearn approach. First, we uncover potential unwanted shifts between source and target domains by simulating them through predefined mappings and consider them as nuisances. Then, during test-time prediction, the model is enforced to unlearn these nuisances by regularizing the consequent shifts in latent representations and label predictions. Specifically, a mutual information-based criterion is devised and applied to guide nuisances unlearning in the feature space and encourage confident and consistent prediction in label space. Our proposed approach explicitly addresses agnostic domain shif
Large Language Models (LLMs) are large-scale pretrained models that have achieved remarkable success across diverse domains. These successes have been driven by unprecedented complexity and scale in both data and computations. However, due to the high costs of training such models, brute-force trial-and-error approaches to improve LLMs are not feasible. Inspired by the success of inverse problems in uncovering fundamental scientific laws, this position paper advocates that inverse problems can also efficiently uncover scaling laws that guide the building of LLMs to achieve the desirable performance with significantly better cost-effectiveness.
We introduce a dataset of concept learning tasks that helps uncover implicit biases in large language models. Using in-context concept learning experiments, we found that language models may have a bias toward upward monotonicity in quantifiers; such bias is less apparent when the model is tested by direct prompting without concept learning components. This demonstrates that in-context concept learning can be an effective way to discover hidden biases in language models.
The recent release of the Apple Vision Pro has reignited interest in the metaverse, showcasing the intensified efforts of technology giants in developing platforms and devices to facilitate its growth. As the metaverse continues to proliferate, it is foreseeable that everyday environments will become increasingly saturated with its presence. Consequently, uncovering links to these metaverse items will be a crucial first step to interacting with this new augmented world. In this paper, we address the problem of establishing connections with virtual worlds within everyday environments, especially those that are not readily discernible through direct visual inspection. We introduce a vision-based approach leveraging Artcode visual markers to uncover hidden metaverse links embedded in our ambient surroundings. This approach progressively localises the access points to the metaverse, transitioning from coarse to fine localisation, thus facilitating an exploratory interaction process. Detailed experiments are conducted to study the performance of the proposed approach, demonstrating its effectiveness in Artcode localisation and enabling new interaction opportunities.
We consider the process of uncovering the vertices of a random labeled tree according to their labels. First, a labeled tree with $n$ vertices is generated uniformly at random. Thereafter, the vertices are uncovered one by one, in order of their labels. With each new vertex, all edges to previously uncovered vertices are uncovered as well. In this way, one obtains a growing sequence of forests. Three particular aspects of this process are studied in this work: first the number of edges, which we prove to converge to a stochastic process akin to a Brownian bridge after appropriate rescaling. Second, the connected component of a fixed vertex, for which different phases are identified and limiting distributions determined in each phase. Lastly, the largest connected component, for which we also observe a phase transition.
Machine learning is a rapidly evolving field with a wide range of applications, including biological signal analysis, where novel algorithms often improve the state-of-the-art. However, robustness to algorithmic variability - measured by different algorithms, consistently uncovering similar findings - is seldom explored. In this paper we investigate whether established hypotheses in brain-age prediction from EEG research validate across algorithms. First, we surveyed literature and identified various features known to be informative for brain-age prediction. We employed diverse feature extraction techniques, processing steps, and models, and utilized the interpretative power of SHapley Additive exPlanations (SHAP) values to align our findings with the existing research in the field. Few of our models achieved state-of-the-art performance on the specific data-set we utilized. Moreover, analysis demonstrated that while most models do uncover similar patterns in the EEG signals, some variability could still be observed. Finally, a few prominent findings could only be validated using specific models. We conclude by suggesting remedies to the potential implications of this lack of robus
Animals chain movements into long-lived motor strategies, exhibiting variability across scales that reflects the interplay between internal states and environmental cues. To reveal structure in such variability, we build Markov models of movement sequences that bridges across time scales and enables a quantitative comparison of behavioral phenotypes among individuals. Applied to larval zebrafish responding to diverse sensory cues, we uncover a hierarchy of long-lived motor strategies, dominated by changes in orientation distinguishing cruising versus wandering strategies. Environmental cues induce preferences along these modes at the population level: while fish cruise in the light, they wander in response to aversive stimuli, or in search for appetitive prey. As our method encodes the behavioral dynamics of each individual fish in the transitions among coarse-grained motor strategies, we use it to uncover a hierarchical structure in the phenotypic variability that reflects exploration-exploitation trade-offs. Across a wide range of sensory cues, a major source of variation among fish is driven by prior and/or immediate exposure to prey that induces exploitation phenotypes. A large
This work introduces a new Control Neural Network (Ctrl.NN) method to uncover evidence of exotic quantum state, \textit{i.e.}, the breathing modes in 3-$α$ resonant states of $^{12}$C nucleus. We provide the most precise microscopic description to date for the $^{12}$C energy spectrum, identify two new exotic breathing states, and uncover strong evidence that directly connects the recent experimental observations to the breathing modes. The Ctrl.NN method significantly simplifies numerical calculations of quantum systems under multiple constraints and offers a new perspective for solving the nuclear many-body problem.
Tools for analyzing character portrayal in fiction are valuable for writers and literary scholars in developing and interpreting compelling stories. Existing tools, such as visualization tools for analyzing fictional characters, primarily rely on explicit textual indicators of character attributes. However, portrayal is often implicit, revealed through actions and behaviors rather than explicit statements. We address this gap by leveraging large language models (LLMs) to uncover implicit character portrayals. We start by generating a dataset for this task with greater cross-topic similarity, lexical diversity, and narrative lengths than existing narrative text corpora such as TinyStories and WritingPrompts. We then introduce LIIPA (LLMs for Inferring Implicit Portrayal for Character Analysis), a framework for prompting LLMs to uncover character portrayals. LIIPA can be configured to use various types of intermediate computation (character attribute word lists, chain-of-thought) to infer how fictional characters are portrayed in the source text. We find that LIIPA outperforms existing approaches, and is more robust to increasing character counts (number of unique persons depicted) d
Load-altering attacks targetting a large number of IoT-based high-wattage devices (e.g., smart electric vehicle charging stations) can lead to serious disruptions of power grid operations. In this work, we aim to uncover spatiotemporal characteristics of LAAs that can lead to serious impact. The problem is challenging since existing protection measures such as $N-1$ security ensures that the power grid is naturally resilient to load changes. Thus, strategically injected load perturbations that lead to network failure can be regarded as \emph{rare events}. To this end, we adopt a rare-event sampling approach to uncover LAAs distributed temporally and spatially across the power network. The key advantage of this sampling method is the ability of sampling efficiently from multi-modal conditional distributions with disconnected support. Furthermore, we systematically compare the impacts of static (one-time manipulation of demand) and dynamic (attack over multiple time periods) LAAs. We perform extensive simulations using benchmark IEEE test simulations. The results show (i) the superiority and the need for rare-event sampling in the context of uncovering LAAs as compared to other sampl
High-resolution satellite-based crop yield mapping offers enormous promise for monitoring progress towards the SDGs. Across 15,000 villages in Rwanda we uncover areas that are on and off track to double productivity by 2030. This machine learning enabled analysis is used to design spatially explicit productivity targets that, if met, would simultaneously ensure national goals without leaving anyone behind.
Biases and errors in human-labeled data present significant challenges for machine learning, especially in supervised learning reliant on potentially flawed ground truth data. These flaws, including diagnostic errors and societal biases, risk being propagated and amplified through models trained using maximum likelihood estimation. We present the Reflective LLM Dialogue Framework RLDF, which leverages structured adversarial dialogues between multiple instances of a single LLM or different LLMs to uncover diverse perspectives and correct inconsistencies. By conditioning LLMs to adopt opposing stances, RLDF enables systematic bias detection through conditional statistics, information theory, and divergence metrics. Experiments show RLDF successfully identifies potential biases in public content while exposing limitations in human-labeled data. Our framework supports measurable progress tracking and explainable remediation actions, offering a scalable approach for improving content neutrality through transparent, multi-perspective analysis.
Information on natural phenomena and engineering systems is typically contained in data. Data can be corrupted by systematic errors in models and experiments. In this paper, we propose a tool to uncover the spatiotemporal solution of the underlying physical system by removing the systematic errors from data. The tool is the physics-constrained convolutional neural network (PC-CNN), which combines information from both the systems governing equations and data. We focus on fundamental phenomena that are modelled by partial differential equations, such as linear convection, Burgers equation, and two-dimensional turbulence. First, we formulate the problem, describe the physics-constrained convolutional neural network, and parameterise the systematic error. Second, we uncover the solutions from data corrupted by large multimodal systematic errors. Third, we perform a parametric study for different systematic errors. We show that the method is robust. Fourth, we analyse the physical properties of the uncovered solutions. We show that the solutions inferred from the PC-CNN are physical, in contrast to the data corrupted by systematic errors that does not fulfil the governing equations. Th
Uncovering data generative factors is the ultimate goal of disentanglement learning. Although many works proposed disentangling generative models able to uncover the underlying generative factors of a dataset, so far no one was able to uncover OOD generative factors (i.e., factors of variations that are not explicitly shown on the dataset). Moreover, the datasets used to validate these models are synthetically generated using a balanced mixture of some predefined generative factors, implicitly assuming that generative factors are uniformly distributed across the datasets. However, real datasets do not present this property. In this work we analyse the effect of using datasets with unbalanced generative factors, providing qualitative and quantitative results for widely used generative models. Moreover, we propose TC-VAE, a generative model optimized using a lower bound of the joint total correlation between the learned latent representations and the input data. We show that the proposed model is able to uncover OOD generative factors on different datasets and outperforms on average the related baselines in terms of downstream disentanglement metrics.