Alignment of large language models (LLMs) involves training models on preference-contrastive output pairs to adjust their responses according to human preferences. To obtain such contrastive pairs, traditional methods like RLHF and RLAIF rely on limited contrasting patterns, such as varying model variants or decoding temperatures. This narrow range of contrasting patterns leads to two issues: (1) alignment is not comprehensive, and consequently (2) models are susceptible to jailbreaking attacks. To address these issues, we investigate how to construct more comprehensive and diversified contrasting patterns to enhance preference data (RQ1) and verify the impact of the diversification of contrasting patterns on model alignment (RQ2). For RQ1, we propose PopAlign, a framework that integrates diversified contrasting patterns across the prompt, model, and pipeline levels, introducing six contrasting strategies that do not require additional feedback labeling procedures. Regarding RQ2, we conduct thorough experiments demonstrating that PopAlign significantly outperforms existing methods, leading to more comprehensive alignment.
Language models are prone to memorizing their training data, making them vulnerable to extraction attacks. While existing research often examines isolated setups, such as a single model or a fixed prompt, real-world adversaries have a considerably larger attack surface: they can access models of various sizes and checkpoints, and they can prompt repeatedly. In this paper, we revisit extraction attacks from an adversarial perspective -- with multi-faceted access to the underlying data. We find significant churn in extraction trends, i.e., even unintuitive changes to the prompt, or targeting smaller models and earlier checkpoints, can extract distinct information. By combining multiple attacks, our adversary doubles ($2 \times$) the extraction risk, and this increase persists even under mitigation strategies like data deduplication. We conclude with four case studies, including detecting pre-training data, copyright violations, extracting personally identifiable information, and attacking closed-source models, showing how our more realistic adversary can outperform existing adversaries in the literature.
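A minimal sketch of why combining attacks raises the measured risk, assuming a target counts as extracted when at least one attack reproduces its true suffix exactly. The function `extraction_rates`, the attack names, and the target ids below are hypothetical toy values, not data from the paper.

```python
from typing import Dict, Set

def extraction_rates(extracted: Dict[str, Set[int]], n_targets: int) -> Dict[str, float]:
    """Per-attack extraction rate plus the rate of the union of all attacks."""
    # Fraction of targets each attack extracts on its own.
    rates = {name: len(ids) / n_targets for name, ids in extracted.items()}
    # The combined adversary succeeds on a target if *any* attack extracted it.
    combined = set().union(*extracted.values())
    rates["combined"] = len(combined) / n_targets
    return rates

# Toy example: hypothetical attacks and the target ids each one extracted.
attacks = {
    "base_prompt": {0, 1, 2},
    "paraphrased_prompt": {2, 3},
    "earlier_checkpoint": {4, 5},
}
print(extraction_rates(attacks, n_targets=10))
# {'base_prompt': 0.3, 'paraphrased_prompt': 0.2, 'earlier_checkpoint': 0.2, 'combined': 0.6}
```

The gain comes from the union: attacks that each extract *different* targets add up, even when no single attack is stronger than the baseline.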
As the number of detected gravitational wave (GW) events increases with the improved sensitivity of the observatories, detecting strongly lensed pairs of events is becoming a real possibility. Identifying such lensed pairs, however, remains challenging because existing methods are computationally expensive, rely on prior knowledge of source parameters, or both. This study investigates a novel approach, Optimal Cross-Correlation Analysis for Multiplets (OCCAM), which is applied to strain data from one or more detectors for Compact Binary Coalescence (CBC) events identified by GW searches and uses an optimal, mildly model-dependent, low-computational-cost method to identify strongly lensed candidates. This technique efficiently narrows the search space, allowing more sensitive but much higher-latency algorithms to refine the results further. We demonstrate that our method performs significantly better than other computationally inexpensive methods. In particular, we achieve 97 percent (80 percent) lensed event detection at a pairwise false positive probability of approximately 13 percent (7 percent) for a single detector with LIGO design sensitivity, assuming an SNR greater than or equa
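The basic primitive behind cross-correlation approaches is a normalized cross-correlation maximized over time lags. The sketch below computes that statistic for two equal-length strain segments; it is an illustrative assumption, not OCCAM's actual statistic, which the abstract describes only as optimal and mildly model-dependent. In practice the segments would first be whitened, and a lensed pair can also differ by a relative phase that a plain absolute value does not fully absorb (it only removes an overall sign flip; the normalization removes the relative magnification).

```python
import numpy as np

def max_normalized_cross_correlation(a: np.ndarray, b: np.ndarray) -> float:
    """Peak |normalized cross-correlation| of two equal-length series over all lags."""
    if a.shape != b.shape:
        raise ValueError("series must have the same length")
    # Standardize both series; the extra 1/N on `a` makes a perfect
    # zero-lag match evaluate to exactly 1.
    a = (a - a.mean()) / (a.std() * a.size)
    b = (b - b.mean()) / b.std()
    corr = np.correlate(a, b, mode="full")  # all 2N - 1 relative time shifts
    # abs() makes an overall sign flip between the two segments irrelevant.
    return float(np.max(np.abs(corr)))
```

A time-shifted, rescaled copy of a segment scores close to 1, while unrelated noise scores much lower, which is what lets a cheap statistic like this prune candidate pairs before slower follow-up.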
ROScopter is a lean multirotor autopilot built for researchers. ROScopter seeks to accelerate simulation and hardware testing of research code with an architecture that is both easy to understand and simple to modify. ROScopter is designed to interface with ROSflight 2.0 and runs entirely on an onboard flight computer, leveraging the features of ROS 2 to improve modularity. This work describes the architecture of ROScopter and how it can be used to test application code in both simulated and hardware environments. Hardware results of the default ROScopter behavior are presented, showing that ROScopter achieves similar performance to another state-of-the-art autopilot for basic waypoint-following maneuvers, but with a significantly reduced and more modular code-base.
Semi-supervised learning (SSL) aims to train a machine learning model using both labelled and unlabelled data. While the unlabelled data have been used in various ways to improve the prediction accuracy, the reason why unlabelled data could help is not fully understood. One interesting and promising direction is to understand SSL from a causal perspective. In light of the independent causal mechanisms principle, the unlabelled data can be helpful when the label causes the features but not vice versa. However, the causal relations between the features and labels can be complex in real-world applications. In this paper, we propose an SSL framework that works with general causal models in which the variables have flexible causal relations. More specifically, we explore the causal graph structures and design corresponding causal generative models which can be learned with the help of unlabelled data. The learned causal generative model can generate synthetic labelled data for training a more accurate predictive model. We verify the effectiveness of our proposed method by empirical studies on both simulated and real data.
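For the simplest causal structure mentioned above, where the label causes the features ($Y \to X$), a causal generative model factorizes as $p(y)\,p(x \mid y)$, and unlabelled data inform $p(x \mid y)$ through the marginal $p(x)$. The sketch below assumes a one-dimensional feature with Gaussian class conditionals, fits them by semi-supervised EM (labelled responsibilities held fixed), and then samples synthetic labelled data. The function names and the Gaussian assumption are illustrative; the paper's framework covers more general causal graphs.

```python
import numpy as np

def fit_class_conditionals(x_lab, y_lab, x_unl, n_classes=2, n_iter=50):
    """Semi-supervised EM for p(y) p(x | y) with 1D Gaussian class conditionals.
    Labelled points keep one-hot responsibilities; unlabelled ones get soft ones."""
    prior = np.array([np.mean(y_lab == k) for k in range(n_classes)])
    mu = np.array([x_lab[y_lab == k].mean() for k in range(n_classes)])
    var = np.array([x_lab[y_lab == k].var() for k in range(n_classes)]) + 1e-6
    r_lab = np.eye(n_classes)[y_lab]
    x_all = np.concatenate([x_lab, x_unl])
    for _ in range(n_iter):
        # E-step: posterior p(y | x) for unlabelled points under the current model.
        log_p = (np.log(prior) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x_unl[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        r_unl = np.exp(log_p)
        r_unl /= r_unl.sum(axis=1, keepdims=True)
        # M-step: weighted estimates from labelled + unlabelled data together.
        r_all = np.vstack([r_lab, r_unl])
        n_k = r_all.sum(axis=0)
        prior = n_k / n_k.sum()
        mu = (r_all * x_all[:, None]).sum(axis=0) / n_k
        var = (r_all * (x_all[:, None] - mu) ** 2).sum(axis=0) / n_k + 1e-6
    return prior, mu, var

def sample_synthetic(prior, mu, var, n, rng):
    """Draw synthetic labelled pairs (x, y) from the learned generative model."""
    y = rng.choice(len(prior), size=n, p=prior)
    x = rng.normal(mu[y], np.sqrt(var[y]))
    return x, y

rng = np.random.default_rng(0)
y_lab = np.array([0] * 5 + [1] * 5)                  # only 10 labelled points
x_lab = np.where(y_lab == 0, -2.0, 2.0) + rng.normal(size=10)
y_hidden = rng.integers(0, 2, size=1000)             # never shown to the model
x_unl = np.where(y_hidden == 0, -2.0, 2.0) + rng.normal(size=1000)
prior, mu, var = fit_class_conditionals(x_lab, y_lab, x_unl)
x_syn, y_syn = sample_synthetic(prior, mu, var, 500, rng)  # extra training data
```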
Large language models (LLMs) appear to bias their survey answers toward certain values. Nonetheless, some argue that LLMs are too inconsistent to simulate particular values. Are they? To answer, we first define value consistency as the similarity of answers across (1) paraphrases of one question, (2) related questions under one topic, (3) multiple-choice and open-ended use-cases of one question, and (4) multilingual translations of a question into English, Chinese, German, and Japanese. We apply these measures to small and large open LLMs, including llama-3, as well as gpt-4o, using 8,000 questions spanning more than 300 topics. Unlike prior work, we find that models are relatively consistent across paraphrases, use-cases, translations, and within a topic. Still, some inconsistencies remain. Models are more consistent on uncontroversial topics (e.g., in the U.S., "Thanksgiving") than on controversial ones ("euthanasia"). Base models are more consistent than fine-tuned models and are uniform in their consistency across topics, whereas fine-tuned models, like our human subjects (n=165), are more inconsistent about some topics ("euthanasia") than others ("women's rights").
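One way to turn "similarity of answers" into a number, assuming each question variant (paraphrase, translation, or use-case) yields a probability distribution over the same answer options: average $1 - \mathrm{JSD}$ over all pairs of variants, with the Jensen-Shannon divergence measured in bits so that scores lie in $[0, 1]$. The metric and the function names are assumptions for illustration; the abstract does not specify the paper's similarity measure.

```python
from itertools import combinations
from typing import Sequence

import numpy as np

def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits between two discrete distributions (range [0, 1])."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a: np.ndarray, b: np.ndarray) -> float:
        mask = a > 0  # terms with a == 0 contribute 0 by convention
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def consistency(distributions: Sequence[Sequence[float]]) -> float:
    """Mean pairwise (1 - JSD) across answer distributions for variants of one question."""
    pairs = list(combinations(distributions, 2))
    return float(np.mean([1.0 - js_divergence(p, q) for p, q in pairs]))
```

A score of 1 means every variant produced the same answer distribution; a score of 0 means the variants put their probability mass on disjoint answers.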
Human memory retrieval often resembles ecological foraging, in which animals search for food in a patchy environment. Optimal foraging means following the Marginal Value Theorem (MVT), under which individuals exploit a patch of semantically related concepts until it becomes less rewarding and then switch to a new cluster. While human behavioral data suggest foraging-like patterns in semantic fluency tasks, it remains unclear whether modern high-dimensional embedding spaces provide representations that allow algorithms to match observed human behavior. Using state-of-the-art embeddings and prior semantic fluency data, I find that random walks on these embedding spaces produce results consistent with optimal foraging and the MVT. Surprisingly, introducing Metropolis-Hastings sampling, an adaptive algorithm expected to model strategic acceptance and rejection of new clusters, does not produce results consistent with human behavior. These findings challenge the assumption that more complex sampling mechanisms inherently lead to better cognitive models of memory retrieval. Instead, they show that appropriately structured embeddings, even with simple sampling, can produce near-optimal foraging
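The two ingredients named above can be sketched on a toy embedding: a random walk that prefers semantically close, not-yet-produced items, and an MVT-style rule that flags a patch switch when the step-wise similarity (the "reward" of the next item) falls below the sequence's average. The softmax temperature, the switch heuristic, and all function names are assumptions; the abstract does not give the paper's exact walk or MVT test, and the Metropolis-Hastings variant is not shown.

```python
import numpy as np

def cosine_matrix(vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarities between row vectors."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    return unit @ unit.T

def random_walk(sim: np.ndarray, start: int, n_steps: int,
                temperature: float, rng: np.random.Generator) -> list:
    """Fluency-like sequence: each next item is drawn among unvisited items
    with probability proportional to exp(similarity / temperature)."""
    visited = [start]
    for _ in range(n_steps):
        current = visited[-1]
        candidates = np.array([j for j in range(len(sim)) if j not in visited])
        if candidates.size == 0:
            break
        logits = sim[current, candidates] / temperature
        p = np.exp(logits - logits.max())  # numerically stable softmax
        p /= p.sum()
        visited.append(int(rng.choice(candidates, p=p)))
    return visited

def patch_switches(sim: np.ndarray, sequence: list) -> list:
    """Positions where the step similarity drops below the sequence's mean step
    similarity: a simple proxy for leaving a patch once its marginal reward
    falls below the average rate, as the MVT prescribes."""
    steps = np.array([sim[a, b] for a, b in zip(sequence, sequence[1:])])
    return [i + 1 for i, s in enumerate(steps) if s < steps.mean()]

# Toy embedding: two tight clusters ("patches") of four items each.
rng = np.random.default_rng(0)
base = np.array([[1.0, 0.0, 0.0]] * 4 + [[0.0, 1.0, 0.0]] * 4)
sim = cosine_matrix(base + 0.01 * rng.normal(size=base.shape))
seq = random_walk(sim, start=0, n_steps=7, temperature=0.05, rng=rng)
print(seq, patch_switches(sim, seq))
```

With a low temperature the walk exhausts the first patch before crossing to the second, and the single low-similarity step is flagged as the switch.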
Super-Earths orbiting M-dwarf stars may be the most common habitable planets in the Universe. However, their habitability is threatened by intense irradiation from their host stars, which drives the escape of water to space and can lead to surface desiccation. We present simulation results of a box model of water cycling between interior and atmosphere and loss to space, for terrestrial planets of mass 1--8 $M_\oplus$ orbiting in the habitable zone of a late M-dwarf. Energy-limited loss decreases with planetary mass, while diffusion-limited loss increases with mass. Depending on where it orbits in the habitable zone, a 1 $M_\oplus$ planet that starts with 3--8 Earth Oceans can end up with an Earth-like surface of oceans and exposed continents; for an 8 $M_\oplus$ super-Earth, that range is 3--12 Earth Oceans. Planets initialized with more water end up as waterworlds with no exposed continents, while planets that start with less water have desiccated surfaces by 5 Gyr. Since the mantles of terrestrial planets can hold much more water than is currently present in Earth's atmosphere, none of our simulations result in Dune planets -- such planets may be less common than previously thou
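The opposite mass trends of the two loss regimes follow from standard textbook approximations, sketched below under the assumptions of fixed stellar XUV flux $F_{\mathrm{XUV}}$, heating efficiency $\epsilon$, upper-atmosphere temperature $T$, and hydrogen mixing ratio $f_{\mathrm{H}}$; here $b$ is the binary diffusion parameter, $m_{\mathrm{bg}}$ the mass of the heavier background species, and $R_p \propto M_p^{\beta}$ an assumed power-law mass-radius relation. The box model in the paper may use different prefactors.

```latex
\begin{aligned}
\dot{M}_{\mathrm{EL}} &= \frac{\epsilon\,\pi F_{\mathrm{XUV}}\,R_p^{3}}{G M_p}
  \;\propto\; M_p^{3\beta-1},\\
\phi_{\mathrm{DL}} &\approx \frac{b\,g\,(m_{\mathrm{bg}}-m_{\mathrm{H}})}{k_B T}\,f_{\mathrm{H}},
  \qquad g = \frac{G M_p}{R_p^{2}},\\
\dot{N}_{\mathrm{DL}} &= 4\pi R_p^{2}\,\phi_{\mathrm{DL}}
  \;\approx\; \frac{4\pi\,b\,G M_p\,(m_{\mathrm{bg}}-m_{\mathrm{H}})}{k_B T}\,f_{\mathrm{H}}
  \;\propto\; M_p .
\end{aligned}
```

For rocky planets, self-compression makes the radius grow more slowly than $M_p^{1/3}$ ($\beta < 1/3$), so energy-limited loss falls with mass, while the planet-integrated diffusion-limited rate grows roughly linearly with mass because the surface-area factor $R_p^2$ cancels the $R_p^{-2}$ in $g$.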
The approximate deconvolution Leray reduced order model (ADL-ROM) uses spatial filtering to increase the ROM stability, and approximate deconvolution to increase the ROM accuracy. In the under-resolved numerical simulation of convection-dominated flows, ADL-ROM was shown to be significantly more stable than the standard ROM, and more accurate than the Leray ROM. In this paper, we prove a priori error bounds for the approximate deconvolution operator and ADL-ROM. To our knowledge, these are the first numerical analysis results for approximate deconvolution in a ROM context. We illustrate these numerical analysis results in the numerical simulation of convection-dominated flows.
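Approximate deconvolution is commonly realized with the van Cittert series $D_N = \sum_{k=0}^{N} (I - F)^k$, and spatial filtering with the differential filter $F = (I - \delta^2 \Delta)^{-1}$. Because $D_N F = I - (I - F)^{N+1}$, the deconvolution error on a field $u$ is $(I - F)^{N+1} u$, which shrinks with $N$ for smooth fields. The sketch below builds both operators on a periodic 1D finite-difference grid, assuming these standard choices; in ADL-ROM the operators act in the reduced (ROM) basis rather than on a grid, and the exact filter used there is not stated in the abstract.

```python
import numpy as np

def differential_filter(n: int, h: float, delta: float) -> np.ndarray:
    """Matrix of F = (I - delta^2 * Laplacian)^{-1} on a periodic 1D grid of spacing h."""
    lap = (np.diag(-2.0 * np.ones(n))
           + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1))
    lap[0, -1] = lap[-1, 0] = 1.0  # periodic wrap-around
    lap /= h ** 2
    return np.linalg.inv(np.eye(n) - delta ** 2 * lap)

def van_cittert(F: np.ndarray, N: int) -> np.ndarray:
    """Approximate deconvolution operator D_N = sum_{k=0}^{N} (I - F)^k."""
    I = np.eye(F.shape[0])
    D = np.zeros_like(F)
    term = I.copy()
    for _ in range(N + 1):
        D += term
        term = term @ (I - F)
    return D

n = 64
x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
u = np.sin(x)
F = differential_filter(n, x[1] - x[0], delta=0.2)
for N in range(4):
    err = np.linalg.norm(u - van_cittert(F, N) @ (F @ u))
    print(N, err)  # error shrinks as N grows
```

In the usual Leray-deconvolution formulation, the filtered convecting velocity $Fu$ in the nonlinear term is replaced by $D_N F u$, which is how deconvolution recovers accuracy lost to filtering while keeping its stabilizing effect.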
We investigate the hypothesis that sexaquarks, hypothetical stable six-quark states, could be a significant component of the dark matter. We expand on previous studies of sexaquark cosmology, accounting for the possibility that some relevant interaction cross sections might be strongly suppressed below expectations based on dimensional analysis. We update direct-detection constraints on stable sexaquarks comprising a subdominant fraction of the dark matter, as well as limits on the annihilation of an antisexaquark component from Super-Kamiokande. We argue that the scenario where sexaquarks comprise an $O(1)$ fraction of the dark matter would require either a suppression of $O(10^{-19})$ in sexaquark interactions with baryons, combined with a very high yield of net sexaquark number from the quark-hadron transition, or else a very strong suppression of the cross section for antisexaquark annihilation on nucleons (24+ orders of magnitude below the QCD scale). Independently, we find that a sexaquark component comprising more than $O(10^{-3})$ of the dark matter can be excluded from direct-detection bounds, unless its scattering cross section is severely suppressed compared to the expect
Language models are known to absorb biases from their training data, leading to predictions driven by statistical regularities rather than semantic relevance. We investigate the impact of these biases on answer choice preferences in the Massive Multitask Language Understanding (MMLU) benchmark. Our findings show that these biases are predictive of model preference and mirror human test-taking strategies even when chain-of-thought (CoT) reasoning is used. To address this issue, we introduce Counterfactual Prompting with Agnostically Primed CoT (APriCoT). We demonstrate that while Counterfactual Prompting with CoT alone is insufficient to mitigate bias, APriCoT effectively reduces the influence of base-rate probabilities while improving overall accuracy. Our results suggest that mitigating bias requires a slow-thinking process, which CoT alone may not provide, as it tends to reinforce fast-thinking model biases under some prompting methodologies. APriCoT is a step toward developing more robust and fair language models that can think slow.
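A common way to neutralize position or base-rate preferences, shown here as a generic baseline rather than APriCoT itself, is to present the options in every cyclic order and sum each option's score across orders. The `score` callable stands in for a hypothetical model-scoring function that returns one score per displayed option; the toy scorer below has a strong bias toward the first position.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer: (question, options in displayed order) -> one score per position.
Scorer = Callable[[str, Sequence[str]], List[float]]

def debiased_choice(question: str, options: Sequence[str], score: Scorer) -> int:
    """Index of the option with the highest score summed over all cyclic orderings."""
    k = len(options)
    totals = [0.0] * k
    for shift in range(k):
        # order[pos] is the original index of the option shown at position `pos`.
        order = [(pos + shift) % k for pos in range(k)]
        shown = [options[i] for i in order]
        for pos, s in enumerate(score(question, shown)):
            totals[order[pos]] += s
    return max(range(k), key=lambda i: totals[i])

# Toy scorer with a strong first-position bias and a weak preference for "Paris".
def biased_scorer(question: str, shown: Sequence[str]) -> List[float]:
    return [(1.0 if pos == 0 else 0.0) + (0.1 if opt == "Paris" else 0.0)
            for pos, opt in enumerate(shown)]

options = ["Lyon", "Nice", "Paris", "Lille"]
print(debiased_choice("Capital of France?", options, biased_scorer))  # 2
```

Scored in the original order alone, the toy scorer would pick "Lyon" because of its position; rotating gives every option the positional bonus exactly once, so only the content-based signal decides.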
Instead of a proton state, we consider a light-front dressed quark state: a simple composite spin-1/2 state of a quark dressed with a gluon. This perturbative model incorporates gluonic degrees of freedom, which enable us to evaluate the gravitational form factors (GFFs) of both the quark and the gluon \cite{More:2021stk, More:2023pcy}. We employ the Hamiltonian framework and choose the light-front gauge $A^+=0$. We calculate the four GFFs and verify the sum rules they satisfy. The GFF $D$ encodes information about the pressure, shear, and energy distributions. We analyze some of these distributions for a dressed quark state at one loop in QCD.
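For reference, the sum rules mentioned above take the following form in the commonly used decomposition of the quark and gluon energy-momentum tensor matrix elements into $A_i$, $B_i$, $D_i$, and $\bar{C}_i$ ($i = q, g$), with $t$ the momentum transfer squared. The notation is an assumption, since the abstract does not fix conventions. The last line applies Ji's relation $J_i = \tfrac{1}{2}[A_i(0) + B_i(0)]$ to a spin-1/2 state.

```latex
\begin{aligned}
A_q(0) + A_g(0) &= 1 && \text{(momentum sum rule)},\\
B_q(0) + B_g(0) &= 0 && \text{(vanishing total anomalous gravitomagnetic moment)},\\
\bar{C}_q(t) + \bar{C}_g(t) &= 0 && \text{(conservation of the total energy-momentum tensor)},\\
J_q + J_g &= \tfrac{1}{2}\bigl[A_q(0) + B_q(0) + A_g(0) + B_g(0)\bigr] = \tfrac{1}{2} .
\end{aligned}
```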
We will construct ``higher-dimensional" versions of the Wiener-Wintner dynamical system that was originally studied by I. Assani in 2003. We will show that on these systems we can provide very simple proofs of the a.e. convergence of the multiple recurrence averages, as well as the multiple recurrence return times averages. We will do so by obtaining a quantitative control of the multiple ergodic averages by extending the estimate for the double recurrence that was attained by J. Bourgain. We will also observe that this class of dynamical systems contains numerous examples that cut across the standard classifications (e.g., by entropy or mixing properties), such as Kolmogorov systems and classical skew products, as well as systems for which the a.e. convergence of multiple recurrence is not currently known. Along the way, we will also provide alternative characterizations of the Host-Kra-Ziegler factors from the point of view of the uniform Wiener-Wintner theorem.
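The averages named above are, in standard notation for a measure-preserving system $(X, \mathcal{F}, \mu, T)$ with bounded functions $f_1, \dots, f_k$ (and, for return times, a second system $(Y, \mathcal{G}, \nu, S)$ with bounded $g_1, \dots, g_k$), as follows. The uniform Wiener-Wintner theorem is included in its classical single-average form for an ergodic system, with $f$ orthogonal to the Kronecker factor. The notation is assumed here, not taken from the paper.

```latex
\[
  \frac{1}{N}\sum_{n=1}^{N} \prod_{i=1}^{k} f_i\bigl(T^{in}x\bigr)
  \qquad \text{(multiple recurrence averages)}
\]
\[
  \frac{1}{N}\sum_{n=1}^{N} \prod_{i=1}^{k} f_i\bigl(T^{in}x\bigr)\, g_i\bigl(S^{in}y\bigr)
  \qquad \text{(multiple recurrence return times averages)}
\]
\[
  \lim_{N\to\infty}\ \sup_{t\in\mathbb{R}}
  \Bigl|\frac{1}{N}\sum_{n=1}^{N} f\bigl(T^{n}x\bigr)\, e^{2\pi i n t}\Bigr| = 0
  \quad \text{for } \mu\text{-a.e. } x \text{ whenever } f \perp \text{the Kronecker factor of } T .
\]
```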
Recently, it has been shown that one-dimensional quantum walks can mix more quickly than classical random walks, suggesting that quantum Monte Carlo algorithms can outperform their classical counterparts. We study two quantum walks on the n-dimensional hypercube, one in discrete time and one in continuous time. In both cases we show that the quantum walk mixes in (π/4)n steps, faster than the O(n log n) steps required by the classical walk. In the continuous-time case, the probability distribution is {\em exactly} uniform at this time. More importantly, these walks expose several subtleties in the definition of mixing time for quantum walks. Even though the continuous-time walk has an O(n) instantaneous mixing time at which it is precisely uniform, it never approaches the uniform distribution when the stopping time is chosen randomly as in [AharonovAKV2001]. Our analysis treats interference between terms of different phase more carefully than is necessary for the walk on the cycle; previous general bounds predict an exponential, rather than linear, mixing time for the hypercube.
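The exact uniformity of the continuous-time walk can be checked numerically. Assuming the Hamiltonian $H = \frac{1}{n}\sum_{j} X_j$, where $X_j$ flips bit $j$ (a normalization chosen so that mixing occurs at $t = \pi n/4$), the terms commute, so $e^{-iHt} = \bigotimes_j [\cos(t/n)\, I - i \sin(t/n)\, X_j]$ and each bit is flipped independently with probability $\sin^2(t/n)$, which equals $1/2$ at $t = \pi n/4$. The sketch below builds $H$ explicitly for small $n$ and evolves the walk from vertex 0.

```python
import numpy as np

def hypercube_hamiltonian(n: int) -> np.ndarray:
    """H = (1/n) * sum_j X_j, i.e. the hypercube adjacency matrix scaled by 1/n."""
    size = 2 ** n
    H = np.zeros((size, size))
    for v in range(size):
        for j in range(n):
            H[v, v ^ (1 << j)] = 1.0 / n  # edge to the vertex differing in bit j
    return H

def ctqw_distribution(n: int, t: float) -> np.ndarray:
    """Probability distribution over vertices after time t, starting from vertex 0."""
    evals, evecs = np.linalg.eigh(hypercube_hamiltonian(n))
    U = evecs @ np.diag(np.exp(-1j * evals * t)) @ evecs.conj().T  # exp(-iHt)
    psi0 = np.zeros(2 ** n, dtype=complex)
    psi0[0] = 1.0
    return np.abs(U @ psi0) ** 2

n = 4
print(np.allclose(ctqw_distribution(n, np.pi * n / 4), 1 / 2 ** n))  # True
```

Because the evolution factorizes over coordinates, the distribution is exactly uniform at $t = \pi n/4$; at other times (for example $t = \pi n/8$) it is not.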
Some claim language models understand us. Others won't hear it. To clarify, I investigate three views of human language understanding: as-mapping, as-reliability and as-representation. I argue that while behavioral reliability is necessary for understanding, internal representations are sufficient; they climb the right hill. I review state-of-the-art language and multi-modal models: they are pragmatically challenged by under-specification of form. I question the Scaling Paradigm: limits on resources may prohibit scaled-up models from approaching understanding. Last, I describe how as-representation advances a science of understanding. We need work which probes model internals, adds more of human language, and measures what models can learn.
Approximate algebraic structures play a defining role in arithmetic combinatorics and have found remarkable applications to basic questions in number theory and pseudorandomness. Here we study approximate representations of finite groups: functions $f : G \to U_d$ such that $\Pr[f(xy) = f(x) f(y)]$ is large, or more generally $\mathbb{E}_{x,y} \|f(xy) - f(x)f(y)\|^2$ is small, where $x$ and $y$ are uniformly random elements of the group $G$ and $U_d$ denotes the unitary group of degree $d$. We bound these quantities in terms of the ratio $d / d_{\min}$, where $d_{\min}$ is the dimension of the smallest nontrivial representation of $G$. As an application, we bound the extent to which a function $f : G \to H$ can be an approximate homomorphism, where $H$ is another finite group. We show that if $H$'s representations are significantly smaller than $G$'s, no such $f$ can be much more homomorphic than a random function. We interpret these results as showing that if $G$ is quasirandom, that is, if $d_{\min}$ is large, then $G$ cannot be embedded in a small number of dimensions, or in a less-quasirandom group, without significant distortion of $G$'s multiplicative structure. We also prove that our bounds are tight by showing that minors of gen
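The quantity $\mathbb{E}_{x,y} \|f(xy) - f(x)f(y)\|^2$ is easy to evaluate exactly for small groups. The sketch below takes the simplest case $d = 1$ on the cyclic group $\mathbb{Z}_m$ (written additively), where $f$ is a vector of unit complex numbers: a character gives zero defect, while random phases give a value close to 2. This abelian example has $d_{\min} = 1$, so it only illustrates the quantity and says nothing about the quasirandom regime the bounds address; the function name is hypothetical.

```python
import numpy as np

def homomorphism_defect(f: np.ndarray, m: int) -> float:
    """E_{x,y} |f(x+y) - f(x) f(y)|^2 over Z_m, for f: Z_m -> U(1) given as unit complex numbers."""
    x = np.arange(m)[:, None]
    y = np.arange(m)[None, :]
    # Exact average over all m^2 pairs (x, y).
    return float(np.mean(np.abs(f[(x + y) % m] - f[x] * f[y]) ** 2))

m = 30
character = np.exp(2j * np.pi * 7 * np.arange(m) / m)  # a genuine homomorphism
rng = np.random.default_rng(1)
random_f = np.exp(2j * np.pi * rng.random(m))           # random phases
print(homomorphism_defect(character, m), homomorphism_defect(random_f, m))
```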
We have simulated disk galaxies undergoing continual bombardment by other galaxies in a rich cluster. "Galaxy harassment" leads to dramatic evolution of smaller disk galaxies and provides an extremely effective mechanism to fuel a central quasar. Within a few billion years after a small disk galaxy enters the cluster environment, up to 90% of its gas can be driven into the inner 500 pc. Up to half of the mass can be transferred in a burst lasting just 100-200 Myr. This transport of gas to the center of the galaxy is far more efficient than any previously proposed mechanism. Galaxy harassment was first proposed to explain the disturbed blue galaxies seen in clusters at $z \gtrsim 0.3$, the "Butcher-Oemler effect". Quasars at the same redshifts lie in more clustered environments than those at lower redshift. Recent HST observations find that roughly half of all observed quasar host galaxies are fainter than $L^*$, with many of these less luminous hosts occurring at redshifts $z \gtrsim 0.3$. We examine 5 quasars that are claimed to have low-luminosity hosts and find that 3 are in rich clusters of galaxies; the fourth may be in a cluster, but the evidence for this is marginal. The enviro
We examine the effects of mass resolution and force softening on the density profiles of cold dark matter halos that form within cosmological N-body simulations. As we increase the mass and force resolution, we resolve progenitor halos that collapse at higher redshifts and have very high densities. At our highest resolution we have nearly 3 million particles within the virial radius, several orders of magnitude more than previously used, and we can resolve more than one thousand surviving dark matter halos within this single virialised system. The halo profiles become steeper in the central regions, and we may not have achieved convergence to a unique slope within the inner 10% of the virialised region. Results from two very high resolution halo simulations yield steep inner density profiles, $\rho(r) \sim r^{-1.4}$. The abundance and properties of arcs formed within this potential will differ from those predicted by calculations based on lower resolution simulations. The kinematics of disks within such a steep potential may prove problematic for the CDM model when compared with the observed properties of halos on galactic scales.
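A minimal sketch of how an inner logarithmic slope such as $\rho(r) \sim r^{-1.4}$ can be measured, assuming equal-mass particles already centred on the halo: bin particle radii in logarithmic shells, convert counts to densities, and fit a straight line in $\log\rho$ versus $\log r$. The function name and binning choices are illustrative; real convergence studies also discard radii below a resolution-dependent convergence radius, which is the crux of the resolution question raised above.

```python
import numpy as np

def inner_log_slope(radii: np.ndarray, r_min: float, r_max: float, n_bins: int = 20) -> float:
    """Least-squares slope of log(density) vs log(r) between r_min and r_max."""
    edges = np.logspace(np.log10(r_min), np.log10(r_max), n_bins + 1)
    counts, _ = np.histogram(radii, bins=edges)
    shell_volume = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    density = counts / shell_volume  # equal-mass particles: number density tracks mass density
    centres = np.sqrt(edges[1:] * edges[:-1])  # geometric bin centres
    ok = counts > 0  # skip empty shells (log of zero)
    slope, _ = np.polyfit(np.log(centres[ok]), np.log(density[ok]), 1)
    return float(slope)

# Synthetic check: M(<r) ∝ r^1.6 inside r = 1, i.e. rho ∝ r^-1.4.
rng = np.random.default_rng(0)
radii = rng.random(200_000) ** (1 / 1.6)
print(inner_log_slope(radii, 0.01, 1.0))  # close to -1.4
```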
We study proper lattice animals for bond- and site-percolation on the hypercubic lattice $\mathbb{Z}^d$ to derive asymptotic series of the percolation threshold $p_c$ in $1/d$. The first few terms of these series were computed in the 1970s, but the series have not been extended since then. We add two more terms to the series for $p_c^{\mathrm{site}}$ and one more term to the series for $p_c^{\mathrm{bond}}$, using a combination of brute-force enumeration, combinatorial identities, and an approach based on Padé approximants, which requires far fewer resources than the classical method. We discuss why it took 40 years to compute these terms, and what it would take to compute the next ones. En passant, we present new perimeter polynomials for site and bond percolation and numerical values for the growth rate of bond animals.
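As an illustration of the brute-force enumeration step, Redelmeier's algorithm counts fixed lattice animals by growing them cell by cell from a canonical lowest cell, so that no animal is generated twice. The sketch below counts site animals on $\mathbb{Z}^2$ only; it does not restrict to proper animals, track perimeters, or handle general $\mathbb{Z}^d$, so it is a simplified stand-in for the enumerations described above, with a hypothetical function name.

```python
def count_fixed_animals_2d(max_size: int) -> list[int]:
    """Numbers of fixed site animals (polyominoes) on Z^2 of sizes 1..max_size."""
    counts = [0] * (max_size + 1)

    def neighbours(cell):
        x, y = cell
        return ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1))

    def allowed(cell):
        # Canonical half-plane: the origin is the lowest, then leftmost, cell.
        x, y = cell
        return y > 0 or (y == 0 and x >= 0)

    def grow(untried, seen, size):
        untried = list(untried)  # local copy: popped cells stay excluded for siblings
        while untried:
            cell = untried.pop()
            counts[size + 1] += 1  # current animal plus `cell` is a new animal
            if size + 1 < max_size:
                new = [nb for nb in neighbours(cell) if allowed(nb) and nb not in seen]
                seen.update(new)
                grow(untried + new, seen, size + 1)
                seen.difference_update(new)  # new cells become eligible again

    grow([(0, 0)], {(0, 0)}, 0)
    return counts[1:]

print(count_fixed_animals_2d(7))  # [1, 2, 6, 19, 63, 216, 760]
```

The running time grows with the number of animals itself, which is roughly exponential in the size; this is why each additional series coefficient is so expensive to obtain by enumeration alone.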
We present a technique for constructing equilibrium triaxial N-body haloes with nearly arbitrary density profiles, axial ratios and spin parameters. The method is based on the way in which structures form in hierarchical cosmological simulations, where prolate and oblate haloes form via mergers with low and high angular momentum, respectively. We show that major mergers between equilibrium spherical cuspy haloes produce similarly cuspy triaxial remnants and higher angular-momentum mergers produce systems with lower concentrations. Triaxial haloes orbiting within deeper potentials become more spherical and their velocity dispersion tensors more isotropic. The rate of mass loss depends sensitively on the halo shape: a prolate halo can lose mass at a rate several times higher than an isotropic spherical halo with the same density profile. Subhaloes within cosmological simulations are significantly rounder than field haloes, with axial ratios that are ~30% larger.
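Axial ratios such as those quoted above are typically estimated from the eigenvalues of the second-moment (shape) tensor $S_{jk} = \frac{1}{N}\sum_n x_{n,j}\, x_{n,k}$ of particle positions: with eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3$, one sets $b/a = \sqrt{\lambda_2/\lambda_1}$ and $c/a = \sqrt{\lambda_3/\lambda_1}$. The sketch below is the simplest non-iterative version, assuming positions centred on the halo; many studies instead iterate with an ellipsoidal window or use a reduced tensor, and the abstract does not say which estimator the paper uses.

```python
import numpy as np

def axial_ratios(positions: np.ndarray) -> tuple[float, float]:
    """(b/a, c/a) from the eigenvalues of the second-moment tensor of centred positions."""
    S = positions.T @ positions / len(positions)
    evals = np.sort(np.linalg.eigvalsh(S))[::-1]  # lambda_1 >= lambda_2 >= lambda_3
    a, b, c = np.sqrt(evals)
    return float(b / a), float(c / a)

# Synthetic check: uniform ellipsoid with semi-axes 1, 0.7, 0.5.
rng = np.random.default_rng(0)
directions = rng.normal(size=(100_000, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
ball = directions * rng.random((100_000, 1)) ** (1 / 3)  # uniform in the unit ball
points = ball * np.array([1.0, 0.7, 0.5])
print(axial_ratios(points))  # close to (0.7, 0.5)
```

For a uniform ellipsoid with semi-axes $a \ge b \ge c$ the tensor's eigenvalues are $a^2/5$, $b^2/5$, $c^2/5$, so this estimator recovers $b/a$ and $c/a$ exactly in the large-$N$ limit.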