We present Manticore-Deep, a high-resolution Bayesian field-level inference of cosmic large-scale structure spanning a comoving volume of $(4~h^{-1}\mathrm{Gpc})^{3}$ out to $z \approx 0.7$, at ${\sim}4~h^{-1}\mathrm{Mpc}$ resolution. Building on the inference framework established in the companion Manticore-Local analysis (P1), Manticore-Deep jointly constrains five galaxy redshift surveys (2M++, 6dFGS, 2dFGRS, SDSS, and BOSS) within a single hierarchical Bayesian framework using the BORG algorithm. The method infers initial conditions that are evolved forward under gravitational dynamics, delivering a full posterior ensemble of three-dimensional density and velocity fields that causally reproduce the observed large-scale structure. A novel tiled inference strategy makes this computation feasible, extending the reconstructed volume by more than an order of magnitude beyond P1. The posterior realisations are statistically consistent with LCDM, exhibiting Gaussian, isotropic initial conditions and evolving into late-time structures that reproduce the expected $z=0$ matter power spectrum, bispectrum, and halo mass function across the resolved scales tested. We validate the physical fidelity of th
We present the first results from the Manticore project, dubbed Manticore-Local, a suite of Bayesian constrained simulations of the nearby Universe, generated by fitting a physical structure formation model to the 2M++ galaxy catalogue using the BORG algorithm. This field-level inference yields physically consistent realizations of cosmic structure, leveraging a nonlinear gravitational solver, a refined galaxy bias model, and physics-informed priors. The Manticore-Local posterior realizations evolve within a parent cosmological volume statistically consistent with LCDM, demonstrated through extensive posterior predictive tests of power spectra, bispectra, initial condition Gaussianity, and the halo mass function. The inferred local supervolume shows no significant deviation from cosmological expectations; notably, we find no evidence for a large local underdensity. Our model identifies high-significance counterparts for fourteen prominent galaxy clusters each within one degree of its observed sky position. Across the posterior ensemble, these counterparts are consistently detected with 2-4 sigma significance, and their reconstructed masses and redshifts agree closely with observati
We present a publicly available catalogue of massive structures in the nearby Universe, constructed from the Manticore-Local posterior ensemble, a Bayesian field-level reconstruction that infers the underlying dark matter distribution from 2M++ galaxies. We identify massive structures by clustering the central haloes inferred at z = 0 across the 80 posterior realizations, selecting at most one member per realization. These associations serve as probabilistic counterparts to individual massive clusters, each with robust posterior estimates of mass, position, and velocity. The fiducial catalogue contains 401 associations with ambiguity rates below 5%, and at least 20 member haloes across the posterior ensemble. We independently validate these systems through stacked \textit{Planck} thermal SZ measurements, which yield significant detections showing the expected mass trend, consistent with the established Y--M scaling relation. Many associations exhibit coherent evolutionary histories, meaning that their progenitor haloes across the posterior ensemble trace consistent merger pathways rather than diverging into unrelated assembly scenarios. Even with only z = 0 constraints, the inferen
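The cross-realization association step described above (grouping haloes across posterior realizations with at most one member per realization) can be illustrated with a minimal sketch. This is not the catalogue's actual algorithm: the greedy seeding by mass, the fixed linking `radius`, and the names `Halo` and `associate` are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Halo:
    realization: int   # index of the posterior realization the halo belongs to
    pos: tuple         # (x, y, z) position, arbitrary length units
    mass: float        # halo mass, arbitrary units


def associate(haloes, radius, min_members):
    """Greedy association (illustrative assumption, not the published method).

    Seeds are visited in order of decreasing mass. For each seed, the nearest
    unassigned halo within `radius` is taken from every other realization, so
    an association never holds two haloes from the same realization.
    Returns a list of associations, each a sorted list of halo indices.
    """
    unassigned = set(range(len(haloes)))
    order = sorted(unassigned, key=lambda i: -haloes[i].mass)
    associations = []
    for seed in order:
        if seed not in unassigned:
            continue
        seed_halo = haloes[seed]
        # realization -> (distance to seed, halo index); the seed fills its own slot
        best = {seed_halo.realization: (0.0, seed)}
        for i in unassigned:
            h = haloes[i]
            if h.realization == seed_halo.realization:
                continue
            d = math.dist(h.pos, seed_halo.pos)
            if d <= radius and (h.realization not in best or d < best[h.realization][0]):
                best[h.realization] = (d, i)
        members = sorted(i for _, i in best.values())
        if len(members) >= min_members:
            associations.append(members)
            unassigned -= set(members)
    return associations


# Hypothetical toy data: three realizations, one well-supported structure.
haloes = [
    Halo(0, (0.0, 0.0, 0.0), 10.0),
    Halo(1, (0.5, 0.0, 0.0), 9.0),
    Halo(2, (0.0, 0.4, 0.0), 8.0),
    Halo(0, (10.0, 0.0, 0.0), 5.0),
    Halo(1, (10.2, 0.0, 0.0), 4.0),
    Halo(1, (0.9, 0.0, 0.0), 1.0),  # second candidate from realization 1, farther away
]
print(associate(haloes, radius=1.0, min_members=3))
```

Requiring a minimum number of members mirrors the abstract's cut of at least 20 member haloes; posterior estimates of mass, position, and velocity would then be summary statistics over each association's members.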
The demise of Moore's Law and Dennard Scaling has revived interest in specialized computer architectures and accelerators. Verification and testing of this hardware depend heavily upon cycle-accurate simulation of register-transfer-level (RTL) designs. The fastest software RTL simulators can simulate designs at 1--1000 kHz, i.e., more than three orders of magnitude slower than hardware. Improved simulators can increase designers' productivity by speeding design iterations and permitting more exhaustive exploration. One possibility is to exploit low-level parallelism, as RTL expresses considerable fine-grain concurrency. Unfortunately, state-of-the-art RTL simulators often perform best on a single core since modern processors cannot effectively exploit fine-grain parallelism. This work presents Manticore: a parallel computer designed to accelerate RTL simulation. Manticore uses a static bulk-synchronous parallel (BSP) execution model to eliminate fine-grain synchronization overhead. It relies entirely on a compiler to schedule resources and communication, which is feasible since RTL code contains few divergent execution paths. With static scheduling, communication and synchronizatio
An effective way to maximize code coverage in software tests is through dynamic symbolic execution, a technique that uses constraint solving to systematically explore a program's state space. We introduce an open-source dynamic symbolic execution framework called Manticore for analyzing binaries and Ethereum smart contracts. Manticore's flexible architecture allows it to support both traditional and exotic execution environments, and its API allows users to customize their analysis. Here, we discuss Manticore's architecture and demonstrate the capabilities we have used to find bugs and verify the correctness of code for our commercial clients.
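To make the idea of dynamic symbolic execution concrete, the following toy sketch runs a small function on concrete inputs, records the branch conditions along the executed path, flips one condition at a time, and solves for inputs that reach the unexplored branch. It does not use Manticore's API. The names `target`, `run`, `solve`, and `explore` are hypothetical, and an exhaustive search over a small integer domain stands in for the constraint solver a real framework would use.

```python
from itertools import product


def target(x, y, branch):
    """Program under test. `branch(pred)` evaluates `pred` on the concrete
    inputs and logs it as a path constraint."""
    if branch(lambda x, y: x > 5):
        if branch(lambda x, y: x + y == 12):
            return "bug"
        return "deep"
    return "shallow"


def run(inputs):
    """Execute `target` concretely and return (outcome, path constraints)."""
    path = []

    def branch(pred):
        taken = pred(*inputs)
        path.append((pred, taken))
        return taken

    return target(*inputs, branch), path


def solve(constraints, domain=range(-16, 17)):
    """Stand-in for a constraint solver: exhaustive search over a small domain."""
    for cand in product(domain, repeat=2):
        if all(pred(*cand) == want for pred, want in constraints):
            return cand
    return None


def explore(seed=(0, 0)):
    """Explore paths by negating one recorded branch at a time."""
    results = {}          # path signature -> (inputs, outcome)
    worklist = [seed]
    while worklist:
        inputs = worklist.pop()
        outcome, path = run(inputs)
        signature = tuple(taken for _, taken in path)
        if signature in results:
            continue      # path already covered
        results[signature] = (inputs, outcome)
        for k in range(len(path)):
            # keep the first k decisions, flip decision k
            flipped = path[:k] + [(path[k][0], not path[k][1])]
            model = solve(flipped)
            if model is not None:
                worklist.append(model)
    return results


for signature, (inputs, outcome) in explore().items():
    print(signature, inputs, outcome)
```

Starting from the single seed `(0, 0)`, the exploration reaches all three outcomes of `target`, including the `"bug"` branch that requires `x > 5` and `x + y == 12` simultaneously.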
Data-parallel problems demand ever growing floating-point (FP) operations per second under tight area- and energy-efficiency constraints. In this work, we present Manticore, a general-purpose, ultra-efficient chiplet-based architecture for data-parallel FP workloads. We have manufactured a prototype of the chiplet's computational core in the GlobalFoundries 22FDX process and demonstrate more than 5x improvement in energy efficiency on FP-intensive workloads compared to CPUs and GPUs. The compute capability at high energy and area efficiency is provided by Snitch clusters containing eight small integer cores, each controlling a large FPU. The core supports two custom ISA extensions: the SSR extension elides explicit load and store instructions by encoding them as register reads and writes, and the FREP extension decouples the integer core from the FPU, allowing floating-point instructions to be issued independently. These two extensions allow the single-issue core to minimize its instruction fetch bandwidth and saturate the instruction bandwidth of the FPU, achieving FPU utilization above 90%, with more than 40% of core area dedicated to the FPU.
This document presents implementations of fundamental convolutional neural network (CNN) layers on the Manticore cluster-based many-core architecture and discusses their characteristics and trade-offs.
We apply caustic skeleton theory to the Manticore-Local simulations, which are Bayesian constrained reconstructions of the Local Universe from the 2M++ galaxy catalogue, and extract the three-dimensional multi-scale caustic skeleton of two canonical weblike structures in our Local Universe: the Coma Cluster and the Pisces-Perseus ridge, which represent the most prominent cluster node and filamentary artery in the nearby Universe. We show that the caustic skeleton network of caustic singularities accurately reproduces the observed large-scale organisation of galaxies in redshift space for one of the Manticore realisations. The hierarchy of caustic features allows us to establish a multi-scale classification of the large-scale environment in which observed 2M++ galaxies reside. One of the most interesting aspects of the theory is that it predicts two topologically distinct classes of filaments ($A_4$ swallowtail and $D_4$ umbilic caustics) that form through fundamentally different folding histories yet appear morphologically similar enough, on the surface, to be overlooked by conventional structure identifiers. We find that the influence of $D_4$ filaments only becomes increasingly
We present a suite of 50 high-fidelity simulations of Coma cluster analogues constructed from BORG/MANTICORE constrained initial conditions and evolved with the IllustrisTNG galaxy formation model. Regions predicted to form massive clusters comparable to Coma in mass and environment are selected and followed through cosmic time, producing realistic galaxy populations and intracluster medium properties. The ensemble captures both cosmic variance and uncertainties in the local initial conditions, providing a statistically robust framework for interpreting Coma in a cosmological context. We focus on direct comparisons with observed thermodynamical profiles of the intracluster medium. Specifically, we extract X-ray surface brightness profiles from the simulated clusters and confront them with measurements from eROSITA, as well as compute the thermal Sunyaev--Zel'dovich effect via integrated Compton-$y$ profiles for comparison with Planck satellite data. The simulations reproduce the broad shape and normalisation of both observables, while also highlighting the range of scatter expected from environmental and assembly history differences. This enables us to assess how feedback processes
We revisit the Great Attractor using the Manticore-Local suite of digital twins of the nearby Universe. The Great Attractor concept has been proposed as an answer to three distinct questions: what sources the Local Group velocity in the cosmic microwave background frame, where present-day velocity streamlines converge, and where the Local Group is moving to. Addressing the original motivation of the Great Attractor -- explaining the Local Group cosmic velocity -- we find that mass within $155~h^{-1}\mathrm{Mpc}$ accounts for only ${\sim}72\%$ of that velocity magnitude, with a ${\sim}38^\circ$ directional offset. We show that even in the purely linear regime convergence within this volume is not guaranteed, particularly when also accounting for small-scale contributions to the observer velocity; no single structure, including the proposed Great Attractor, would be expected to dominate the velocity budget. Streamline convergence is smoothing-scale-dependent, transitioning from Virgo at small scales through the Hydra--Centaurus region at intermediate scales to Shapley at large scales; at intermediate smoothing the convergence point lies near Abell 3565 with an asymmetric basin of mass $\lo
The thermal Sunyaev-Zel'dovich (tSZ) effect provides a powerful probe of the thermal pressure of ionised gas in galaxy clusters and the cosmic web; constrained simulations reconstruct the mass and velocity fields of the local Universe. We explore how these two may be mutually informative: the tSZ signal provides a benchmark for assessing the fidelity of constrained simulations, and constrained simulations contribute information on the positions, total masses and density profiles of cosmic web structures for use in tSZ studies. We focus on cluster predictions in the Bayesian Origin Reconstruction from Galaxies (BORG) paradigm, introducing CSiBORG-Manticore, a new state-of-the-art suite of digital twins -- data-constrained posterior simulations whose initial conditions are inferred via Bayesian forward modelling. We develop a framework for scoring constrained simulations on their ability to match measured Planck Compton-$y$ maps around clusters, and use it to demonstrate improvement from previous BORG reconstructions. We further validate halo masses against weak-lensing-calibrated X-ray masses from eROSITA. We also show how high-fidelity digital twins offer a practical route to extra
Within the volume-limited subsample at $z<0.06$ of the Zwicky Transient Facility (ZTF) DR2 sample, we confirm a statistically significant excess of Type Ia supernovae (SNe Ia) at $z \simeq 0.02$--$0.04$, previously reported but not explained by survey selection effects. Forward simulations assuming a uniform volumetric SN Ia rate and realistic ZTF detection efficiencies fail to reproduce the feature at the $5$--$7\sigma$ level. We also detect an excess in the rates compared to our survey simulations at $z \simeq 0.08$ and $0.14$, albeit at smaller significance. To investigate the origin of these inhomogeneities, we compare the observed SN distribution to constrained reconstructions of the local matter density field from the Manticore project, based on Bayesian forward modelling of the 2M++ galaxy catalogue. While SN overdensities are spatially associated with prominent nearby structures such as the Perseus, Coma, and Hercules superclusters, the amplitude of the SN excesses significantly exceeds that expected from matter overdensities alone. By reconstructing a redshift-dependent volumetric SN Ia rate, we find that local enhancements can reach factors of two to five within specific clus
While cosmic voids are now recognized as a valuable cosmological probe, identifying them in a galaxy catalog is challenging for multiple reasons: observational effects such as holes in the mask or magnitude selection hinder the detection process; galaxies are biased tracers of the underlying dark matter distribution; and it is non-trivial to estimate the detection significance and parameter uncertainties for individual voids. Our goal is to extract a catalog of voids from constrained simulations of the large-scale structure that are consistent with the observed galaxy positions, effectively representing statistically independent realizations of the probability distribution of the cosmic web. This allows us to carry out a full Bayesian analysis of the structures emerging in the Universe. We use 50 posterior realizations of the large-scale structure in the Manticore-Local suite, obtained from the 2M++ galaxies. Running the VIDE void finder on each realization, we extract 50 independent void catalogs. We perform a posterior clustering analysis to identify high-significance voids at the $5\sigma$ level, and we assess the probability distribution of their properties. We produce a catalog of
One of the most pressing problems in current cosmology is the cause of the Hubble tension. We revisit a two-rung distance ladder, composed only of Cepheid periods and magnitudes, anchor distances in the Milky Way, Large Magellanic Cloud, NGC 4258, and host galaxy redshifts. We adopt the SH0ES data for the most up-to-date and carefully vetted measurements, where the Cepheid hosts were selected to also harbour Type Ia supernovae. We introduce two important improvements: a rigorous selection modelling and a state-of-the-art density and peculiar velocity model using Manticore-Local, based on the Bayesian Origin Reconstruction from Galaxies (BORG) algorithm. We infer $H_0 = 71.1 \pm 1.4~\mathrm{km}\,\mathrm{s}^{-1}\,\mathrm{Mpc}^{-1}$, assuming the Cepheid host sample was selected by supernova magnitudes. However, the actual selection criteria are not clear, and other assumptions can increase $H_0$ by up to one statistical standard deviation. The posterior has a lower central value and a 41 per cent smaller uncertainty than a previous study using the same distance-ladder data. This result is lower than the supernova-based SH0ES inferred value of $H_0 = 73.2 \pm 0.9~\mathrm{km}\,\mathrm{
While Transformers underpin modern large language models (LMs), there is a growing list of alternative architectures with new capabilities, promises, and tradeoffs. This makes choosing the right LM architecture challenging. Recently proposed hybrid architectures seek a best-of-all-worlds approach that reaps the benefits of all architectures. Hybrid design is difficult for two reasons: it requires manual expert-driven search, and new hybrids must be trained from scratch. We propose Manticore, a framework that addresses these challenges by automating the design of hybrid architectures while reusing pretrained models to create pretrained hybrids. Our approach augments ideas from differentiable Neural Architecture Search (NAS) by incorporating simple projectors that translate features between pretrained blocks from different architectures. We then fine-tune hybrids that combine pretrained models from different architecture families -- such as the GPT series and Mamba -- end-to-end. With Manticore, we enable LM selection without training multiple models, the construction of pretrained hybrids from existing pretrained models, and the ability to program pretrained hybrids to have certain
LLVM is an infrastructure for code generation and low-level optimizations, which has been gaining popularity as a backend for both research and industrial compilers, including many compilers for functional languages. While LLVM provides a relatively easy path to high-quality native code, its design is based on a traditional runtime model which is not well suited to alternative compilation strategies used in high-level language compilers, such as the use of heap-allocated continuation closures. This paper describes a new LLVM-based backend that supports heap-allocated continuation closures, which enables constant-time callcc and very-lightweight multithreading. The backend has been implemented in the Parallel ML compiler, which is part of the Manticore system, but the results should be useful for other compilers, such as Standard ML of New Jersey, that use heap-allocated continuation closures.
Security attacks targeting smart contracts have been on the rise, leading to financial loss and erosion of trust. Therefore, it is important to enable developers to discover security vulnerabilities in smart contracts before deployment. A number of static analysis tools have been developed for finding security bugs in smart contracts. However, despite the numerous bug-finding tools, there is no systematic approach to evaluate the proposed tools and gauge their effectiveness. This paper proposes SolidiFI, an automated and systematic approach for evaluating smart contract static analysis tools. SolidiFI is based on injecting bugs (i.e., code defects) into all potential locations in a smart contract to introduce targeted security vulnerabilities. SolidiFI then checks the generated buggy contract using the static analysis tools, and identifies the bugs that the tools are unable to detect (false negatives) as well as the bugs reported as false positives. SolidiFI is used to evaluate six widely used static analysis tools, namely Oyente, Securify, Mythril, SmartCheck, Manticore, and Slither, using a set of 50 contracts injected with 9369 distinct bugs. It finds several in
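The scoring step in a bug-injection evaluation of this kind can be sketched as a set comparison between the injected bugs and a tool's reports. This is an illustrative assumption, not SolidiFI's implementation: the `(contract, line, bug_type)` finding format, the function name `score_tool`, and the sample data are hypothetical.

```python
def score_tool(injected, reported):
    """Compare a tool's reports against the known injected bugs.

    Both arguments are iterables of (contract, line, bug_type) tuples
    (a hypothetical format chosen for this sketch).
    """
    injected, reported = set(injected), set(reported)
    return {
        # injected bugs the tool found
        "detected": injected & reported,
        # injected bugs the tool missed
        "false_negatives": injected - reported,
        # reports that match no injected bug: candidate false positives,
        # which still need checking against the original contract
        "unmatched_reports": reported - injected,
    }


# Hypothetical example data.
injected = [
    ("Token.sol", 12, "reentrancy"),
    ("Token.sol", 40, "integer-overflow"),
    ("Vault.sol", 7, "tx-origin"),
]
reported = [
    ("Token.sol", 12, "reentrancy"),
    ("Vault.sol", 99, "timestamp-dependency"),
]

result = score_tool(injected, reported)
print(len(result["detected"]), "detected")
print(len(result["false_negatives"]), "missed")
print(len(result["unmatched_reports"]), "unmatched reports")
```

An unmatched report is only a candidate false positive: the unmodified contract may already contain a genuine defect at that location, so such reports need a separate check before being counted.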
Modern high-end machines feature multiple processor packages, each of which contains multiple independent cores and integrated memory controllers connected directly to dedicated physical RAM. These packages are connected via a shared bus, creating a system with a heterogeneous memory hierarchy. Since this shared bus has less bandwidth than the sum of the links to memory, aggregate memory bandwidth is higher when parallel threads all access memory local to their processor package than when they access memory attached to a remote package. This bandwidth constraint has traditionally limited the scalability of modern functional language implementations, which seldom scale well past 8 cores, even on small benchmarks. This work presents a garbage collector integrated with our strict, parallel functional language implementation, Manticore, and shows that it scales effectively on both a 48-core AMD Opteron machine and a 32-core Intel Xeon machine.