Found 20 results
As autonomous agents powered by LLMs are increasingly deployed in society, understanding their collective behaviour in social dilemmas becomes critical. We introduce an evaluation framework where LLMs generate strategies encoded as algorithms, enabling inspection prior to deployment and scaling to populations of hundreds of agents -- substantially larger than in previous work. We find that more recent models tend to produce worse societal outcomes compared to older models when agents prioritise individual gain over collective benefits. Using cultural evolution to model user selection of agents, our simulations reveal a significant risk of convergence to poor societal equilibria, particularly when the relative benefit of cooperation diminishes and population sizes increase. We release our code as an evaluation suite for developers to assess the emergent collective behaviour of their models.
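To make "strategies encoded as algorithms" concrete, here is a minimal sketch that treats each strategy as a plain function and plays it in a repeated public goods game. The choice of game, the three example strategies, and all names and parameter values (`play_public_goods`, `multiplier`, `endowment`) are illustrative assumptions, not the paper's evaluation suite.

```python
# Hypothetical sketch: strategies as inspectable functions in a public goods game.
# The game and all names here are assumptions made for illustration.

def always_cooperate(history):
    """Contribute every round."""
    return True


def always_defect(history):
    """Never contribute."""
    return False


def conditional_cooperator(history):
    """Contribute first, then only if at least half the group contributed last round."""
    if not history:
        return True
    last_round = history[-1]
    return 2 * sum(last_round) >= len(last_round)


def play_public_goods(strategies, rounds=10, multiplier=1.6, endowment=1.0):
    """Return each agent's total payoff after `rounds` rounds.

    Each round, every contributor pays `endowment` into a pot. The pot is
    multiplied by `multiplier` and split equally among all agents.
    Non-contributors keep their endowment and still receive a share.
    """
    n = len(strategies)
    history = []           # list of rounds, each a list of booleans
    payoffs = [0.0] * n
    for _ in range(rounds):
        actions = [strategy(history) for strategy in strategies]
        share = multiplier * endowment * sum(actions) / n
        for i, contributed in enumerate(actions):
            kept = 0.0 if contributed else endowment
            payoffs[i] += share + kept
        history.append(actions)
    return payoffs


if __name__ == "__main__":
    group = [conditional_cooperator] * 3 + [always_defect]
    print(play_public_goods(group))
```

In this toy setting a lone defector earns more than each cooperator, while a group of all cooperators earns more than a group of all defectors whenever `multiplier` exceeds 1. That gap between individual and collective incentive is what defines a social dilemma.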
The radius of a planet is a fundamental parameter that probes its composition and habitability. Precise radius measurements are typically derived from the fraction of starlight blocked when a planet transits its host star. The wide-field Transiting Exoplanet Survey Satellite (TESS) has discovered hundreds of new exoplanets, but its low angular resolution means that the light from a star hosting a transiting exoplanet can be blended with the light from background stars. If not fully corrected, this extra light can dilute the transit signal and result in a smaller measured planet radius. In a study of hundreds of TESS planet discoveries using deblended light curves from our validated methodology, we show that systematically incorrect planet radii are common in the literature: studies using various public TESS photometry pipelines have underestimated the planet radius by a weighted median of $6.1\% \pm 0.3\%$, leading to a $\sim20\%$ overestimation of planet density. The widespread presence of these biases in the literature has profoundly shaped, and potentially misrepresented, our understanding of the exoplanet population. Addressing these biases will refine the exoplanet mass-radius r
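The link between the quoted radius bias and density bias can be checked with a short calculation. The sketch below assumes the planet mass is measured independently (so density scales as mass over radius cubed) and that transit depth scales as the square of the planet-to-star radius ratio. The function names and the `contamination` parameter (contaminating flux divided by target flux) are illustrative, not taken from the paper.

```python
# Hypothetical worked example: how a radius bias propagates to a density bias,
# and how blended light shrinks the measured radius.

def density_bias_from_radius_bias(radius_bias):
    """Fractional density bias for a fractional radius bias, at fixed mass.

    Density is proportional to mass / radius**3, so a radius that is too
    small by 6.1% (radius_bias = -0.061) inflates the density.
    """
    return 1.0 / (1.0 + radius_bias) ** 3 - 1.0


def radius_bias_from_contamination(contamination):
    """Fractional radius bias when blended light is left uncorrected.

    `contamination` is contaminating flux divided by target flux. The observed
    depth is the true depth divided by (1 + contamination), and the radius
    scales as the square root of the depth.
    """
    return (1.0 + contamination) ** -0.5 - 1.0


if __name__ == "__main__":
    print(f"{density_bias_from_radius_bias(-0.061):.3f}")
    print(f"{radius_bias_from_contamination(0.134):.3f}")
```

Under these assumptions, a 6.1% radius underestimate gives a density overestimate of about 21%, in line with the roughly 20% quoted above. Uncorrected contamination of about 13% of the target flux would be enough to produce a radius bias of that size.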
Modern image generators produce strikingly realistic images, where only artifacts like distorted hands or warped objects reveal their synthetic origin. Detecting these artifacts is essential: without detection, we cannot benchmark generators or train reward models to improve them. Current detectors fine-tune VLMs on tens of thousands of labeled images, but this is expensive to repeat whenever generators evolve or new artifact types emerge. We show that pretrained VLMs already encode the knowledge needed to detect artifacts - with the right scaffolding, this capability can be unlocked using only a few hundred labeled examples per artifact category. Our system, ArtifactLens, achieves state-of-the-art on five human artifact benchmarks (the first evaluation across multiple datasets) while requiring orders of magnitude less labeled data. The scaffolding consists of a multi-component architecture with in-context learning and text instruction optimization, with novel improvements to each. Our methods generalize to other artifact types - object morphology, animal anatomy, and entity interactions - and to the distinct task of AIGC detection.
We present VisionLLM v2, an end-to-end generalist multimodal large model (MLLM) that unifies visual perception, understanding, and generation within a single framework. Unlike traditional MLLMs limited to text output, VisionLLM v2 significantly broadens its application scope. It excels not only in conventional visual question answering (VQA) but also in open-ended, cross-domain vision tasks such as object localization, pose estimation, and image generation and editing. To this end, we propose a new information transmission mechanism termed "super link", as a medium to connect MLLM with task-specific decoders. It not only allows flexible transmission of task information and gradient feedback between the MLLM and multiple downstream decoders but also effectively resolves training conflicts in multi-tasking scenarios. In addition, to support the diverse range of tasks, we carefully collected and combed training data from hundreds of public vision and vision-language tasks. In this way, our model can be joint-trained end-to-end on hundreds of vision language tasks and generalize to these tasks using a set of shared parameters through different user prompts, achieving performance compar
Modern large-scale recommendation systems rely heavily on user interaction history sequences to enhance model performance. The advent of large language models and sequential modeling techniques, particularly transformer-like architectures, has led to significant advancements recently (e.g., HSTU, SIM, and TWIN models). While scaling to ultra-long user histories (10k to 100k items) generally improves model performance, it also creates significant challenges for latency, queries per second (QPS) and GPU cost in industry-scale recommendation systems. Existing models do not adequately address these industrial scalability issues. In this paper, we propose a novel two-stage modeling framework, namely VIrtual Sequential Target Attention (VISTA), which decomposes traditional target attention from a candidate item to user history items into two distinct stages: (1) user history summarization into a few hundred tokens; followed by (2) candidate item attention to those tokens. These summarization token embeddings are then cached in a storage system and utilized as sequence features for downstream model training and inference. This novel design for scalability enables VISTA to scale to l
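The two-stage decomposition described above can be sketched with plain dot-product attention. The abstract does not say how the history is summarized, so this sketch assumes stage 1 is cross-attention from a small set of query vectors (the hypothetical `seed_queries`) over the full history. All function names are illustrative and none of this is the VISTA implementation.

```python
# Hypothetical sketch of two-stage target attention using only the standard library.
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def summarize_history(history, seed_queries):
    """Stage 1 (assumed form): compress a long history into len(seed_queries) tokens.

    This runs once per user and does not depend on any candidate item,
    so its output can be cached.
    """
    return [attend(q, history, history) for q in seed_queries]


def candidate_attention(candidate, summary_tokens):
    """Stage 2: the candidate item attends only to the cached summary tokens."""
    return attend(candidate, summary_tokens, summary_tokens)


if __name__ == "__main__":
    history = [[float(i % 3), float(i % 5)] for i in range(1000)]  # long history
    seed_queries = [[1.0, 0.0], [0.0, 1.0]]                        # two summary slots
    cached = summarize_history(history, seed_queries)              # computed once
    print(candidate_attention([0.5, 0.5], cached))                 # per-candidate cost is small
```

The point of the split is cost: the per-candidate work in stage 2 scales with the number of summary tokens rather than with the full history length.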
Obelia improves upon structured DAG-based consensus protocols used in proof-of-stake systems, allowing them to effectively scale to accommodate hundreds of validators. Obelia implements a two-tier validator system: a core group of high-stake validators that propose blocks as in current protocols, and a larger group of lower-stake auxiliary validators that occasionally author blocks. Obelia incentivizes auxiliary validators to assist recovering core validators and integrates seamlessly with existing protocols. We show that Obelia does not introduce visible overhead compared to the original protocol, even when scaling to hundreds of validators, or when a large number of auxiliary validators are unreliable.
Recent advances in neural recording technology allow simultaneously recording action potentials from hundreds to thousands of neurons in awake, behaving animals. However, characterizing spike patterns in the resulting data, and linking these patterns to behaviour, remains a challenging task. The lack of a rigorous mathematical language for variable numbers of events (spikes) emitted by multiple agents (neurons) is an important limiting factor. We introduce a new mathematical operation to decompose complex spike patterns into a set of simple, structured elements. This creates a mathematical language that allows comparing spike patterns across trials, detecting sub-patterns, and making links to behaviour via a clear distance measure. We apply the method to dual Utah array recordings from macaque prefrontal cortex, where this technique reveals previously unseen structure that can predict both memory-guided decisions and errors in a virtual-reality working memory task. These results demonstrate that this technique provides a powerful new approach to understand structure in the spike times of neural populations, at a scale that will continue to grow more and more rapidly in upcoming yea
This letter introduces a structured high-rank tensor approach for estimating sub-6G uplink channels in multi-user multiple-input and multiple-output (MU-MIMO) systems. To tackle the difficulty of channel estimation in sub-6G bands with hundreds of sub-paths, our approach fully exploits the physical structure of channel and establishes the link between sub-6G channel model and a high-rank four-dimensional (4D) tensor Canonical Polyadic Decomposition (CPD) with three factor matrices being Vandermonde-constrained. Accordingly, a stronger uniqueness property is derived in this work. This model supports an efficient one-pass algorithm for estimating sub-path parameters, which ensures plug-in compatibility with the widely-used baseline. Our method performs much better than the state-of-the-art tensor-based techniques on the simulations adhering to the 3GPP 5G protocols.
In the span of four decades, quantum computation has evolved from an intellectual curiosity to a potentially realizable technology. Today, small-scale demonstrations have become possible for quantum algorithmic primitives on hundreds of physical qubits. Nevertheless, there are significant outstanding challenges in quantum hardware, fabrication, software architecture, and algorithms on the path towards a full-stack scalable quantum computing technology. Here, we provide a comprehensive review of these scaling challenges. We show how to facilitate scaling by adopting existing semiconductor technology to build much higher-quality qubits, employing systems engineering approaches, and performing distributed heterogeneous quantum-classical computing. We provide a detailed resource and sensitivity analysis for quantum applications on surface-code error-corrected quantum computers given current, target, and desired hardware specifications based on superconducting qubits, accounting for a realistic distribution of errors. We provide comprehensive resource estimates for several utility-scale applications including quantum chemistry calculations, catalyst design, NMR spectroscopy, and Fermi-H
We explore strategies aimed at reducing the amount of computation, both quantum and classical, required to run the Quantum Approximate Optimization Algorithm (QAOA). First, following Wurtz et al. [Phys.Rev A 104:052419], we consider the standard QAOA with instance-independent "tree" parameters chosen in advance. These tree parameters are chosen to optimize the MaxCut expectation for large girth graphs. We provide extensive numerical evidence supporting the performance guarantee for tree parameters conjectured in [Phys.Rev A 103:042612] and see that the approximation ratios obtained with tree parameters are typically well beyond the conjectured lower bounds, often comparable to performing a full optimization. This suggests that in practice, the QAOA can achieve near-optimal performance without the need for parameter optimization. Next, we modify the warm-start QAOA of Tate et al. [Quantum 7:1121]. The starting state for the QAOA is now an optimized product state associated with a solution of the Goemans-Williamson (GW) algorithm. Surprisingly, the tree parameters continue to perform well for the warm-start QAOA. We find that for random 3-regular graphs with hundreds of vertices, the
Mars' tadpole craters are small, young craters whose crater rims are incised by one or more exit breaches but lack visible inlets. The tadpole forming climate records the poorly understood drying of Mars since the Early Hesperian. A third of tadpole craters have multiple breaches, therefore a process is needed that was able to generate crater rim incision in multiple locations. We use HiRISE data for four multiple breach tadpole craters to measure their crater fill, rims, and exit breaches. We compare these measurements and other data to our calculations of liquid water supply by rain, surface melting, groundwater discharge, and basal ice sheet melting to discriminate between four proposed formation hypotheses for tadpole breaches, favoring scenarios with ice-filled craters and supraglacial melting. We conclude that multiple breach tadpole craters record hundreds of meters of mid-latitude ice and climate conditions enabling intermittent melting in the Late Hesperian and Amazonian, suggesting that liquid water on Mars has been available in association with water ice for billions of years.
The electron velocity distribution function in the plasma, formed by gas ionization with a sub-nanosecond, hundreds of megawatts power level microwave pulse, is studied by a theoretical model and by numerical 3D simulations, the results of which agree well and show that the distribution varies along the pulse as a decreasing power-law function at the rear of the pulse. Experiments performed in a waveguide filled with helium gas confirm that energetic (from several keV to several tens of keV) electrons remain in plasma long after the pulse has crossed the experimental volume. These electrons continue the gas ionization over extended times up to tens of nanoseconds.
Image-based multi-person reconstruction in wide-field large scenes is critical for crowd analysis and security alert. However, existing methods cannot deal with large scenes containing hundreds of people, which pose the challenges of a large number of people, large variations in human scale, and complex spatial distribution. In this paper, we propose Crowd3D, the first framework to reconstruct the 3D poses, shapes and locations of hundreds of people with global consistency from a single large-scene image. The core of our approach is to convert the problem of complex crowd localization into pixel localization with the help of our newly defined concept, Human-scene Virtual Interaction Point (HVIP). To reconstruct the crowd with global consistency, we propose a progressive reconstruction network based on HVIP by pre-estimating a scene-level camera and a ground plane. To deal with a large number of persons and various human sizes, we also design an adaptive human-centric cropping scheme. Besides, we contribute a benchmark dataset, LargeCrowd, for crowd reconstruction in a large scene. Experimental results demonstrate the effectiveness of the proposed method. The code and datasets wi
Remote photoplethysmography (rPPG) emerges as a promising method for non-invasive, convenient measurement of vital signs, utilizing the widespread presence of cameras. Despite advancements, existing datasets fall short in terms of size and diversity, limiting comprehensive evaluation under diverse conditions. This paper presents an in-depth analysis of the VitalVideo dataset, the largest real-world rPPG dataset to date, encompassing 893 subjects and 6 Fitzpatrick skin tones. Our experimentation with six unsupervised methods and three supervised models demonstrates that datasets comprising a few hundred subjects (i.e., 300 for UBFC-rPPG, 500 for PURE, and 700 for MMPD-Simple) are sufficient for effective rPPG model training. Our findings highlight the importance of diversity and consistency in skin tones for precise performance evaluation across different datasets.
The dominant paradigm in 3D human pose estimation that lifts a 2D pose sequence to 3D heavily relies on long-term temporal clues (i.e., using a daunting number of video frames) for improved accuracy, which incurs performance saturation, intractable computation and the non-causal problem. This can be attributed to their inherent inability to perceive spatial context as plain 2D joint coordinates carry no visual cues. To address this issue, we propose a straightforward yet powerful solution: leveraging the readily available intermediate visual representations produced by off-the-shelf (pre-trained) 2D pose detectors -- no finetuning on the 3D task is even needed. The key observation is that, while the pose detector learns to localize 2D joints, such representations (e.g., feature maps) implicitly encode the joint-centric spatial context thanks to the regional operations in backbone networks. We design a simple baseline named Context-Aware PoseFormer to showcase its effectiveness. Without access to any temporal information, the proposed method significantly outperforms its context-agnostic counterpart, PoseFormer, and other state-of-the-art methods using up to hundreds of video frames
Photon-number resolving detectors with hundreds of pixels are now readily available, while the characterization of these detectors using detector tomography is computationally intensive. Here, we present a modified detector tomography model that reduces the number of variables that need optimization. To evaluate the effectiveness and accuracy of our model, we reconstruct the photon number distribution of optical coherent and thermal states using the expectation-maximization-entropy algorithm. Our results indicate that the fidelity of the reconstructed states remains above 99%, and the second and third-order correlations agree well with the theoretical values for a mean number of photons up to 100. We also investigate the computational resources required for detector tomography and find that our approach reduces the solving time by about half compared to the standard detector tomography approach, and that the required memory resources are the main obstacle for detector tomography of a large number of pixels. Our results suggest that detector tomography is viable on a supercomputer with 1 TB RAM for detectors with up to 340 pixels.
In recent years, pre-trained language models have undergone rapid development with the emergence of large-scale models. However, there is a lack of open-sourced chat models specifically designed for the Chinese language, especially in the field of Chinese finance, at the scale of hundreds of billions. To address this gap, we introduce XuanYuan 2.0, the largest Chinese chat model to date, built upon the BLOOM-176B architecture. Additionally, we propose a novel training method called hybrid-tuning to mitigate catastrophic forgetting. By combining general-domain with domain-specific knowledge and integrating the stages of pre-training and fine-tuning, XuanYuan 2.0 is capable of providing accurate and contextually appropriate responses in the Chinese financial domain.
A large qubit capacity and an individual readout capability are two crucial requirements for large-scale quantum computing and simulation. As one of the leading physical platforms for quantum information processing, the ion trap has achieved quantum simulation of tens of ions with site-resolved readout in a 1D Paul trap, and that of hundreds of ions with global observables in a 2D Penning trap. However, integrating these two features into a single system is still very challenging. Here we report the stable trapping of 512 ions in a 2D Wigner crystal and the sideband cooling of their transverse motion. We demonstrate the quantum simulation of long-range quantum Ising models with tunable coupling strengths and patterns, with or without frustration, using 300 ions. Enabled by the site resolution in the single-shot measurement, we observe rich spatial correlation patterns in the quasi-adiabatically prepared ground states, which allows us to verify quantum simulation results by comparing with the calculated collective phonon modes and with classical simulated annealing. We further probe the quench dynamics of the Ising model in a transverse field to demonstrate quantum sampling tasks. Our w
As part of the GRAVITY$^{+}$ project, the near-infrared beam combiner GRAVITY and the VLTI are currently undergoing a series of significant upgrades to further improve the performance and sky coverage. The instrumental changes will be transformational and will, for instance, uniquely position GRAVITY to observe the broad line region of hundreds of Active Galactic Nuclei (AGN) at a redshift of two and higher. The increased sky coverage is achieved by enlarging the maximum angular separation between the celestial science object (SC) and the off-axis fringe tracking (FT) star from the current 2 arcseconds (arcsec) to an unprecedented 30 arcsec, limited by the atmospheric conditions. This was successfully demonstrated at the VLTI for the first time.
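A rough sense of why the larger separation increases sky coverage comes from the area of sky around each science target in which a fringe tracking star may lie. The sketch below assumes that area is a simple disc and that suitable stars are spread uniformly, which is a simplification. The function name is illustrative.

```python
# Hypothetical back-of-envelope estimate: search area for an off-axis
# fringe tracking star as a function of the maximum allowed separation.
import math


def search_area_arcsec2(max_separation_arcsec):
    """Area (in square arcseconds) of a disc with the given radius."""
    return math.pi * max_separation_arcsec ** 2


if __name__ == "__main__":
    before = search_area_arcsec2(2.0)
    after = search_area_arcsec2(30.0)
    print(f"area gain: {after / before:.0f}x")
```

Under the uniform-density assumption, going from 2 arcsec to 30 arcsec enlarges the search area by a factor of (30/2)^2 = 225, so the chance of finding a usable star near a given target grows accordingly.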
Constraining the distribution of small-scale structure in our universe allows us to probe alternatives to the cold dark matter paradigm. Strong gravitational lensing offers a unique window into small dark matter halos ($<10^{10} M_\odot$) because these halos impart a gravitational lensing signal even if they do not host luminous galaxies. We create large datasets of strong lensing images with realistic low-mass halos, Hubble Space Telescope (HST) observational effects, and galaxy light from HST's COSMOS field. Using a simulation-based inference pipeline, we train a neural posterior estimator of the subhalo mass function (SHMF) and place constraints on populations of lenses generated using a separate set of galaxy sources. We find that by combining our network with a hierarchical inference framework, we can both reliably infer the SHMF across a variety of configurations and scale efficiently to populations with hundreds of lenses. By conducting precise inference on large and complex simulated datasets, our method lays a foundation for extracting dark matter constraints from the next generation of wide-field optical imaging surveys.