Automated environment configuration is a critical bottleneck in scaling software engineering (SWE) automation. To provide a reliable evaluation standard for this task, we present the Multi-Docker-Eval benchmark. It includes 40 real-world repositories spanning 9 programming languages and measures both success in achieving executable states and efficiency under realistic constraints. Our extensive evaluation of state-of-the-art LLMs and agent frameworks reveals key insights: (1) the overall success rate of current models is low (F2P at most 37.7%), with environment construction being the primary bottleneck; (2) model size and reasoning length are not decisive factors, and open-source models like DeepSeek-V3.1 and Kimi-K2 are competitive in both efficiency and effectiveness; (3) the agent framework and the programming language also significantly influence the success rate. These findings provide actionable guidelines for building scalable, fully automated SWE pipelines.
Construction sites frequently require removing large rocks before excavation or grading can proceed. Human operators typically extract these boulders using only standard digging buckets, avoiding time-consuming tool changes to specialized grippers. This task demands manipulating irregular objects with unknown geometries in harsh outdoor environments where dust, variable lighting, and occlusions hinder perception. The excavator must adapt to varying soil resistance--dragging along hard-packed surfaces or penetrating soft ground--while coordinating multiple hydraulic joints to secure rocks using a shovel. Current autonomous excavation focuses on continuous media (soil, gravel) or uses specialized grippers with detailed geometric planning for discrete objects. These approaches either cannot handle large irregular rocks or require impractical tool changes that interrupt workflow. We train a reinforcement learning policy in simulation using rigid-body dynamics and analytical soil models. The policy processes sparse LiDAR points (just 20 per rock) from vision-based segmentation and proprioceptive feedback to control standard excavator buckets. The learned agent discovers different strate
Paired image-text data with subtle variations in-between (e.g., people holding surfboards vs. people holding shovels) hold the promise of producing Vision-Language Models with proper compositional understanding. Synthesizing such training data from generative models is a highly coveted prize due to the reduced cost of data collection. However, synthesizing training images for compositional learning presents three challenges: (1) efficiency in generating large quantities of images, (2) text alignment between the generated image and the caption in the exact place of the subtle change, and (3) image fidelity in ensuring sufficient similarity with the original real images in all other places. We propose SPARCL (Synthetic Perturbations for Advancing Robust Compositional Learning), which integrates image feature injection into a fast text-to-image generative model, followed by an image style transfer step, to meet the three challenges. Further, to cope with any residual issues of text alignment, we propose an adaptive margin loss to filter out potentially incorrect synthetic samples and focus the learning on informative hard samples. Evaluation on four compositional understanding benchma
In a post-industrial society, the workplace is dominated primarily by Knowledge Work, which is achieved mostly through human cognitive processing, such as analysis, comprehension, evaluation, and decision-making. Many of these processes have limited support from technology, in contrast to physical tasks, which have been enabled through a host of tools from hammers to shovels and hydraulic lifts. To develop a suite of cognitive tools, we first need to understand which processes humans use to complete work tasks. In the past century, several classifications (e.g., Bloom's) of cognitive processes have emerged, and we assessed their viability as the basis for designing tools that support cognitive work. This study re-used an existing data set composed of interviews of environmental scientists about their core work. While the classification uncovered many instances of cognitive processes, the results showed that the existing cognitive process classifications do not provide a sufficiently comprehensive deconstruction of the human cognitive processes; the work is quite simply too abstract to be operational.
This paper explores the question of creating and maintaining terrain maps in environments where the terrain changes. The specific example explored is the construction of terrain maps from 3D LiDAR measurements on an electric rope shovel. The approach extends the height grid representation of terrain to include a Hidden Markov Model in each cell, enabling confidence-based mapping of constantly changing terrain. There are inherent difficulties in this problem, including semantic labelling of the LiDAR measurements associated with machinery and determining the pose of the sensor. Solutions to both of these problems are explored. The significance of this work lies in the need for accurate terrain mapping to support autonomous machine operation.
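The idea of a height grid with a Hidden Markov Model in each cell can be illustrated with a minimal sketch. The abstract does not specify the state space, transition model, or observation model, so everything below is an assumption made for illustration: the hidden state is a discretized height bin, the terrain stays in its bin with probability `stay_prob` and otherwise jumps uniformly to another bin, and LiDAR height measurements carry Gaussian noise with standard deviation `meas_sigma`. The class name `HMMCell` and all parameters are hypothetical.

```python
import numpy as np


class HMMCell:
    """One grid cell holding a belief over discretized height bins.

    Hypothetical design for illustration; not the model used in the paper.
    """

    def __init__(self, bin_centers, stay_prob=0.9, meas_sigma=0.1):
        self.bins = np.asarray(bin_centers, dtype=float)
        n = len(self.bins)
        # Transition matrix: the height stays in its bin with stay_prob,
        # otherwise it jumps uniformly to any other bin (terrain changed).
        self.T = np.full((n, n), (1.0 - stay_prob) / (n - 1))
        np.fill_diagonal(self.T, stay_prob)
        self.sigma = meas_sigma
        self.belief = np.full(n, 1.0 / n)  # uniform prior over heights

    def predict(self):
        # Time update: propagate the belief through the transition model.
        self.belief = self.T.T @ self.belief

    def update(self, z):
        # Measurement update with an (unnormalized) Gaussian likelihood.
        lik = np.exp(-0.5 * ((z - self.bins) / self.sigma) ** 2)
        post = lik * self.belief
        total = post.sum()
        if total > 0:  # keep the predicted belief if the likelihood underflows
            self.belief = post / total

    def estimate(self):
        # Most likely height and its posterior probability (the confidence).
        i = int(np.argmax(self.belief))
        return self.bins[i], float(self.belief[i])


# Usage: repeated observations near 1.0 m, then the terrain is dug to 0.5 m.
cell = HMMCell(np.linspace(0.0, 2.0, 21))
for z in [1.0] * 5 + [0.5] * 5:
    cell.predict()
    cell.update(z)
height, confidence = cell.estimate()
```

In this sketch the transition model is what lets a cell forget an old height once measurements start to disagree with it, while the posterior probability of the most likely bin serves as a per-cell confidence value.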
Beam profile engineering, where a desired optical intensity distribution can be generated by an array of phase-shifting (or amplitude-changing) elements, is a promising approach in laser material processing. For example, a spatial light modulator (SLM) is a dynamic diffractive optical element allowing for experimental implementations of controllable beam profiles. Scalar Mathieu beams have an elliptical intensity distribution perceivable as optical knives in the transverse plane, and scalar Weber beams have a parabolic distribution, which enables us to call them optical shovels. Here, we introduce vector versions of scalar Mathieu and Weber beams and use those vector beams as a basis to construct controllable on-axis phase and amplitude distributions with polarization control. Further, we generate individual components of optical knife and shovel beams experimentally using SLMs as a toy model and report on our achievements in the control over the beam shape, dimensions and polarization along the propagation axis.
We consider energy-dispersive X-ray Fluorescence (EDXRF) applications where the fundamental parameters method is impractical, such as when instrument parameters are unavailable. For example, on a mining shovel or conveyor belt, rocks are constantly moving (leading to varying angles of incidence and distances) and there may be other factors not accounted for (like dust). Neural networks do not require instrument and fundamental parameters, but training them requires XRF spectra labelled with elemental composition, which are often scarce because labelling is expensive. We develop a neural network model that learns from limited labelled data and also benefits from domain knowledge by learning to invert a forward model. The forward model uses transition energies and probabilities of all elements and parameterized distributions to approximate other fundamental and instrument parameters. We evaluate the model and baseline models on a rock dataset from a lithium mineral exploration project. Our model works particularly well for some low-Z elements (Li, Mg, Al, and K) as well as some high-Z elements (Sn and Pb) despite these elements being outside the suitable range for common spectromete
We establish a theoretical method which goes beyond the weak coupling and Markovian approximations while remaining intuitive, using a quantum master equation in a larger Hilbert space. The method is applicable to all impurity Hamiltonians tunnel-coupled to one (or multiple) baths of free fermions. The accuracy of the method is in principle not limited by the system-bath coupling strength, but rather by the shape of the spectral density and it is especially suited to study situations far away from the wide-band limit. In analogy to the bosonic case, we call it the fermionic reaction coordinate mapping. As an application we consider a thermoelectric device made of two Coulomb-coupled quantum dots. We pay particular attention to the regime where this device operates as an autonomous Maxwell demon shoveling electrons against the voltage bias thanks to information. Contrary to previous studies we do not rely on a Markovian weak coupling description. Our numerical findings reveal that in the regime of strong coupling and non-Markovianity, the Maxwell demon is often doomed to disappear except in a narrow parameter regime of small power output.
We investigate a fundamental model from open-pit mining, which is a cyclic system consisting of a shovel, loaded travel, an unloading facility, and empty return travel. The interaction of these subsystems determines the capacity of the shovel, which is the fundamental quantity of interest. To determine this capacity, one needs the stationary probability that the shovel is idle. Because an exact analysis of the system's performance is out of reach, various approximation algorithms have been proposed in the literature besides simulation; these stem from computer science and can be characterized as general-purpose algorithms. For the special problem under mining conditions, we propose an extremely simple algorithm. Comparison with several general-purpose algorithms shows that, for realistic situations, the special algorithm is more precise than the general-purpose algorithms. This holds even if these general-purpose candidates incorporate more details of the underlying models than our simple algorithm, which works on a strongly reduced model. The comparison and assessment are done with extensive simulations on a level of detail which the general purpose algorithms a
Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to erect with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovelfuls of sand. Mathematicians interested in OT cast that problem as that of comparing two probability distributions, two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a "global" cost to every such transport, using the "local" consideration of how much it costs to move a grain of sand from one place to another. Recent years have witnessed the spread of OT in several fields, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various proble