Search - ResearchTracker

Automated environment configuration is a critical bottleneck in scaling software engineering (SWE) automation. To provide a reliable evaluation standard for this task, we present the Multi-Docker-Eval benchmark. It includes 40 real-world repositories spanning 9 programming languages and measures both success in achieving executable states and efficiency under realistic constraints. Our extensive evaluation of state-of-the-art LLMs and agent frameworks reveals key insights: (1) the overall success rate of current models is low (F2P at most 37.7%), with environment construction being the primary bottleneck; (2) model size and reasoning length are not decisive factors, and open-source models like DeepSeek-V3.1 and Kimi-K2 are competitive in both efficiency and effectiveness; (3) the agent framework and programming language also have a significant influence on success rate. These findings provide actionable guidelines for building scalable, fully automated SWE pipelines.

Towards Learning Boulder Excavation with Hydraulic Excavators

arXiv, 2025-09-22. Authors: Jonas Gruetter, Lorenzo Terenzi, Pascal Egli

Construction sites frequently require removing large rocks before excavation or grading can proceed. Human operators typically extract these boulders using only standard digging buckets, avoiding time-consuming tool changes to specialized grippers. This task demands manipulating irregular objects with unknown geometries in harsh outdoor environments where dust, variable lighting, and occlusions hinder perception. The excavator must adapt to varying soil resistance--dragging along hard-packed surfaces or penetrating soft ground--while coordinating multiple hydraulic joints to secure rocks using a shovel. Current autonomous excavation focuses on continuous media (soil, gravel) or uses specialized grippers with detailed geometric planning for discrete objects. These approaches either cannot handle large irregular rocks or require impractical tool changes that interrupt workflow. We train a reinforcement learning policy in simulation using rigid-body dynamics and analytical soil models. The policy processes sparse LiDAR points (just 20 per rock) from vision-based segmentation and proprioceptive feedback to control standard excavator buckets. The learned agent discovers different strate


Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data

arXiv, 2025-03-03. Authors: Haoxin Li, Boyang Li

Paired image-text data with subtle variations in-between (e.g., people holding surfboards vs. people holding shovels) hold the promise of producing Vision-Language Models with proper compositional understanding. Synthesizing such training data from generative models is a highly coveted prize due to the reduced cost of data collection. However, synthesizing training images for compositional learning presents three challenges: (1) efficiency in generating large quantities of images, (2) text alignment between the generated image and the caption in the exact place of the subtle change, and (3) image fidelity in ensuring sufficient similarity with the original real images in all other places. We propose SPARCL (Synthetic Perturbations for Advancing Robust Compositional Learning), which integrates image feature injection into a fast text-to-image generative model, followed by an image style transfer step, to meet the three challenges. Further, to cope with any residual issues of text alignment, we propose an adaptive margin loss to filter out potentially incorrect synthetic samples and focus the learning on informative hard samples. Evaluation on four compositional understanding benchma


Untangling Cognitive Processes Underlying Knowledge Work

arXiv, 2024-07-03. Authors: Ginar Niwanputri, Elaine Toms, Andrew Simpson

In a post-industrial society, the workplace is dominated primarily by Knowledge Work, which is achieved mostly through human cognitive processing, such as analysis, comprehension, evaluation, and decision-making. Many of these processes have limited support from technology in the same way that physical tasks have been enabled through a host of tools from hammers to shovels and hydraulic lifts. To develop a suite of cognitive tools, we first need to understand which processes humans use to complete work tasks. In the past century several classifications (e.g., Bloom's) of cognitive processes have emerged, and we assessed their viability as the basis for designing tools that support cognitive work. This study re-used an existing data set composed of interviews of environmental scientists about their core work. While the classification uncovered many instances of cognitive processes, the results showed that the existing cognitive process classifications do not provide a sufficiently comprehensive deconstruction of the human cognitive processes; the work is quite simply too abstract to be operational.


Probabilistic Height Grid Terrain Mapping for Mining Shovels using LiDAR

arXiv, 2024-05-27. Authors: Vedant Bhandari, Jasmin James, Tyson Phillips

This paper explores the question of creating and maintaining terrain maps in environments where the terrain changes. The specific example explored is the construction of terrain maps from 3D LiDAR measurements on an electric rope shovel. The approach extends the height grid representation of terrain to include a Hidden Markov Model in each cell, enabling confidence-based mapping of constantly changing terrain. There are inherent difficulties in this problem, including semantic labelling of the LiDAR measurements associated with machinery and determining the pose of the sensor. Solutions to both of these problems are explored. The significance of this work lies in the need for accurate terrain mapping to support autonomous machine operation.

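The idea of a height grid with a Hidden Markov Model in each cell can be illustrated with a small filter for a single cell. This is only a sketch under stated assumptions, not the paper's formulation: the discretized height bins, the "stay or jump uniformly" transition model, the Gaussian measurement model, and all names (`predict`, `update`, `p_change`, `sigma`) are hypothetical choices made for illustration.

```python
import math


def predict(belief, p_change):
    """Propagate the belief one step: the height stays with probability
    1 - p_change, otherwise it jumps uniformly to one of the other bins.
    (Assumed transition model, chosen for illustration.)"""
    n = len(belief)
    return [(1.0 - p_change) * b + p_change * (1.0 - b) / (n - 1) for b in belief]


def update(belief, heights, measurement, sigma):
    """Bayes update with an assumed Gaussian range-measurement model."""
    weighted = [
        b * math.exp(-0.5 * ((measurement - h) / sigma) ** 2)
        for b, h in zip(belief, heights)
    ]
    total = sum(weighted)
    if total == 0.0:
        return belief  # measurement carries no usable information
    return [w / total for w in weighted]


def filter_cell(heights, measurements, p_change=0.05, sigma=0.2):
    """Run the forward filter for one grid cell, starting from a uniform belief.
    Returns (most likely height, confidence in that height)."""
    belief = [1.0 / len(heights)] * len(heights)
    for z in measurements:
        belief = update(predict(belief, p_change), heights, z, sigma)
    best = max(range(len(heights)), key=lambda i: belief[i])
    return heights[best], belief[best]


heights = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]  # hypothetical height bins in metres
stable_height, stable_conf = filter_cell(heights, [2.0, 2.0, 2.0])
changed_height, changed_conf = filter_cell(heights, [2.0, 2.0, 2.0, 0.5])
```

In this sketch the confidence is simply the posterior probability of the most likely bin. The nonzero `p_change` keeps a small amount of probability on every bin, which is what lets the cell follow a terrain change (here, material removed so the height drops from 2.0 to 0.5) instead of locking onto its old estimate.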

Vector Beams with Parabolic and Elliptic Cross-Sections for Laser Material Processing Applications

arXiv, 2023-12-05. Authors: Sergej Orlov, Vitalis Vosylius, Pavel Gotovski

Beam profile engineering, where a desired optical intensity distribution can be generated by an array of phase shifting (or amplitude changing) elements is a promising approach in laser material processing. For example, a spatial light modulator (SLM) is a dynamic diffractive optical element allowing for experimental implementations of controllable beam profile. Scalar Mathieu beams have elliptical intensity distribution perceivable as optical knives in the transverse plane and scalar Weber beams have a parabolic distribution, which enables us to call them optical shovels. Here, we introduce vector versions of scalar Mathieu and Weber beams and use those vector beams as a basis to construct controllable on-axis phase and amplitude distributions with polarization control. Further, we generate individual components of optical knife and shovel beams experimentally using SLMs as a toy model and report on our achievements in the control over the beam shape, dimensions and polarization along the propagation axis.


Auto-Encoder Neural Network Incorporating X-Ray Fluorescence Fundamental Parameters with Machine Learning

arXiv, 2022-10-21. Authors: Matthew Dirks, David Poole

We consider energy-dispersive X-ray Fluorescence (EDXRF) applications where the fundamental parameters method is impractical such as when instrument parameters are unavailable. For example, on a mining shovel or conveyor belt, rocks are constantly moving (leading to varying angles of incidence and distances) and there may be other factors not accounted for (like dust). Neural networks do not require instrument and fundamental parameters but training neural networks requires XRF spectra labelled with elemental composition, which is often limited because of its expense. We develop a neural network model that learns from limited labelled data and also benefits from domain knowledge by learning to invert a forward model. The forward model uses transition energies and probabilities of all elements and parameterized distributions to approximate other fundamental and instrument parameters. We evaluate the model and baseline models on a rock dataset from a lithium mineral exploration project. Our model works particularly well for some low-Z elements (Li, Mg, Al, and K) as well as some high-Z elements (Sn and Pb) despite these elements being outside the suitable range for common spectromete


Fermionic reaction coordinates and their application to an autonomous Maxwell demon in the strong coupling regime

arXiv, 2017-11-24. Authors: Philipp Strasberg, Gernot Schaller, Thomas L. Schmidt

We establish a theoretical method which goes beyond the weak coupling and Markovian approximations while remaining intuitive, using a quantum master equation in a larger Hilbert space. The method is applicable to all impurity Hamiltonians tunnel-coupled to one (or multiple) baths of free fermions. The accuracy of the method is in principle not limited by the system-bath coupling strength, but rather by the shape of the spectral density and it is especially suited to study situations far away from the wide-band limit. In analogy to the bosonic case, we call it the fermionic reaction coordinate mapping. As an application we consider a thermoelectric device made of two Coulomb-coupled quantum dots. We pay particular attention to the regime where this device operates as an autonomous Maxwell demon shoveling electrons against the voltage bias thanks to information. Contrary to previous studies we do not rely on a Markovian weak coupling description. Our numerical findings reveal that in the regime of strong coupling and non-Markovianity, the Maxwell demon is often doomed to disappear except in a narrow parameter regime of small power output.


Heuristic Approximations for Closed Networks: A Case Study in Open-pit Mining

arXiv, 2016-03-25. Authors: Hans Daduna, Ruslan Krenzler, Robert Ritter

We investigate a fundamental model from open-pit mining, which is a cyclic system consisting of a shovel, traveling loaded, unloading facility, and traveling back empty. The interaction of these subsystems determines the capacity of the shovel, which is the fundamental quantity of interest. To determine this capacity one needs the stationary probability that the shovel is idle. Because an exact analysis of the performance of the system is out of reach, various approximation algorithms have been proposed in the literature besides simulations; they stem from computer science and can be characterized as general purpose algorithms. To solve the special problem under mining conditions, we propose an extremely simple algorithm. Comparison with several general purpose algorithms shows that for realistic situations the special algorithm is more precise than the general purpose algorithms. This holds even if these general purpose candidates incorporate more details of the underlying models than our simple algorithm, which works on a strongly reduced model. The comparison and assessment are done with extensive simulations on a level of detail which the general purpose algorithms a

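To make the role of the shovel's idle probability concrete, the cyclic system can be sketched under a strong simplifying assumption that the abstract does not make: exponential service times at every stage, so the network has a product form and Buzen's convolution algorithm applies. This is neither the authors' special algorithm nor one of the general purpose algorithms they compare against. The station list, the mean times, and the function names below are hypothetical.

```python
from math import factorial


def normalizing_constants(stations, n_trucks):
    """Buzen's convolution algorithm for a closed product-form network.

    stations: list of (mean_time, kind) with kind "single" (one server,
    e.g. shovel or unloading facility) or "delay" (infinite server,
    e.g. travel). Each truck visits every station once per cycle.
    Returns [G(0), ..., G(n_trucks)].
    """
    g = [1.0] + [0.0] * n_trucks
    for mean_time, kind in stations:
        if kind == "single":
            coeffs = [mean_time ** k for k in range(n_trucks + 1)]
        else:
            coeffs = [mean_time ** k / factorial(k) for k in range(n_trucks + 1)]
        # Convolve the running constants with this station's coefficients.
        g = [
            sum(g[j] * coeffs[n - j] for j in range(n + 1))
            for n in range(n_trucks + 1)
        ]
    return g


def shovel_idle_probability(stations, shovel_index, n_trucks):
    """Stationary probability that the (single-server) shovel is idle."""
    g = normalizing_constants(stations, n_trucks)
    throughput = g[n_trucks - 1] / g[n_trucks]  # completed cycles per time unit
    return 1.0 - stations[shovel_index][0] * throughput


# Hypothetical mean times: load 1, travel loaded 2, unload 1, travel empty 2.
cycle = [(1.0, "single"), (2.0, "delay"), (1.0, "single"), (2.0, "delay")]
idle_one_truck = shovel_idle_probability(cycle, 0, 1)   # 5/6
idle_two_trucks = shovel_idle_probability(cycle, 0, 2)  # 13/19
```

With a single truck the cycle takes 6 time units on average and the shovel works for 1 of them, which gives the idle probability 5/6. Adding a second truck lowers the idle probability to 13/19; the shovel's capacity in this sketch is the throughput `G(N-1)/G(N)`.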

Computational Optimal Transport

arXiv, 2018-03-01. Authors: Gabriel Peyré, Marco Cuturi

Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site. The goal of the worker is to erect with all that sand a target pile with a prescribed shape (for example, that of a giant sand castle). Naturally, the worker wishes to minimize her total effort, quantified for instance as the total distance or time spent carrying shovelfuls of sand. Mathematicians interested in OT cast that problem as that of comparing two probability distributions, two different piles of sand of the same volume. They consider all of the many possible ways to morph, transport or reshape the first pile into the second, and associate a "global" cost to every such transport, using the "local" consideration of how much it costs to move a grain of sand from one place to another. Recent years have witnessed the spread of OT in several fields, thanks to the emergence of approximate solvers that can scale to sizes and dimensions that are relevant to data sciences. Thanks to this newfound scalability, OT is being increasingly used to unlock various proble

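The sand-pile picture can be worked through in the simplest possible setting: two piles of equal total mass spread over sorted positions on a line, with cost equal to mass times distance moved. In one dimension the optimal cost has a closed form based on how much sand must cross each gap between neighbouring positions. The sketch below assumes exactly that setting; the function name `emd_1d` and the example piles are hypothetical, and it is not one of the approximate solvers the abstract refers to.

```python
def emd_1d(source, target, positions):
    """Minimal total (mass x distance) needed to reshape `source` into `target`.

    source, target: amounts of sand at each position (equal totals).
    positions: coordinates on a line, sorted in increasing order.
    """
    if not (len(source) == len(target) == len(positions)):
        raise ValueError("source, target and positions must have equal length")
    if abs(sum(source) - sum(target)) > 1e-9:
        raise ValueError("both piles must contain the same amount of sand")

    cost = 0.0
    carried = 0.0  # surplus (or deficit, if negative) carried across the next gap
    for i in range(len(positions) - 1):
        carried += source[i] - target[i]
        cost += abs(carried) * (positions[i + 1] - positions[i])
    return cost


# Three shovelfuls moved from position 0 to position 2: 3 * 2 = 6.
move_everything = emd_1d([3, 0, 0], [0, 0, 3], [0, 1, 2])  # 6.0
# One shovelful 0 -> 1 and two shovelfuls 1 -> 2: 1 + 2 = 3.
shift_right = emd_1d([1, 2, 0], [0, 1, 2], [0, 1, 2])      # 3.0
```

The running `carried` value is the "local" quantity in the abstract's terms: whatever surplus exists to the left of a gap has to cross it, and summing those crossings gives the "global" transport cost. Higher dimensions have no such shortcut.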

Search results: shoveled

Multi-Docker-Eval: A `Shovel of the Gold Rush' Benchmark on Automatic Environment Building for Software Engineering
