Search - ResearchTracker

KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies

Soft Actor-Critic (SAC) has achieved notable success in continuous control tasks but struggles in sparse reward settings, where infrequent rewards make efficient exploration challenging. While novelty-based exploration methods address this issue by encouraging the agent to explore novel states, they are not trivial to apply to SAC. In particular, managing the interaction between novelty-based exploration and SAC's stochastic policy can lead to inefficient exploration and redundant sample collection. In this paper, we propose KEA (Keeping Exploration Alive) which tackles the inefficiencies in balancing exploration strategies when combining SAC with novelty-based exploration. KEA integrates a novelty-augmented SAC with a standard SAC agent, proactively coordinated via a switching mechanism. This coordination allows the agent to maintain stochasticity in high-novelty regions, enhancing exploration efficiency and reducing repeated sample collection. We first analyze this potential issue in a 2D navigation task, and then evaluate KEA on the DeepSea hard-exploration benchmark as well as sparse reward control tasks from the DeepMind Control Suite. Compared to state-of-the-art novelty-base
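The abstract describes KEA's coordination only at a high level, so the following is a minimal sketch of the general idea of switching between a novelty-augmented actor and a standard SAC actor. It is not KEA's actual switching rule. Every concrete detail is an assumption: the count-based novelty bonus 1/sqrt(N(s)+1), the threshold test in `select_action`, the `CountNovelty` class, and the fixed Gaussian stand-ins used in place of the two trained SAC actors are all hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical stand-ins: fixed Gaussian "policies" replace trained SAC actors.
# The novelty-augmented actor is modeled as nearly deterministic (small std),
# the standard SAC actor as close to max-entropy (large std).
NOVELTY_ACTOR_MEAN, NOVELTY_ACTOR_STD = 0.8, 0.05
STANDARD_ACTOR_MEAN, STANDARD_ACTOR_STD = 0.0, 1.0


class CountNovelty:
    """Assumed count-based novelty bonus 1 / sqrt(N(s) + 1) over discrete states."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def update(self, state) -> None:
        # Record one more visit to `state`.
        self.counts[state] += 1

    def bonus(self, state) -> float:
        # Unvisited states get the maximum bonus of 1.0.
        return 1.0 / math.sqrt(self.counts[state] + 1)


def select_action(state, novelty: CountNovelty, threshold: float, rng: random.Random):
    """Hypothetical switch: use the standard SAC actor where novelty is high so the
    behavior stays stochastic there; otherwise use the novelty-augmented actor."""
    if novelty.bonus(state) >= threshold:
        return "standard", rng.gauss(STANDARD_ACTOR_MEAN, STANDARD_ACTOR_STD)
    return "novelty", rng.gauss(NOVELTY_ACTOR_MEAN, NOVELTY_ACTOR_STD)


if __name__ == "__main__":
    rng = random.Random(0)
    novelty = CountNovelty()
    # Revisit a single state and watch control move between the two actors.
    for step in range(5):
        agent, action = select_action("s0", novelty, threshold=0.5, rng=rng)
        novelty.update("s0")
        print(step, agent, round(action, 3))
```

Raising `threshold` hands control to the novelty-augmented actor after fewer visits; lowering it keeps the stochastic standard actor in charge of each state for longer.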

DORA Explorer: Improving the Exploration Ability of LLMs Without Training

arXiv, 2026-04-19 · Authors: Priya Gurjar, Md Farhan Ishmam, Kenneth Marino

Despite the rapid progress, LLMs for sequential decision-making (i.e., LLM agents) still struggle to produce diverse outputs. This leads to insufficient exploration, convergence to sub-optimal solutions, and becoming stuck in loops. Such limitations can be problematic in environments that require active exploration to gather information and make decisions. Sampling methods such as temperature scaling introduce token-level randomness but fail to produce enough diversity at the sequence level. We analyze LLM exploration in the classic Multi-Armed Bandit (MAB) setting and the Text Adventure Learning Environment Suite (TALES). We find that current decoding strategies and prompting methods like Chain-of-Thought and Tree-of-Thought are insufficient for robust exploration. To address this, we introduce DORA Explorer (Diversity-Oriented Ranking of Actions), a training-free framework for improving exploration in LLM agents. DORA generates diverse action candidates, scores them using token log-probabilities, and selects actions using a tunable exploration parameter. DORA achieves UCB-competitive performance on MAB and consistent gains across TALES, e.g., improving Qwen2.5-7B's performance fr
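The abstract names three steps: generate diverse action candidates, score them with token log-probabilities, and select with a tunable exploration parameter. Below is a minimal sketch of that pipeline under our own assumptions, not DORA's published scoring rule. Candidates are deduplicated by action string, each is scored by its mean token log-probability, and a hypothetical parameter `alpha` weights a UCB-style count bonus over previously chosen actions. `Candidate`, `dedupe`, and `select_action` are illustrative names, and per-token log-probabilities are assumed to be available from the LLM.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Candidate:
    action: str
    token_logprobs: List[float]  # assumed: per-token log-probabilities from the LLM


def mean_logprob(c: Candidate) -> float:
    """Length-normalized sequence score: average token log-probability."""
    return sum(c.token_logprobs) / len(c.token_logprobs)


def dedupe(candidates: List[Candidate]) -> List[Candidate]:
    """Keep one candidate per distinct action string (the best-scoring copy)."""
    best: Dict[str, Candidate] = {}
    for c in candidates:
        if c.action not in best or mean_logprob(c) > mean_logprob(best[c.action]):
            best[c.action] = c
    return list(best.values())


def select_action(candidates: List[Candidate], history: Counter, alpha: float) -> Optional[str]:
    """Hypothetical ranking: mean log-prob plus alpha times a UCB-style bonus that
    favors rarely chosen actions. alpha = 0 reduces to greedy likelihood ranking."""
    total = sum(history.values()) + 1
    best_action, best_score = None, -math.inf
    for c in candidates:
        bonus = math.sqrt(math.log(total + 1) / (history[c.action] + 1))
        score = mean_logprob(c) + alpha * bonus
        if score > best_score:
            best_action, best_score = c.action, score
    return best_action


if __name__ == "__main__":
    pool = dedupe([
        Candidate("go north", [-0.1, -0.2]),
        Candidate("open door", [-1.0, -1.2]),
        Candidate("go north", [-0.5, -0.5]),
    ])
    print(select_action(pool, Counter(), alpha=0.0))
    print(select_action(pool, Counter({"go north": 20}), alpha=2.0))
```

With `alpha = 0.0` the rule is greedy on model likelihood; larger values shift choices toward actions the agent has rarely taken, which is one way to counter the loops the abstract describes.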

Search results: exploration

KEA: Keeping Exploration Alive by Proactively Coordinating Exploration Strategies

DORA Explorer: Improving the Exploration Ability of LLMs Without Training

A Novel Framework for Uncertainty-Driven Adaptive Exploration

OVD-Explorer: Optimism Should Not Be the Sole Pursuit of Exploration in Noisy Environments

The Initial Exploration Problem in Knowledge Graph Exploration

Exploration and Anti-Exploration with Distributional Random Network Distillation

Exploring Exploration in Bayesian Optimization

Destruction is a General Strategy to Learn Generation; Diffusion's Strength is to Take it Seriously; Exploration is the Future

High-fidelity 3D reconstruction for planetary exploration

NASA Decadal Astrobiology Research and Exploration Strategy (NASA-DARES 2025) White Paper -- Habitable Worlds Observatory Living Worlds Science Cases: Research Gaps and Needs

A New Strategy for the Exploration of Venus

Explore until Confident: Efficient Exploration for Embodied Question Answering

NASA Exoplanet Exploration Program (ExEP) Science Gap List

Gaussian Process-Based Active Exploration Strategies in Vision and Touch

SeDa: A Unified System for Dataset Discovery and Multi-Entity Augmented Semantic Exploration

Innovations in Nanotechnology: A Comprehensive Review of Applications Beyond Space Exploration

Benchmarking Bonus-Based Exploration Methods on the Arcade Learning Environment

A Semantic Approach for Big Data Exploration in Industry 4.0

Learn to Explore: on Bootstrapping Interactive Data Exploration with Meta-learning

Exploration of Faulty Hamiltonian Graphs