Active vision, also known as active perception, refers to the process of actively selecting where and how to look in order to gather task-relevant information. It is a critical component of efficient perception and decision-making in humans and advanced embodied agents. Recently, the use of Multimodal Large Language Models (MLLMs) as central planning and decision-making modules in robotic systems has gained extensive attention. However, despite the importance of active perception in embodied intelligence, there is little to no exploration of how MLLMs can be equipped with or learn active perception capabilities. In this paper, we first provide a systematic definition of MLLM-based active perception tasks. We point out that the recently proposed GPT-o3 model's zoom-in search strategy can be regarded as a special case of active perception; however, it still suffers from low search efficiency and inaccurate region selection. To address these issues, we propose ACTIVE-O3, a purely reinforcement learning based training framework built on top of GRPO, designed to equip MLLMs with active perception capabilities. We further establish a comprehensive benchmark suite to evaluate ACTIVE-O3 ac
Dense suspensions of self-propelled bacteria and related active fluids exhibit spontaneous flow generation, vortex formation, and spatiotemporally chaotic dynamics despite operating at vanishingly small Reynolds numbers. These phenomena, commonly referred to as active turbulence, display striking visual and statistical similarities to classical inertial turbulence while arising from fundamentally different nonequilibrium mechanisms. In this article, we present a combined review and theoretical study of hydrodynamic models for dense active fluids, with particular emphasis on bacterial suspensions described by the Toner--Tu--Swift--Hohenberg (TTSH) framework. We review key experimental and theoretical developments underlying the analogy between active and inertial turbulence, highlighting the emergence of multiple dynamical regimes and the conditions under which universal spectral and intermittent behavior arises in homogeneous systems. Moving beyond the conventional assumption of spatially uniform activity, we introduce a minimal model in which the activity field is heterogeneous and dynamically advected by the flow it generates. Thus treating activity as a spatiotemporally evolving
Increasing evidence suggests that active matter exhibits instances of mixed symmetry that cannot be fully described by either polar or nematic formalism. Here, we introduce a minimal model that integrates self-propulsion into the active nematic framework. Our linear stability analyses reveal how self-propulsion shifts the onset of instability, fundamentally altering the dynamical landscape. Numerical simulations confirm these predictions, showing that self-propulsion induces anti-hyperuniform fluctuations, anomalous long-range order in vorticity, and non-universal self-similar energy cascades. Notably, these long-range ordered states emerge within the active turbulence regime well before the transition to a flocking state. Additionally, our analyses highlight a non-monotonic dependence of self-organization on self-propulsion, with optimal states characterized by a peak in correlation length. These findings are relevant for understanding active nematic systems that self-propel, such as migrating cell layers or swarming bacteria, and offer new avenues for designing synthetic systems with tailored collective behaviours, bridging the gap between active nematics and self-propulsive s
We investigate the evolution of subsurface flows during the emergence and the active phase of sunspot regions using the time-distance helioseismology analysis of the full-disk Dopplergrams from the Helioseismic and Magnetic Imager (HMI) onboard the Solar Dynamics Observatory (SDO). We present an analysis of emerging active regions of various types, including delta-type active regions and regions with the reverse polarity order (`anti-Hale active regions'). The results reveal strong vortical and shearing flows during the emergence of magnetic flux, as well as the formation of large-scale converging flow patterns around developing active regions, predominantly in the top 6 Mm of the convection zone. Our analysis reveals a significant correlation of the flow divergence and helicity in the active regions with their flaring activity, indicating that measuring characteristics of subsurface flows can contribute to flare forecasting.
Vision Language Models (VLMs) excel at visual question answering (VQA) but remain limited to snapshot vision, reasoning from static images. In contrast, embodied agents require ambulatory vision, actively moving to obtain more informative views. We introduce Visually Grounded Active View Selection (VG-AVS), a task that selects the most informative next viewpoint using only the visual information in the current image, without relying on scene memory or external knowledge. To support this task, we construct a synthetic dataset with automatically generated paired query-target views and question-answer prompts. We also propose a framework that fine-tunes pretrained VLMs through supervised fine-tuning (SFT) followed by RL-based policy optimization. Our approach achieves strong question answering performance based on viewpoint selection and generalizes robustly to unseen synthetic and real scenes. Furthermore, incorporating our learned VG-AVS framework into existing scene-exploration-based EQA systems improves downstream question-answering accuracy.
Navigation in dynamic environments requires autonomous systems to reason about uncertainties in the behavior of other agents. In this paper, we introduce a unified framework that combines trajectory planning with multimodal predictions and active probing to enhance decision-making under uncertainty. We develop a novel risk metric that seamlessly integrates multimodal prediction uncertainties through mixture models. When these uncertainties follow a Gaussian mixture distribution, we prove that our risk metric admits a closed-form solution, and is always finite, thus ensuring analytical tractability. To reduce prediction ambiguity, we incorporate an active probing mechanism that strategically selects actions to improve its estimates of behavioral parameters of other agents, while simultaneously handling multimodal uncertainties. We extensively evaluate our framework in autonomous navigation scenarios using the MetaDrive simulation environment. Results demonstrate that our active probing approach successfully navigates complex traffic scenarios with uncertain predictions. Additionally, our framework shows robust performance across diverse traffic agent behavior models, indicating its
Active learning, a powerful paradigm in machine learning, aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset. However, the traditional active learning process often demands extensive computational resources, hindering scalability and efficiency. In this paper, we address this critical issue by presenting a novel method designed to alleviate the computational burden associated with active learning on massive datasets. To achieve this goal, we introduce a simple, yet effective method-agnostic framework that outlines how to strategically choose and annotate data points, optimizing the process for efficiency while maintaining model performance. Through case studies, we demonstrate the effectiveness of our proposed method in reducing computational costs while maintaining or, in some cases, even surpassing baseline model outcomes. Code is available at https://github.com/aimotive/Compute-Efficient-Active-Learning.
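As a generic illustration of "selecting the most informative samples", the sketch below ranks unlabeled samples by predictive entropy and keeps the top `budget` of them. This is plain uncertainty sampling, not the compute-efficient framework described in the abstract; the function names and the example probabilities are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def select_most_uncertain(pool_probs, budget):
    """Return indices of the `budget` pool samples with the highest predictive entropy."""
    ranked = sorted(
        range(len(pool_probs)),
        key=lambda i: entropy(pool_probs[i]),
        reverse=True,
    )
    return ranked[:budget]


# Hypothetical class-probability predictions for four unlabeled samples.
pool_probs = [
    [0.98, 0.01, 0.01],  # confident prediction, low entropy
    [0.34, 0.33, 0.33],  # nearly uniform, highest entropy
    [0.70, 0.20, 0.10],
    [0.50, 0.50, 0.00],
]
selected = select_most_uncertain(pool_probs, budget=2)
print(selected)
```

Scoring every sample in a massive pool at each round is one source of the computational cost the abstract refers to; a common workaround, assumed here rather than taken from the paper, is to score only a random subset of the pool.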
An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018). However, their focus is on the expressive physical implementation of categorical decisions and the hierarchical mixed generative model is assumed to be known. As a consequence, it is unclear how this framework might be extended to learning. We therefore present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller. We make use of recent work in recurrent switching linear dynamical systems (rSLDS) which implement end-to-end learning of meaningful discrete representations via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). The representations learned by the rSLDS inform the structure of the hybrid decision-making agent and al
We develop an active inference route-planning method for the autonomous control of intelligent agents. The aim is to reconnoiter a geographical area to maintain a common operational picture. To achieve this, we construct an evidence map that reflects our current understanding of the situation, incorporating both positive and "negative" sensor observations of possible target objects collected over time, and diffusing the evidence across the map as time progresses. The generative model of active inference uses Dempster-Shafer theory and a Gaussian sensor model, which provides input to the agent. The generative process employs a Bayesian approach to update a posterior probability distribution. We calculate the variational free energy for all positions within the area by assessing the divergence between a pignistic probability distribution of the evidence map and a posterior probability distribution of a target object based on the observations, including the level of surprise associated with receiving new observations. Using the free energy, we direct the agents' movements in a simulation by taking an incremental step toward a position that minimizes the free energy. This approach addr
This paper presents a novel general-purpose guided stereo paradigm that mimics the active stereo principle by replacing the unreliable physical pattern projector with a depth sensor. It works by projecting virtual patterns consistent with the scene geometry onto the left and right images acquired by a conventional stereo camera, using the sparse hints obtained from a depth sensor, to facilitate the visual correspondence. By design, any depth sensing device can be seamlessly plugged into our framework, enabling the deployment of a virtual active stereo setup in any possible environment and overcoming the severe limitations of physical pattern projection, such as the limited working range and environmental conditions. Exhaustive experiments on indoor and outdoor datasets featuring both long and close range, including those providing raw, unfiltered depth hints from off-the-shelf depth sensors, highlight the effectiveness of our approach in notably boosting the robustness and accuracy of algorithms and deep stereo without any code modification and even without re-training. Additionally, we assess the performance of our strategy on active stereo evaluation datasets with conventional pa
Active Learning (AL) deals with identifying the most informative samples for labeling to reduce data annotation costs for supervised learning tasks. AL research suffers from the fact that performance lifts reported in the literature generalize poorly and that only a small number of repetitions of experiments are conducted. To overcome these obstacles, we propose CDALBench, the first active learning benchmark which includes tasks in computer vision, natural language processing and tabular learning. Furthermore, by providing an efficient, greedy oracle, CDALBench can be evaluated with 50 runs for each experiment. We show that both the cross-domain character and a large number of repetitions are crucial for sophisticated evaluation of AL research. Concretely, we show that the superiority of specific methods varies over the different domains, making it important to evaluate Active Learning with a cross-domain benchmark. Additionally, we show that having a large number of runs is crucial. With only three runs, as is often done in the literature, the superiority of specific methods can strongly vary with the specific runs. This effect is so strong that, depending on the seed, even a well-established m
Active learning algorithms have been an integral part of recent advances in artificial intelligence. However, research in the field varies widely and lacks an overall organizing lens. We outline a Markovian formalism for the field of active learning and survey the literature to demonstrate the organizing capability of our proposed formalism. Our formalism takes a partially observable Markovian system approach to the active learning process as a whole. We specifically outline how querying, dataset augmentation, reward updates, and other aspects of active learning can be viewed as a transition between meta-states in a Markovian system, and give direction on how other aspects of active learning can fit into our formalism.
Your task is to detect a submarine with your active sonar. The submarine can hear your active sonar before you can detect him. If the submarine is fast enough he can evade you before you can detect him. How do you then detect him? If you are using your active sonar continuously you will not detect him. The same holds if you are not using your sonar at all. In between those two extremes there is an optimum. We will find that optimum. Stated more precisely and generally: In the same two-dimensional region two platforms are present. One platform, the searcher, equipped with one active sensor (sonar, radar, lidar, etc.), is trying to detect the other platform, the target, by means of its active sensor. The target tries to avoid detection using only a passive sensor to detect the searcher. The target can detect the active sensor before the searcher can detect the target (forestalling). The active sensor is therefore used intermittently to surprise the target. The aim of this study is to quantify the passive period of the active sensor by minimizing missed detection opportunities. The active period is subsequently found by maximizing the average detection width of the searcher sensor over time.
Active reconfigurable intelligent surfaces (RISs) have recently been proposed to compensate for the severe multiplicative fading effect of conventional passive RIS-aided systems. Each reflecting element of active RISs is assisted by an amplifier such that the incident signal can be reflected and amplified instead of only being reflected as in passive RIS-aided systems. This work addresses the practical challenge that, on the one hand, in active RIS-aided systems the perfect individual CSI of the RIS-aided channels cannot be acquired due to the lack of signal processing power at the active RISs, but, on the other hand, this CSI is required to calculate the expected system data rate and RIS transmit power needed for transceiver design. To address this issue, we first derive closed-form expressions for the average achievable rate and the average RIS transmit power based on partial CSI of the RIS-aided channels. Then, we formulate an average achievable rate maximization problem for jointly optimizing the active beamforming at both the base station (BS) and the RIS. This problem is then tackled using the majorization--minimization (MM) algorithm framework, and, for each iteration, semi-
We present an overview of phase field modeling of active matter systems as a tool for capturing various aspects of complex and active interfaces. We first describe how interfaces between different phases are characterized in phase field models and provide simple fundamental governing equations that describe their evolution. For a simple model, we then show how physical properties of the interface, such as surface tension and interface thickness, can be recovered from these equations. We then explain how the phase field formulation can be coupled to various active matter realizations and discuss three particular examples of continuum biphasic active matter: active nematic-isotropic interfaces, active matter in viscoelastic environments, and active shells in fluid background. Finally, we describe how multiple phase fields can be used to model active cellular monolayers and present a general framework that can be applied to the study of tissue behaviour and collective migration.
We address the problem of active mapping with a continually-learned neural scene representation, namely Active Neural Mapping. The key lies in actively finding the target space to be explored with efficient agent movement, thus minimizing the map uncertainty on-the-fly within a previously unseen environment. In this paper, we examine the weight space of the continually-learned neural field, and show empirically that the neural variability, the prediction robustness against random weight perturbation, can be directly utilized to measure the instant uncertainty of the neural map. Together with the continuous geometric information inherited in the neural map, the agent can be guided to find a traversable path to gradually gain knowledge of the environment. We present for the first time an active mapping system with a coordinate-based implicit neural representation for online scene reconstruction. Experiments in the visually realistic Gibson and Matterport3D environments demonstrate the efficacy of the proposed method.
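The idea of measuring uncertainty through robustness to random weight perturbation can be sketched on a toy model. The snippet below is a minimal illustration under stated assumptions, not the paper's implementation: the one-hidden-layer network, the Gaussian noise scale `sigma`, and the use of prediction variance as the variability score are all hypothetical choices.

```python
import math
import random


def predict(weights, x):
    """Tiny one-hidden-layer network: sum over units of v * tanh(w * x + b)."""
    return sum(v * math.tanh(w * x + b) for (w, b, v) in weights)


def neural_variability(weights, x, sigma=0.05, n_samples=200, seed=0):
    """Variance of the prediction at x under Gaussian perturbation of every weight.

    Assumption of this sketch: a larger variance is read as higher uncertainty at x.
    """
    rng = random.Random(seed)
    outputs = []
    for _ in range(n_samples):
        perturbed = [
            tuple(p + rng.gauss(0.0, sigma) for p in unit) for unit in weights
        ]
        outputs.append(predict(perturbed, x))
    mean = sum(outputs) / n_samples
    return sum((o - mean) ** 2 for o in outputs) / n_samples


# Hypothetical weights: one (w, b, v) triple per hidden unit.
weights = [(1.0, 0.0, 1.0), (-0.5, 0.3, 2.0)]
score = neural_variability(weights, x=0.7)
print(score)
```

In a mapping setting, such a score could be evaluated at candidate coordinates and the highest-scoring region chosen as the next exploration target; that selection rule is likewise an assumption of this sketch.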
Active inference offers a principled account of behavior as minimizing average sensory surprise over time. Applications of active inference to control problems have heretofore tended to focus on finite-horizon or discounted-surprise problems, despite deriving from the infinite-horizon, average-surprise imperative of the free-energy principle. Here we derive an infinite-horizon, average-surprise formulation of active inference from optimal control principles. Our formulation returns to the roots of active inference in neuroanatomy and neurophysiology, formally reconnecting active inference to optimal feedback control. Our formulation provides a unified objective functional for sensorimotor control and allows for reference states to vary over time.
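For orientation, the three kinds of objective contrasted in this abstract can be written side by side. The notation is an assumption made for illustration and is not taken from the paper: $o_t$ denotes the observation at time $t$, $p(o_t)$ its probability under the agent's model, $\pi$ a policy, and $\gamma \in (0,1)$ a discount factor.

```latex
% Finite-horizon surprise over T steps
J_T(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} -\ln p(o_t) \right]

% Discounted surprise
J_{\gamma}(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{\infty} \gamma^{\,t-1} \bigl( -\ln p(o_t) \bigr) \right]

% Infinite-horizon average surprise
\bar{J}(\pi) = \lim_{T \to \infty} \frac{1}{T} \, \mathbb{E}_{\pi}\!\left[ \sum_{t=1}^{T} -\ln p(o_t) \right]
```

The average-surprise form weights all time steps equally, whereas the discounted form emphasizes early steps.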
The ability of many living systems to actively self-propel underlies critical biomedical, environmental, and industrial processes. While such active transport is well-studied in uniform settings, environmental complexities such as geometric constraints, mechanical cues, and external stimuli such as chemical gradients and fluid flow can strongly influence transport. In this chapter, we describe recent progress in the study of active transport in such complex environments, focusing on two prominent biological systems -- bacteria and eukaryotic cells -- as archetypes of active matter. We review research findings highlighting how environmental factors can fundamentally alter cellular motility, hindering or promoting active transport in unexpected ways, and giving rise to fascinating behaviors such as directed migration and large-scale clustering. In parallel, we describe specific open questions and promising avenues for future research. Furthermore, given the diverse forms of active matter -- ranging from enzymes and driven biopolymer assemblies, to microorganisms and synthetic microswimmers, to larger animals and even robots -- we also describe connections to other active systems as w
Active particle systems of interacting self-propelled particles offer a versatile framework for modeling complex systems. When employed to describe aspects of animal behavior, the complexity of animal movement and decision-making often requires the use of unique types of effective interactions between the particles -- notably nonreciprocal effective forces that do not obey the usual conservation laws of Newtonian mechanics. Here we review two recent empirically-motivated models, of two very different types of animal behavior, where the behavior is described in terms of active particles which interact through nonreciprocal effective forces. The first model describes the dynamics of animal contests, wherein typically two rivals fight over a localized resource. The uniquely shaped effective potentials between the model's 'contestant particles' manifest the adversarial nature of contest interactions and capture the dynamical essence of contest behavior in space and time. The second model describes the stabilization of cohesive swarms through long-range and adaptive gravity-like attraction. This 'adaptive gravity' model explains the observed mass and velocity profiles of laboratory midg
Domain adaptive active learning is leading the charge in label-efficient training of neural networks. For semantic segmentation, state-of-the-art models jointly use two criteria of uncertainty and diversity to select training labels, combined with a pixel-wise acquisition strategy. However, we show that such methods currently suffer from a class imbalance issue which degrades their performance for larger active learning budgets. We then introduce Class Balanced Dynamic Acquisition (CBDA), a novel active learning method that mitigates this issue, especially in high-budget regimes. The more balanced labels increase minority class performance, which in turn allows the model to outperform the previous baseline by 0.6, 1.7, and 2.4 mIoU for budgets of 5%, 10%, and 20%, respectively. Additionally, the focus on minority classes leads to improvements in the minimum class performance of 0.5, 2.9, and 4.6 IoU, respectively. The top-performing model even exceeds the fully supervised baseline, showing that a more balanced label set than the entire ground truth can be beneficial.
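To make the class imbalance issue concrete, the sketch below contrasts a class-balanced acquisition rule with plain top-k uncertainty selection. It is not the CBDA algorithm, whose details the abstract does not give; the round-robin rule, the function name, and the example candidates are hypothetical.

```python
from collections import defaultdict


def class_balanced_select(candidates, budget):
    """Pick up to `budget` candidates, cycling over classes in round-robin order.

    candidates: list of (index, predicted_class, uncertainty) tuples.
    Within each class, the most uncertain candidates are taken first.
    """
    by_class = defaultdict(list)
    for idx, cls, score in candidates:
        by_class[cls].append((score, idx))
    for items in by_class.values():
        items.sort(reverse=True)  # most uncertain first

    classes = sorted(by_class)
    selected = []
    while len(selected) < budget and any(by_class[c] for c in classes):
        for c in classes:
            if by_class[c] and len(selected) < budget:
                selected.append(by_class[c].pop(0)[1])
    return selected


# Hypothetical pixel candidates: the majority class "road" dominates the top scores.
candidates = [
    (0, "road", 0.9),
    (1, "road", 0.8),
    (2, "road", 0.7),
    (3, "bike", 0.2),
    (4, "pole", 0.4),
]
print(class_balanced_select(candidates, budget=3))
```

With a budget of 3, selecting purely by uncertainty would take the three "road" candidates, while the round-robin rule takes one candidate from each class.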