Active vision, also known as active perception, refers to the process of actively selecting where and how to look in order to gather task-relevant information. It is a critical component of efficient perception and decision-making in humans and advanced embodied agents. Recently, the use of Multimodal Large Language Models (MLLMs) as central planning and decision-making modules in robotic systems has gained extensive attention. However, despite the importance of active perception in embodied intelligence, there is little to no exploration of how MLLMs can be equipped with or learn active perception capabilities. In this paper, we first provide a systematic definition of MLLM-based active perception tasks. We point out that the recently proposed GPT-o3 model's zoom-in search strategy can be regarded as a special case of active perception; however, it still suffers from low search efficiency and inaccurate region selection. To address these issues, we propose ACTIVE-O3, a purely reinforcement learning based training framework built on top of GRPO, designed to equip MLLMs with active perception capabilities. We further establish a comprehensive benchmark suite to evaluate ACTIVE-O3 ac
Increasing evidence suggests that active matter exhibits instances of mixed symmetry that cannot be fully described by either polar or nematic formalism. Here, we introduce a minimal model that integrates self-propulsion into the active nematic framework. Our linear stability analyses reveal how self-propulsion shifts the onset of instability, fundamentally altering the dynamical landscape. Numerical simulations confirm these predictions, showing that self-propulsion induces anti-hyperuniform fluctuations, anomalous long-range order in vorticity, and non-universal self-similar energy cascades. Notably, these long-range ordered states emerge within the active turbulence regime well before the transition to a flocking state. Additionally, our analyses highlight a non-monotonic dependence of self-organization on self-propulsion, with optimal states characterized by a peak in correlation length. These findings are relevant for understanding active nematic systems that self-propel, such as migrating cell layers or swarming bacteria, and offer new avenues for designing synthetic systems with tailored collective behaviours, bridging the gap between active nematics and self-propulsive s
Dense suspensions of self-propelled bacteria and related active fluids exhibit spontaneous flow generation, vortex formation, and spatiotemporally chaotic dynamics despite operating at vanishingly small Reynolds numbers. These phenomena, commonly referred to as active turbulence, display striking visual and statistical similarities to classical inertial turbulence while arising from fundamentally different nonequilibrium mechanisms. In this article, we present a combined review and theoretical study of hydrodynamic models for dense active fluids, with particular emphasis on bacterial suspensions described by the Toner--Tu--Swift--Hohenberg (TTSH) framework. We review key experimental and theoretical developments underlying the analogy between active and inertial turbulence, highlighting the emergence of multiple dynamical regimes and the conditions under which universal spectral and intermittent behavior arises in homogeneous systems. Moving beyond the conventional assumption of spatially uniform activity, we introduce a minimal model in which the activity field is heterogeneous and dynamically advected by the flow it generates. Thus treating activity as a spatiotemporally evolving
We investigate the evolution of subsurface flows during the emergence and the active phase of sunspot regions using the time-distance helioseismology analysis of the full-disk Dopplergrams from the Helioseismic and Magnetic Imager (HMI) onboard the Solar Dynamics Observatory (SDO). We present an analysis of emerging active regions of various types, including delta-type active regions and regions with the reverse polarity order (`anti-Hale active regions'). The results reveal strong vortical and shearing flows during the emergence of magnetic flux, as well as the formation of large-scale converging flow patterns around developing active regions, predominantly in the top 6 Mm of the convection zone. Our analysis reveals a significant correlation of the flow divergence and helicity in the active regions with their flaring activity, indicating that measuring characteristics of subsurface flows can contribute to flare forecasting.
Navigation in dynamic environments requires autonomous systems to reason about uncertainties in the behavior of other agents. In this paper, we introduce a unified framework that combines trajectory planning with multimodal predictions and active probing to enhance decision-making under uncertainty. We develop a novel risk metric that seamlessly integrates multimodal prediction uncertainties through mixture models. When these uncertainties follow a Gaussian mixture distribution, we prove that our risk metric admits a closed-form solution, and is always finite, thus ensuring analytical tractability. To reduce prediction ambiguity, we incorporate an active probing mechanism that strategically selects actions to improve its estimates of behavioral parameters of other agents, while simultaneously handling multimodal uncertainties. We extensively evaluate our framework in autonomous navigation scenarios using the MetaDrive simulation environment. Results demonstrate that our active probing approach successfully navigates complex traffic scenarios with uncertain predictions. Additionally, our framework shows robust performance across diverse traffic agent behavior models, indicating its
We develop an active inference route-planning method for the autonomous control of intelligent agents. The aim is to reconnoiter a geographical area to maintain a common operational picture. To achieve this, we construct an evidence map that reflects our current understanding of the situation, incorporating both positive and "negative" sensor observations of possible target objects collected over time, and diffusing the evidence across the map as time progresses. The generative model of active inference uses Dempster-Shafer theory and a Gaussian sensor model, which provides input to the agent. The generative process employs a Bayesian approach to update a posterior probability distribution. We calculate the variational free energy for all positions within the area by assessing the divergence between a pignistic probability distribution of the evidence map and a posterior probability distribution of a target object based on the observations, including the level of surprise associated with receiving new observations. Using the free energy, we direct the agents' movements in a simulation by taking an incremental step toward a position that minimizes the free energy. This approach addr
Active learning, a powerful paradigm in machine learning, aims at reducing labeling costs by selecting the most informative samples from an unlabeled dataset. However, the traditional active learning process often demands extensive computational resources, hindering scalability and efficiency. In this paper, we address this critical issue by presenting a novel method designed to alleviate the computational burden associated with active learning on massive datasets. To achieve this goal, we introduce a simple, yet effective method-agnostic framework that outlines how to strategically choose and annotate data points, optimizing the process for efficiency while maintaining model performance. Through case studies, we demonstrate the effectiveness of our proposed method in reducing computational costs while maintaining or, in some cases, even surpassing baseline model outcomes. Code is available at https://github.com/aimotive/Compute-Efficient-Active-Learning.
Vision Language Models (VLMs) excel at visual question answering (VQA) but remain limited to snapshot vision, reasoning from static images. In contrast, embodied agents require ambulatory vision, actively moving to obtain more informative views. We introduce Visually Grounded Active View Selection (VG-AVS), a task that selects the most informative next viewpoint using only the visual information in the current image, without relying on scene memory or external knowledge. To support this task, we construct a synthetic dataset with automatically generated paired query-target views and question-answer prompts. We also propose a framework that fine-tunes pretrained VLMs through supervised fine-tuning (SFT) followed by RL-based policy optimization. Our approach achieves strong question answering performance based on viewpoint selection and generalizes robustly to unseen synthetic and real scenes. Furthermore, incorporating our learned VG-AVS framework into existing scene-exploration-based EQA systems improves downstream question-answering accuracy.
This paper presents a novel general-purpose guided stereo paradigm that mimics the active stereo principle by replacing the unreliable physical pattern projector with a depth sensor. It works by projecting virtual patterns consistent with the scene geometry onto the left and right images acquired by a conventional stereo camera, using the sparse hints obtained from a depth sensor, to facilitate the visual correspondence. By design, any depth-sensing device can be seamlessly plugged into our framework, enabling the deployment of a virtual active stereo setup in any possible environment and overcoming the severe limitations of physical pattern projection, such as the limited working range and environmental conditions. Exhaustive experiments on indoor and outdoor datasets featuring both long and close range, including those providing raw, unfiltered depth hints from off-the-shelf depth sensors, highlight the effectiveness of our approach in notably boosting the robustness and accuracy of algorithms and deep stereo without any code modification and even without re-training. Additionally, we assess the performance of our strategy on active stereo evaluation datasets with conventional pa
Active Learning (AL) deals with identifying the most informative samples for labeling to reduce data annotation costs for supervised learning tasks. AL research suffers from the fact that lifts from the literature generalize poorly and that only a small number of repetitions of experiments are conducted. To overcome these obstacles, we propose CDALBench, the first active learning benchmark which includes tasks in computer vision, natural language processing and tabular learning. Furthermore, by providing an efficient, greedy oracle, CDALBench can be evaluated with 50 runs for each experiment. We show that both the cross-domain character and a large number of repetitions are crucial for sophisticated evaluation of AL research. Concretely, we show that the superiority of specific methods varies over the different domains, making it important to evaluate Active Learning with a cross-domain benchmark. Additionally, we show that having a large number of runs is crucial. With only three runs, as is often done in the literature, the superiority of specific methods can vary strongly with the specific runs. This effect is so strong that, depending on the seed, even a well-established m
An open problem in artificial intelligence is how systems can flexibly learn discrete abstractions that are useful for solving inherently continuous problems. Previous work in computational neuroscience has considered this functional integration of discrete and continuous variables during decision-making under the formalism of active inference (Parr, Friston & de Vries, 2017; Parr & Friston, 2018). However, their focus is on the expressive physical implementation of categorical decisions and the hierarchical mixed generative model is assumed to be known. As a consequence, it is unclear how this framework might be extended to learning. We therefore present a novel hierarchical hybrid active inference agent in which a high-level discrete active inference planner sits above a low-level continuous active inference controller. We make use of recent work in recurrent switching linear dynamical systems (rSLDS) which implement end-to-end learning of meaningful discrete representations via the piecewise linear decomposition of complex continuous dynamics (Linderman et al., 2016). The representations learned by the rSLDS inform the structure of the hybrid decision-making agent and al
Your task is to detect a submarine with your active sonar. The submarine can hear your active sonar before you can detect him. If the submarine is fast enough he can evade you before you can detect him. How do you then detect him? If you are using your active sonar continuously you will not detect him. Likewise, you will not detect him if you are not using your sonar at all. In between those two extremes there is an optimum. We will find that optimum. Or, stated more precisely and generally: two platforms are present in the same two-dimensional region. One platform, the searcher, equipped with one active sensor (sonar, radar, lidar etc.), is trying to detect the other platform, the target, by means of its active sensor. The target tries to avoid detection using only a passive sensor to detect the searcher. The target can detect the active sensor before the searcher can detect the target (forestalling). The active sensor is therefore used intermittently to surprise the target. The aim of this study is to quantify the passive period of the active sensor by minimizing missed detection opportunities. The active period is subsequently found by maximizing the average detection width of the searcher sensor over time.
Active learning algorithms have been an integral part of recent advances in artificial intelligence. However, the research in the field varies widely and lacks an overall organizing lens. We outline a Markovian formalism for the field of active learning and survey the literature to demonstrate the organizing capability of our proposed formalism. Our formalism takes a partially observable Markovian system approach to the active learning process as a whole. We specifically outline how querying, dataset augmentation, reward updates, and other aspects of active learning can be viewed as a transition between meta-states in a Markovian system, and give direction on how other aspects of active learning can fit into our formalism.
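To make the idea of a transition between meta-states concrete, the following is a minimal sketch and not the formalism of the paper itself. It assumes a meta-state made of the labeled set, the unlabeled pool, and a scalar reward; the names `MetaState`, `transition`, `query`, `oracle`, and `reward_fn` are hypothetical. The next meta-state depends only on the current one and the chosen query, which is the Markov property the abstract refers to.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Tuple


@dataclass(frozen=True)
class MetaState:
    """Hypothetical meta-state: labeled pairs, unlabeled pool, last reward."""
    labeled: Tuple[Tuple[int, int], ...]
    unlabeled: FrozenSet[int]
    reward: float


def transition(
    state: MetaState,
    query: Callable[[MetaState], int],
    oracle: Callable[[int], int],
    reward_fn: Callable[[Tuple[Tuple[int, int], ...]], float],
) -> MetaState:
    """One active-learning step: query, augment the dataset, update the reward."""
    if not state.unlabeled:
        return state  # nothing left to query; the state is absorbing
    x = query(state)                               # querying
    labeled = state.labeled + ((x, oracle(x)),)    # dataset augmentation
    unlabeled = state.unlabeled - {x}
    return MetaState(labeled, unlabeled, reward_fn(labeled))  # reward update


# Usage with toy stand-ins: query the smallest index, label by parity,
# and reward the size of the labeled set.
start = MetaState(labeled=(), unlabeled=frozenset({1, 2, 3}), reward=0.0)
after = transition(
    start,
    query=lambda s: min(s.unlabeled),
    oracle=lambda x: x % 2,
    reward_fn=lambda labeled: float(len(labeled)),
)
```

Keeping the state immutable makes each step a pure function of the previous meta-state, which is convenient when different query strategies are to be compared as different transition rules.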
Planktonic active matter represents an emergent system spanning different scales: individual, population and community; and complexity arising from sub-cellular and cellular to collective and ecosystem scale dynamics. This cross-scale active matter system responds to a range of abiotic (temperature, fluid flow and light conditions) and biotic factors (nutrients, pH, secondary metabolites) characteristic of the relevant ecosystems they are part of. Active modulation of cell phenotypes, including morphology, motility, and intracellular organization, enables planktonic microbes to dynamically interact with other individuals and species; and adapt - often rapidly - to the changes in their environment. In this chapter, I discuss both traditional and contemporary approaches to study the dynamics of this multi-scale active matter system from a mechanistic standpoint, with specific references to their local settings and their ability to actively tune the behaviour and physiology, and the emergent structures and functions they elicit under natural ecological constraints as well as due to the shifting climatic trends.
Active reconfigurable intelligent surfaces (RISs) have recently been proposed to compensate for the severe multiplicative fading effect of conventional passive RIS-aided systems. Each reflecting element of active RISs is assisted by an amplifier such that the incident signal can be reflected and amplified instead of only being reflected as in passive RIS-aided systems. This work addresses the practical challenge that, on the one hand, in active RIS-aided systems the perfect individual CSI of the RIS-aided channels cannot be acquired due to the lack of signal processing power at the active RISs, but, on the other hand, this CSI is required to calculate the expected system data rate and RIS transmit power needed for transceiver design. To address this issue, we first derive closed-form expressions for the average achievable rate and the average RIS transmit power based on partial CSI of the RIS-aided channels. Then, we formulate an average achievable rate maximization problem for jointly optimizing the active beamforming at both the base station (BS) and the RIS. This problem is then tackled using the majorization--minimization (MM) algorithm framework, and, for each iteration, semi-
We address the problem of active mapping with a continually-learned neural scene representation, namely Active Neural Mapping. The key lies in actively finding the target space to be explored with efficient agent movement, thus minimizing the map uncertainty on the fly within a previously unseen environment. In this paper, we examine the weight space of the continually-learned neural field, and show empirically that the neural variability, the prediction robustness against random weight perturbation, can be directly utilized to measure the instant uncertainty of the neural map. Together with the continuous geometric information inherited in the neural map, the agent can be guided to find a traversable path to gradually gain knowledge of the environment. We present for the first time an active mapping system with a coordinate-based implicit neural representation for online scene reconstruction. Experiments in the visually realistic Gibson and Matterport3D environments demonstrate the efficacy of the proposed method.
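The notion of neural variability as prediction robustness against random weight perturbation can be illustrated with a small sketch. This is not the system described above: the tiny one-hidden-layer network `predict`, the Gaussian perturbation with standard deviation `sigma`, and the use of the mean squared deviation as the variability score are all assumptions made here for illustration.

```python
import math
import random


def predict(weights, x):
    """Tiny one-hidden-layer MLP standing in for a neural field (hypothetical)."""
    w1, b1, w2, b2 = weights
    hidden = [math.tanh(w * x + b) for w, b in zip(w1, b1)]
    return sum(w * h for w, h in zip(w2, hidden)) + b2


def perturb(weights, sigma, rng):
    """Return a copy of the weights with i.i.d. Gaussian noise added."""
    w1, b1, w2, b2 = weights

    def jitter(values):
        return [v + rng.gauss(0.0, sigma) for v in values]

    return (jitter(w1), jitter(b1), jitter(w2), b2 + rng.gauss(0.0, sigma))


def neural_variability(weights, x, sigma=0.05, n_samples=64, seed=0):
    """Mean squared change of the prediction at x under weight perturbation."""
    rng = random.Random(seed)
    base = predict(weights, x)
    squared = [
        (predict(perturb(weights, sigma, rng), x) - base) ** 2
        for _ in range(n_samples)
    ]
    return sum(squared) / n_samples


# Usage: score a query coordinate; a larger score means a less robust prediction.
toy_weights = ([0.8, -1.2, 0.5], [0.1, 0.0, -0.3], [1.0, 0.7, -0.4], 0.2)
score = neural_variability(toy_weights, x=0.5)
```

In an active-mapping setting, a score of this kind would be evaluated over candidate locations so that the agent can prefer regions where the prediction is least robust; how the scores are turned into a path is outside this sketch.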
We present an overview of phase field modeling of active matter systems as a tool for capturing various aspects of complex and active interfaces. We first describe how interfaces between different phases are characterized in phase field models and provide simple fundamental governing equations that describe their evolution. For a simple model, we then show how physical properties of the interface, such as surface tension and interface thickness, can be recovered from these equations. We then explain how the phase field formulation can be coupled to various active matter realizations and discuss three particular examples of continuum biphasic active matter: active nematic-isotropic interfaces, active matter in viscoelastic environments, and active shells in fluid background. Finally, we describe how multiple phase fields can be used to model active cellular monolayers and present a general framework that can be applied to the study of tissue behaviour and collective migration.
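As an illustration of how interface properties follow from the governing equations, the sketch below assumes a standard double-well free-energy density f = (A/4)(φ² − 1)² + (κ/2)(dφ/dx)². The overview does not specify this form, so the symbols A, κ, ξ, and σ are assumptions made here. For this choice the flat equilibrium interface is φ(x) = tanh(x/ξ) with thickness ξ = sqrt(2κ/A), and the surface tension, taken as the excess free energy per unit area, is σ = κ ∫ (dφ/dx)² dx = sqrt(8κA/9). The code checks the closed-form surface tension against a numerical integral.

```python
import math


def interface_thickness(A, kappa):
    """Thickness xi of the tanh profile for the assumed double-well model."""
    return math.sqrt(2.0 * kappa / A)


def surface_tension_analytic(A, kappa):
    """Closed-form surface tension sigma = sqrt(8 * kappa * A / 9)."""
    return math.sqrt(8.0 * kappa * A / 9.0)


def surface_tension_numeric(A, kappa, half_width=20.0, n=20001):
    """Trapezoidal estimate of kappa * integral of (dphi/dx)^2 across the interface."""
    xi = interface_thickness(A, kappa)
    length = half_width * xi          # integrate over [-length, length]
    dx = 2.0 * length / (n - 1)
    total = 0.0
    for i in range(n):
        x = -length + i * dx
        dphi = (1.0 / xi) / math.cosh(x / xi) ** 2   # derivative of tanh(x / xi)
        weight = 0.5 if i in (0, n - 1) else 1.0     # trapezoid end-point weights
        total += weight * kappa * dphi ** 2 * dx
    return total


# Usage: compare the two estimates for one parameter choice.
A, kappa = 2.0, 0.5
gap = abs(surface_tension_numeric(A, kappa) - surface_tension_analytic(A, kappa))
```

The two relations can be inverted to choose A and κ from a target surface tension and interface thickness, which is how such models are commonly parameterized.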
Active particle systems of interacting self-propelled particles offer a versatile framework for modeling complex systems. When employed to describe aspects of animal behavior, the complexity of animal movement and decision-making often requires the use of unique types of effective interactions between the particles -- notably nonreciprocal effective forces that do not obey the usual conservation laws of Newtonian mechanics. Here we review two recent empirically-motivated models, of two very different types of animal behavior, where the behavior is described in terms of active particles which interact through nonreciprocal effective forces. The first model describes the dynamics of animal contests, wherein typically two rivals fight over a localized resource. The uniquely shaped effective potentials between the model's 'contestant particles' manifest the adversarial nature of contest interactions and capture the dynamical essence of contest behavior in space and time. The second model describes the stabilization of cohesive swarms through long-range and adaptive gravity-like attraction. This 'adaptive gravity' model explains the observed mass and velocity profiles of laboratory midg
Living organisms need to acquire both cognitive maps for learning the structure of the world and planning mechanisms able to deal with the challenges of navigating ambiguous environments. Although significant progress has been made in each of these areas independently, the best way to integrate them is an open research question. In this paper, we propose the integration of a statistical model of cognitive map formation within an active inference agent that supports planning under uncertainty. Specifically, we examine the clone-structured cognitive graph (CSCG) model of cognitive map formation and compare a naive clone graph agent with an active inference-driven clone graph agent, in three spatial navigation scenarios. Our findings demonstrate that while both agents are effective in simple scenarios, the active inference agent is more effective when planning in challenging scenarios, in which sensory observations provide ambiguous information about location.
In this letter, we consider an intelligent reflecting surface (IRS)-aided wireless communication system, where an active or passive IRS is employed to assist the communication between an access point and a user. First, we consider the downlink/uplink communication separately and optimize the IRS placement for rate maximization with an active or passive IRS. We show that the active IRS should be deployed closer to the receiver with the IRS's decreasing amplification power; while in contrast, the passive IRS should be deployed near either the transmitter or receiver. Moreover, with optimized IRS placement, the passive IRS is shown to outperform its active counterpart when the number of reflecting elements is sufficiently large and/or the active-IRS amplification power is too small. Next, we optimize the IRS placement for both active and passive IRSs to maximize the weighted sum-rate of uplink and downlink communications. We show that in this case, the passive IRS is more likely to achieve superior rate performance. This is because the optimal active-IRS placement needs to balance the rate performance in the uplink and downlink, while deploying the passive IRS near the transmitter or