Our brain recognizes only a tiny fraction of sensory input, due to an information processing bottleneck. This blinds us to most visual inputs. Since we are blind to this blindness, only a recent framework highlights this bottleneck by formulating vision as mainly looking and seeing. Looking selects a tiny fraction of visual information for progression through the bottleneck, mainly by shifting gaze to center an attentional spotlight. Seeing decodes, i.e., recognizes, objects within the selected information. Since looking often occurs before seeing and evokes limited awareness, humans have the impression of seeing whole scenes clearly. According to the new framework, the bottleneck starts from the output of the primary visual cortex (V1) to downstream brain areas. This is motivated by the evidence-backed V1 Saliency Hypothesis (V1SH) that V1 creates a saliency map of the visual field to guide looking. Massive visual information loss downstream from V1 makes seeing vulnerable to ambiguity and illusions (errors). To overcome this, feedback from downstream to upstream areas such as V1 queries for additional relevant information. An integral part of this framework is the central-periphe
Regulatory stress testing frameworks, including the Comprehensive Capital Analysis and Review (CCAR) and the Internal Capital Adequacy Assessment Process (ICAAP), require robust Stressed Value-at-Risk (SVaR) estimation under forward-looking macroeconomic scenarios. Traditional parametric approaches often exhibit numerical instability under extreme shocks, reducing the reliability of capital projections. This paper extends the Hybrid Gaussian Process Regression Historical Simulation (GPR-HS) framework of Vadrevu (2026) to forward-looking stress scenarios, demonstrating stability across three regimes: West Asia War, Climate Risk, and AI Bubble/Regulation. A key contribution is the Scenario-Averaged Covariance Stabilization (SACS) framework, which constructs stress covariance as a weighted aggregation of historical crisis regimes, providing stable and interpretable dependence structures. Stressed return paths are generated over a 252-day horizon using deterministic drift and stochastic residuals, while volatility is modeled via Gaussian Process Regression with Aggressive Noise Initialization (ANI). The framework exhibits consistent convergence across all assets and scenarios. SVaR ran
Progress in vision research has been slower downstream than upstream of primary visual cortex (V1). Traditional frameworks have largely overlooked a central constraint: only a tiny fraction of retinal input is recognized. Thus, to a first approximation, vision is better formulated as looking and seeing through a bottleneck. Looking, mainly by the peripheral visual field, selects visual information to enter this bottleneck, largely via gaze shifts that center the selected contents at the fovea. Seeing, mainly by the central visual field, recognizes this content. Converging evidence suggests that V1 initiates the bottleneck and contributes to looking by generating a bottom-up saliency map that guides saccades exogenously, and that top-down feedback along the visual pathway, targeting mainly the representation of the central visual field, refines seeing. Progress will accelerate through falsifiable theories that explicitly link behavior with neural substrates, and through experimental designs that avoid forced fixation and precisely track gaze.
We address the problem of looking into the water from the air, where we seek to remove image distortions caused by refractions at the water surface. Our approach is based on modeling the different water surface structures at various points in time, assuming the underlying image is constant. To this end, we propose a model that consists of two neural-field networks. The first network predicts the height of the water surface at each spatial position and time, and the second network predicts the image color at each position. Using both networks, we reconstruct the observed sequence of images and can therefore use unsupervised training. We show that using implicit neural representations with periodic activation functions (SIREN) leads to effective modeling of the surface height spatio-temporal signal and its derivative, as required for image reconstruction. Using both simulated and real data we show that our method outperforms the latest unsupervised image restoration approach. In addition, it provides an estimate of the water surface.
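As a rough illustration of the two-network setup described above, the following NumPy sketch builds two small sine-activated MLPs: one mapping (x, y, t) to a surface height and one mapping (x, y) to an RGB color. The layer sizes, the frequency factor `w0 = 30`, the initialization bounds (a commonly used SIREN scheme, with the bias initialization simplified here), and the names `init_siren`, `siren_forward`, `height_net`, and `color_net` are assumptions made for illustration. The refraction-based image reconstruction and the training loop are omitted.

```python
import numpy as np


def init_siren(layer_sizes, w0=30.0, seed=0):
    """Create weights for a sine-activated MLP (hypothetical helper).

    First layer: U(-1/n_in, 1/n_in); later layers: U(-sqrt(6/n_in)/w0, +...).
    Biases reuse the same bounds, which is a simplification.
    """
    rng = np.random.default_rng(seed)
    params = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / w0
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        b = rng.uniform(-bound, bound, size=n_out)
        params.append((W, b))
    return params


def siren_forward(params, x, w0=30.0):
    """Forward pass: sin(w0 * (h W + b)) on hidden layers, linear output."""
    h = np.asarray(x, dtype=float)
    for W, b in params[:-1]:
        h = np.sin(w0 * (h @ W + b))
    W, b = params[-1]
    return h @ W + b


# Two fields, mirroring the description: height(x, y, t) and color(x, y).
height_net = init_siren([3, 32, 32, 1], seed=0)
color_net = init_siren([2, 32, 32, 3], seed=1)

xyt = np.random.default_rng(0).uniform(-1.0, 1.0, size=(10, 3))
heights = siren_forward(height_net, xyt)        # shape (10, 1)
colors = siren_forward(color_net, xyt[:, :2])   # shape (10, 3)
```

Because the sine activation is smooth, the spatial derivative of the height field is itself well behaved, which is the property the abstract points to as necessary for image reconstruction.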
MLLMs have demonstrated significant visual understanding capabilities, yet their fine-grained visual perception in complex real-world scenarios, such as densely crowded public areas, remains limited. Inspired by the recent success of RL in both LLMs and MLLMs, in this paper we explore how RL can enhance the visual perception ability of MLLMs. We then develop a novel RL-based framework, Deep Inspection and Perception with RL (DIP-R1), designed to enhance the visual perception capabilities of MLLMs by comprehending complex scenes and looking closely through visual instances. DIP-R1 guides MLLMs through a detailed inspection of the visual scene via three simply designed rule-based rewards. First, we adopt a standard reasoning reward that encourages the model to follow a three-step reasoning process: 1) comprehending the entire visual scene, 2) observing, i.e., looking through regions of interest that are ambiguous, and 3) decision-making to predict the answer. Second, a variance-guided looking reward is designed to encourage the MLLM to examine uncertain regions during the observing process, guiding it to inspect ambiguous areas and mitigate perceptual uncertainty. This reward promotes variance-driven visua
This report presents our team's 'PCIE_LAM' solution for the Ego4D Looking At Me Challenge at CVPR2024. The main goal of the challenge is to accurately determine whether a person in the scene is looking at the camera wearer, based on a video in which the faces of social partners have been localized. Our proposed solution, InternLSTM, consists of an InternVL image encoder and a Bi-LSTM network. InternVL extracts spatial features, while the Bi-LSTM extracts temporal features. The task is highly challenging because of the distance between the person in the scene and the camera, and because camera movement causes significant blurring in the face images. To address this, we implemented a Gaze Smoothing filter that eliminates noise and spikes from the output. Our approach achieved 1st place in the Looking At Me challenge with 0.81 mAP and 0.93 accuracy. Code is available at https://github.com/KanokphanL/Ego4D_LAM_InternLSTM
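The abstract does not specify how the Gaze Smoothing filter works. As one plausible sketch of removing isolated spikes from a per-frame output, the example below applies a sliding median to a sequence of per-frame "looking at me" scores. The choice of a median filter, the window size, and the name `smooth_scores` are assumptions for illustration and are not taken from the released code.

```python
import numpy as np


def smooth_scores(scores, window=5):
    """Sliding-median smoothing of per-frame scores (hypothetical helper).

    The sequence is edge-padded so the output has the same length as the input.
    """
    scores = np.asarray(scores, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    padded = np.pad(scores, half, mode="edge")
    # Each row of `windows` is the neighbourhood centred on one frame.
    windows = np.lib.stride_tricks.sliding_window_view(padded, window)
    return np.median(windows, axis=-1)


# A single-frame spike is suppressed; a sustained change is preserved.
spiky = smooth_scores([0.1, 0.1, 0.9, 0.1, 0.1], window=3)
step = smooth_scores([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], window=3)
```

A median is used here rather than a moving average because it discards a one-frame outlier entirely instead of spreading it over neighbouring frames. A larger window removes longer spikes at the cost of delaying genuine gaze changes.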
This chapter examines current developments in linguistic theory and methods, focusing on the increasing integration of computational, cognitive, and evolutionary perspectives. We highlight four major themes shaping contemporary linguistics: (1) the explicit testing of hypotheses about symbolic representation, such as efficiency, locality, and conceptual semantic grounding; (2) the impact of artificial neural networks on theoretical debates and linguistic analysis; (3) the importance of intersubjectivity in linguistic theory; and (4) the growth of evolutionary linguistics. By connecting linguistics with computer science, psychology, neuroscience, and biology, we provide a forward-looking perspective on the changing landscape of linguistic research.
Measuring how realistic an image looks is a complex task in artificial intelligence research. For example, an image of a boy with a vacuum cleaner in a desert violates common sense. We introduce a novel method, which we call Through the Looking Glass (TLG), to assess image common sense consistency using Large Vision-Language Models (LVLMs) and a Transformer-based encoder. By leveraging LVLMs to extract atomic facts from these images, we obtain a mix of accurate facts. We proceed by fine-tuning a compact attention-pooling classifier over the encoded atomic facts. TLG achieves new state-of-the-art performance on the WHOOPS! and WEIRD datasets while relying on only a compact fine-tuning component.
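To make the idea of an attention-pooling classifier over encoded atomic facts concrete, here is a minimal NumPy sketch of the forward pass: each fact embedding receives a softmax attention weight, the weighted sum forms one image-level vector, and a logistic layer scores it. The single learned query vector, the logistic output, and all names (`attention_pool_classify`, `query`, `clf_weights`) are assumptions for illustration; the abstract does not describe the exact architecture.

```python
import numpy as np


def attention_pool_classify(fact_embeddings, query, clf_weights, clf_bias=0.0):
    """Pool fact embeddings with softmax attention, then classify.

    fact_embeddings: (n_facts, d) encoded atomic facts for one image.
    query:           (d,) attention query vector (learned in practice).
    clf_weights:     (d,) weights of a logistic output layer.
    Returns (probability, attention_weights).
    """
    E = np.asarray(fact_embeddings, dtype=float)
    scores = E @ np.asarray(query, dtype=float)
    scores = scores - scores.max()            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ E                      # (d,) weighted average of facts
    logit = pooled @ np.asarray(clf_weights, dtype=float) + clf_bias
    prob = 1.0 / (1.0 + np.exp(-logit))
    return prob, weights


# Toy input: two facts in a 2-dimensional embedding space.
facts = np.array([[1.0, 0.0], [0.0, 1.0]])
prob, weights = attention_pool_classify(facts, query=np.zeros(2),
                                        clf_weights=np.zeros(2))
```

The attention weights indicate which extracted facts contribute most to the decision, which makes such a pooling layer convenient to inspect.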
"Metaphorical maps" or "contact representations" are visual representations of vertex-weighted graphs that rely on the geographic map metaphor. The vertices are represented by countries, the weights by the areas of the countries, and the edges by contacts/ boundaries among them. The accuracy with which the weights are mapped to areas and the simplicity of the polygons representing the countries are the two classical optimization goals for metaphorical maps. Mchedlidze and Schnorr [Metaphoric Maps for Dynamic Vertex-weighted Graphs, EuroVis 2022] presented a force-based algorithm that creates metaphorical maps that balance between these two optimization goals. Their maps look visually simple, but the accuracy of the maps is far from optimal - the countries' areas can vary up to 30% compared to required. In this paper, we provide a multi-fold extension of the algorithm in [Metaphoric Maps for Dynamic Vertex-weighted Graphs, EuroVis 2022]. More specifically: 1. Towards improving accuracy: We introduce the notion of region stiffness and suggest a technique for varying the stiffness based on the current pressure of map regions. 2. Towards maintaining simplicity: We introduce a weight co
We consider a retailer running a switchback experiment for the price of a single product, with infinite supply. In each period, the seller chooses a price $p$ from a set of predefined prices that consists of a reference price and a few discounted price levels. The goal is to estimate the demand gradient at the reference price point, so that the reference price can be adjusted to improve revenue after the experiment. In our model, in each period, a unit mass of buyers arrives on the market, with values distributed according to a time-varying process. Crucially, buyers are forward looking with a discounted utility and will choose not to purchase now if they expect to face a discounted price in the near future. We show that forward-looking demand introduces bias in naive estimators of the demand gradient, due to intertemporal interference. Furthermore, we prove that there is no estimator that uses data from price experiments with only two price points that can recover the correct demand gradient, even in the limit of an infinitely long experiment with an infinitesimal price discount. Moreover, we characterize the form of the bias of naive estimators. Finally, we show that with a simple
In daily life, subjects often face a social dilemma in two stages. In Stage 1, they recognize the social dilemma structure of the decision problem at hand (a tension between personal interest and collective interest); in Stage 2, they have to choose between gathering additional information to learn the exact payoffs corresponding to each of the two options or making a choice without looking at the payoffs. While previous theoretical research suggests that the mere act of considering one's strategic options in a social dilemma will be met with distrust, no experimental study has tested this hypothesis. What does "looking at payoffs" signal in observers? Do observers' beliefs actually match decision makers' intentions? Experiment 1 shows that the actual action of looking at payoffs signals selfish behavior, but it does not actually mean so. Experiments 2 and 3 show that, when the action of looking at payoffs is replaced by a self-report question asking the extent to which participants look at payoffs in their everyday lives, subjects in high looking mode are indeed more selfish than those in low looking mode, and this is correctly predicted by observers. These results support Rand an
With the rapid development of Machine Translation (MT) systems, especially the recent boost from Neural MT (NMT) models, MT output quality has reached a new level of accuracy. However, many researchers have argued that currently popular evaluation metrics such as BLEU cannot correctly distinguish quality differences among state-of-the-art NMT systems. In this short paper, we describe the design and implementation of a linguistically motivated human-in-the-loop evaluation metric that looks into idiomatic and terminological Multi-word Expressions (MWEs). MWEs have been a bottleneck in many Natural Language Processing (NLP) tasks, including MT. MWEs can be used as one of the main factors to distinguish different MT systems, by looking into their capabilities in recognising and translating MWEs in an accurate and meaning-equivalent manner.
The aim of the present paper is to provide criteria by which a central bank can choose among different monetary-policy rules when it cares about a number of policy targets, such as the output gap and expected inflation. Special attention is given to the question of whether policy instruments are predetermined or only forward looking. Under the new-Keynesian Phillips curve with a cost-push-shock policy-transmission mechanism, the forward-looking case implies an extreme lack of robustness and of credibility of stabilization policy. In the backward-looking case, the simple-rule parameters can be the solution of Ramsey optimal policy under limited commitment. As a consequence, we suggest modeling the rational behavior of the policy maker explicitly with Ramsey optimal policy, rather than using simple rules with an ambiguous assumption, which leads to policy advice that is neither robust nor credible.
Bayesian inference can quantify uncertainty in the predictions of neural networks using posterior distributions for model parameters and network output. By looking at these posterior distributions, one can separate the origin of uncertainty into aleatoric and epistemic contributions. One goal of uncertainty quantification is to inform on prediction accuracy. Here we show that prediction accuracy depends on both epistemic and aleatoric uncertainty in an intricate fashion that cannot be understood in terms of marginalized uncertainty distributions alone. How the accuracy relates to epistemic and aleatoric uncertainties depends not only on the model architecture, but also on the properties of the dataset. We discuss the significance of these results for active learning and introduce a novel acquisition function that outperforms common uncertainty-based methods. To arrive at our results, we approximated the posteriors using deep ensembles, for fully-connected, convolutional and attention-based neural networks.
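One common way to split predictive uncertainty from a deep ensemble into aleatoric and epistemic parts is the entropy-based decomposition sketched below: total uncertainty is the entropy of the ensemble-averaged prediction, the aleatoric part is the average entropy of the individual members, and the epistemic part is their difference (the mutual information). This is offered as an assumption about a standard technique for classification outputs; the abstract does not state which decomposition the authors use, and the name `decompose_uncertainty` is hypothetical.

```python
import numpy as np


def decompose_uncertainty(probs, eps=1e-12):
    """Entropy-based uncertainty decomposition for an ensemble.

    probs: (n_members, n_samples, n_classes) class probabilities,
           one slice per ensemble member.
    Returns (total, aleatoric, epistemic), each of shape (n_samples,).
    """
    probs = np.asarray(probs, dtype=float)
    mean_probs = probs.mean(axis=0)
    # Entropy of the averaged prediction: total predictive uncertainty.
    total = -np.sum(mean_probs * np.log(mean_probs + eps), axis=-1)
    # Average of per-member entropies: aleatoric (data) uncertainty.
    aleatoric = -np.sum(probs * np.log(probs + eps), axis=-1).mean(axis=0)
    # Disagreement between members: epistemic (model) uncertainty.
    epistemic = total - aleatoric
    return total, aleatoric, epistemic


# Members agree on a 50/50 prediction: uncertainty is purely aleatoric.
agree = np.array([[[0.5, 0.5]], [[0.5, 0.5]]])
t_a, a_a, e_a = decompose_uncertainty(agree)

# Members are each confident but contradict one another: purely epistemic.
disagree = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])
t_d, a_d, e_d = decompose_uncertainty(disagree)
```

In both toy cases the total uncertainty is close to log 2, yet it comes from opposite sources. This is a small instance of why the two contributions need to be considered jointly rather than through a single marginal uncertainty value.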
In mathematics it is common to make a mistake from which a false conclusion arises. In each case it is important to recognize the mistake in order to avoid a similar one in the future. Geometric figures provide decisive help in constructing a strict mathematical proof, but without such a proof they can also easily lead to wrong conclusions. In this paper, several incorrect conclusions drawn from plausible-looking diagrams are presented, motivated by a well-known faulty model for measuring the length of a segment. Similar models that lead to a contradiction are developed, and a model that leads to the correct result is derived. The presented models demonstrate the usefulness of paradoxes and can be used in a classroom to show students the significance of a strict mathematical proof as well as of constructing a correct mathematical model. The geometric nature of the problems provides the opportunity to use dynamic geometry software.
We seek to rigorously evaluate the benefit of using a few beams rather than a single beam for a low-cost obstacle avoidance sonar for small AUVs. For a small low-cost AUV, the complexity, cost, and volume required for a multi-beam forward looking sonar are prohibitive. In contrast, a single-beam system is relatively easy to integrate into a small AUV, but does not provide the performance of a multi-beam solution. To better understand this trade-off, we seek to rigorously quantify the improvement with respect to obstacle avoidance performance of adding just a few beams to a single-beam forward looking sonar relative to the performance of the single-beam system. Our work fundamentally supports the goal of using small low-cost AUV systems in cluttered and unstructured environments. Specifically, we investigate the benefit of incorporating a port and starboard beam to a single-beam sonar system for collision avoidance. A methodology for collision avoidance is developed to obtain a fair comparison between a single-beam and multi-beam system, explicitly incorporating the geometry of the beam patterns from forward-looking sonars with large beam angles, and simulated using a high-fidelity
October 1957, and the successful launch of Sputnik 1 into Earth orbit, marked the dawn of the Space Age. The first of the 'fellow travellers' - humanity's first artificial satellite - orbited for a mere three months before re-entering the Earth's atmosphere, though its mission paved the way for an era of exploration that has endured to the present day. For many, a world without satellites would be a difficult one to imagine. As a society, we have become reliant on them for a vast array of services and applications. With a divine view of large swathes of the Earth's surface, and the ability to relay signals around its curvature, satellites have enabled the fast transfer of data on a global scale, bypassing the challenges associated with ground-based broadcasting, long-distance wiring, and so on. Positioning, Navigation and Timing (PNT) satellites have revolutionised transportation by land, air, and sea, while weather satellites enable scientists to monitor and warn of large-scale phenomena as they develop in near real-time. Satellites have extended the frontiers of observation: looking outwards, astronomers are able to circumvent the Earth's atmosphere to look deeper into the cosmos
Policy makers, urban planners, architects, sociologists, and economists are interested in creating urban areas that are both lively and safe. But are the safety and liveliness of neighborhoods independent characteristics? Or are they just two sides of the same coin? In a world where people avoid unsafe looking places, neighborhoods that look unsafe will be less lively, and will fail to harness the natural surveillance of human activity. But in a world where the preference for safe looking neighborhoods is small, the connection between the perception of safety and liveliness will be either weak or nonexistent. In this paper we explore the connection between the levels of activity and the perception of safety of neighborhoods in two major Italian cities by combining mobile phone data (as a proxy for activity or liveliness) with scores of perceived safety estimated using a Convolutional Neural Network trained on a dataset of Google Street View images scored using a crowdsourced visual perception survey. We find that: (i) safer looking neighborhoods are more active than what is expected from their population density, employee density, and distance to the city centre; and (ii) that the
Filtering has had a profound impact as a device of perceiving information and deriving agent expectations in dynamic economic models. For an abstract economic system, this paper shows that the foundation of applying the filtering method corresponds to the existence of a conditional expectation as an equilibrium process. Agent-based rational behavior of looking backward and looking forward is generalized to a conditional expectation process where the economic system is approximated by a class of models, which can be represented and estimated without information loss. The proposed framework elucidates the range of applications of a general filtering device and is not limited to a particular model class such as rational expectations.
On many learning platforms, the optimization criteria guiding model training reflect the priorities of the designer rather than those of the individuals they affect. Consequently, users may act strategically to obtain more favorable outcomes. While past work has studied strategic user behavior on learning platforms, the focus has largely been on strategic responses to a deployed model, without considering the behavior of other users. In contrast, look-ahead reasoning takes into account that user actions are coupled, and -- at scale -- impact future predictions. Within this framework, we first formalize level-k thinking, a concept from behavioral economics, where users aim to outsmart their peers by looking one step ahead. We show that, while convergence to an equilibrium is accelerated, the equilibrium remains the same, providing no benefit of higher-level reasoning for individuals in the long run. Then, we focus on collective reasoning, where users take coordinated actions by optimizing through their joint impact on the model. By contrasting collective with selfish behavior, we characterize the benefits and limits of coordination; a new notion of alignment between the learner's an