Regulatory stress testing frameworks, including the Comprehensive Capital Analysis and Review (CCAR) and the Internal Capital Adequacy Assessment Process (ICAAP), require robust Stressed Value-at-Risk (SVaR) estimation under forward-looking macroeconomic scenarios. Traditional parametric approaches often exhibit numerical instability under extreme shocks, reducing the reliability of capital projections. This paper extends the Hybrid Gaussian Process Regression Historical Simulation (GPR-HS) framework of Vadrevu (2026) to forward-looking stress scenarios, demonstrating stability across three regimes: West Asia War, Climate Risk, and AI Bubble/Regulation. A key contribution is the Scenario-Averaged Covariance Stabilization (SACS) framework, which constructs stress covariance as a weighted aggregation of historical crisis regimes, providing stable and interpretable dependence structures. Stressed return paths are generated over a 252-day horizon using deterministic drift and stochastic residuals, while volatility is modeled via Gaussian Process Regression with Aggressive Noise Initialization (ANI). The framework exhibits consistent convergence across all assets and scenarios. SVaR ran
Our brain recognizes only a tiny fraction of sensory input, due to an information processing bottleneck. This blinds us to most visual inputs. Since we are blind to this blindness, only a recent framework highlights this bottleneck by formulating vision as mainly looking and seeing. Looking selects a tiny fraction of visual information for progression through the bottleneck, mainly by shifting gaze to center an attentional spotlight. Seeing decodes, i.e., recognizes, objects within the selected information. Since looking often occurs before seeing and evokes limited awareness, humans have the impression of seeing whole scenes clearly. According to the new framework, the bottleneck starts from the output of the primary visual cortex (V1) to downstream brain areas. This is motivated by the evidence-backed V1 Saliency Hypothesis (V1SH) that V1 creates a saliency map of the visual field to guide looking. Massive visual information loss downstream from V1 makes seeing vulnerable to ambiguity and illusions (errors). To overcome this, feedback from downstream to upstream areas such as V1 queries for additional relevant information. An integral part of this framework is the central-periphe
Progress in vision research has been slower downstream than upstream of primary visual cortex (V1). Traditional frameworks have largely overlooked a central constraint: only a tiny fraction of retinal input is recognized. Thus, to a first approximation, vision is better formulated as looking and seeing through a bottleneck. Looking, mainly by the peripheral visual field, selects visual information to enter this bottleneck, largely via gaze shifts that center the selected contents at the fovea. Seeing, mainly by the central visual field, recognizes this content. Converging evidence suggests that V1 initiates the bottleneck and contributes to looking by generating a bottom-up saliency map that guides saccades exogenously, and that top-down feedback along the visual pathway, targeting mainly the representation of the central visual field, refines seeing. Progress will accelerate through falsifiable theories that explicitly link behavior with neural substrates, and through experimental designs that avoid forced fixation and precisely track gaze.
We address the problem of looking into the water from the air, where we seek to remove image distortions caused by refractions at the water surface. Our approach is based on modeling the different water surface structures at various points in time, assuming the underlying image is constant. To this end, we propose a model that consists of two neural-field networks. The first network predicts the height of the water surface at each spatial position and time, and the second network predicts the image color at each position. Using both networks, we reconstruct the observed sequence of images and can therefore use unsupervised training. We show that using implicit neural representations with periodic activation functions (SIREN) leads to effective modeling of the surface height spatio-temporal signal and its derivative, as required for image reconstruction. Using both simulated and real data we show that our method outperforms the latest unsupervised image restoration approach. In addition, it provides an estimate of the water surface.
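As a rough illustration of the surface-height network described above, the following is a minimal NumPy sketch of a sine-activated (SIREN-style) MLP that maps a position and time (x, y, t) to a scalar height, together with its spatial gradient. The function names, layer sizes, zero biases, and the use of central finite differences in place of an analytic derivative are assumptions made for this sketch and are not taken from the paper; the frequency factor w0 = 30 and the uniform initialization bounds follow the commonly used SIREN scheme.

```python
import numpy as np

def init_siren(layer_sizes, w0=30.0, seed=0):
    """Initialize SIREN-style weights (hypothetical helper, biases set to zero)."""
    rng = np.random.default_rng(seed)
    params = []
    for i, (n_in, n_out) in enumerate(zip(layer_sizes[:-1], layer_sizes[1:])):
        # First layer: U(-1/n_in, 1/n_in); later layers: U(-sqrt(6/n_in)/w0, +sqrt(6/n_in)/w0).
        bound = 1.0 / n_in if i == 0 else np.sqrt(6.0 / n_in) / w0
        W = rng.uniform(-bound, bound, size=(n_in, n_out))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def siren_height(params, xyt, w0=30.0):
    """Evaluate the height field at points xyt of shape (N, 3)."""
    h = xyt
    for W, b in params[:-1]:
        h = np.sin(w0 * (h @ W + b))  # periodic activation
    W, b = params[-1]
    return (h @ W + b)[..., 0]        # final layer is linear

def spatial_gradient(params, xyt, eps=1e-4):
    """Approximate (dh/dx, dh/dy) by central finite differences."""
    grads = []
    for axis in (0, 1):
        step = np.zeros(3)
        step[axis] = eps
        diff = siren_height(params, xyt + step) - siren_height(params, xyt - step)
        grads.append(diff / (2.0 * eps))
    return np.stack(grads, axis=-1)   # shape (N, 2)
```

In a full pipeline the gradient would feed a refraction model that warps the image predicted by the second network; that step is omitted here because the abstract does not specify it.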
MLLMs have demonstrated significant visual understanding capabilities, yet their fine-grained visual perception in complex real-world scenarios, such as densely crowded public areas, remains limited. Inspired by the recent success of RL in both LLMs and MLLMs, in this paper we explore how RL can enhance the visual perception ability of MLLMs. We develop a novel RL-based framework, Deep Inspection and Perception with RL (DIP-R1), designed to enhance the visual perception capabilities of MLLMs by comprehending complex scenes and closely inspecting visual instances. DIP-R1 guides MLLMs through detailed inspection of a visual scene via three simple rule-based rewards. First, we adopt a standard reasoning reward that encourages the model to follow a three-step reasoning process: 1) comprehending the entire visual scene, 2) observing, i.e., looking through regions of interest that are ambiguous, and 3) decision-making to predict the answer. Second, a variance-guided looking reward is designed to encourage the MLLM to examine uncertain regions during the observing process, guiding it to inspect ambiguous areas and mitigate perceptual uncertainty. This reward promotes variance-driven visua
This chapter examines current developments in linguistic theory and methods, focusing on the increasing integration of computational, cognitive, and evolutionary perspectives. We highlight four major themes shaping contemporary linguistics: (1) the explicit testing of hypotheses about symbolic representation, such as efficiency, locality, and conceptual semantic grounding; (2) the impact of artificial neural networks on theoretical debates and linguistic analysis; (3) the importance of intersubjectivity in linguistic theory; and (4) the growth of evolutionary linguistics. By connecting linguistics with computer science, psychology, neuroscience, and biology, we provide a forward-looking perspective on the changing landscape of linguistic research.
Measuring how realistic an image looks is a complex task in artificial intelligence research. For example, an image of a boy with a vacuum cleaner in a desert violates common sense. We introduce a novel method, which we call Through the Looking Glass (TLG), to assess image common sense consistency using Large Vision-Language Models (LVLMs) and a Transformer-based encoder. By leveraging LVLMs to extract atomic facts from these images, we obtain a mix of accurate facts. We proceed by fine-tuning a compact attention-pooling classifier over encoded atomic facts. TLG achieves new state-of-the-art performance on the WHOOPS! and WEIRD datasets while leveraging a compact fine-tuning component.
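To make the idea of an attention-pooling classifier over encoded atomic facts concrete, here is a minimal NumPy sketch. It assumes each fact has already been encoded as a fixed-size vector, and it uses a single learned query vector followed by a logistic output; the function names, the single-query design, and the parameter shapes are assumptions made for illustration and may differ from the architecture used in TLG.

```python
import numpy as np

def attention_pool(fact_embeddings, query):
    """Pool (num_facts, dim) embeddings into one (dim,) vector via softmax attention."""
    scores = fact_embeddings @ query          # one relevance score per fact
    scores = scores - scores.max()            # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ fact_embeddings, weights

def weirdness_probability(fact_embeddings, query, w, b):
    """Hypothetical classifier head: probability that the image violates common sense."""
    pooled, _ = attention_pool(fact_embeddings, query)
    return 1.0 / (1.0 + np.exp(-(pooled @ w + b)))
```

In training, `query`, `w`, and `b` would be the only fine-tuned parameters on top of frozen fact encodings, which is one way to keep the trainable component compact.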
"Metaphorical maps" or "contact representations" are visual representations of vertex-weighted graphs that rely on the geographic map metaphor. The vertices are represented by countries, the weights by the areas of the countries, and the edges by contacts/ boundaries among them. The accuracy with which the weights are mapped to areas and the simplicity of the polygons representing the countries are the two classical optimization goals for metaphorical maps. Mchedlidze and Schnorr [Metaphoric Maps for Dynamic Vertex-weighted Graphs, EuroVis 2022] presented a force-based algorithm that creates metaphorical maps that balance between these two optimization goals. Their maps look visually simple, but the accuracy of the maps is far from optimal - the countries' areas can vary up to 30% compared to required. In this paper, we provide a multi-fold extension of the algorithm in [Metaphoric Maps for Dynamic Vertex-weighted Graphs, EuroVis 2022]. More specifically: 1. Towards improving accuracy: We introduce the notion of region stiffness and suggest a technique for varying the stiffness based on the current pressure of map regions. 2. Towards maintaining simplicity: We introduce a weight co
We consider a retailer running a switchback experiment for the price of a single product, with infinite supply. In each period, the seller chooses a price $p$ from a set of predefined prices that consist of a reference price and a few discounted price levels. The goal is to estimate the demand gradient at the reference price point, with the goal of adjusting the reference price to improve revenue after the experiment. In our model, in each period, a unit mass of buyers arrives on the market, with values distributed based on a time-varying process. Crucially, buyers are forward looking with a discounted utility and will choose to not purchase now if they expect to face a discounted price in the near future. We show that forward-looking demand introduces bias in naive estimators of the demand gradient, due to intertemporal interference. Furthermore, we prove that there is no estimator that uses data from price experiments with only two price points that can recover the correct demand gradient, even in the limit of an infinitely long experiment with an infinitesimal price discount. Moreover, we characterize the form of the bias of naive estimators. Finally, we show that with a simple
This report presents our team's 'PCIE_LAM' solution for the Ego4D Looking At Me Challenge at CVPR 2024. The main goal of the challenge is to accurately determine whether a person in the scene is looking at the camera wearer, based on a video in which the faces of social partners have been localized. Our proposed solution, InternLSTM, consists of an InternVL image encoder and a Bi-LSTM network. InternVL extracts spatial features, while the Bi-LSTM extracts temporal features. The task is highly challenging because of the distance of the person in the scene and the movement of the camera, which result in significant blurring of the face image. To address this, we implemented a Gaze Smoothing filter to eliminate noise and spikes from the output. Our approach achieved 1st place in the Looking At Me Challenge with 0.81 mAP and 0.93 accuracy. Code is available at https://github.com/KanokphanL/Ego4D_LAM_InternLSTM
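The report does not describe the Gaze Smoothing filter in detail, so the following is only a sketch of one plausible choice: a centered sliding-median filter over per-frame "looking at me" scores, which suppresses isolated spikes while preserving sequence length. The function name, the median filter itself, and the default window size are assumptions and are not taken from the authors' code.

```python
import numpy as np

def smooth_scores(scores, window=5):
    """Apply a centered sliding median to a 1-D sequence of per-frame scores."""
    scores = np.asarray(scores, dtype=float)
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    # Repeat the edge values so the output has the same length as the input.
    padded = np.pad(scores, half, mode="edge")
    return np.array([np.median(padded[i:i + window]) for i in range(len(scores))])
```

A single-frame spike such as `[0.1, 0.1, 0.9, 0.1, 0.1]` is flattened with `window=3`, whereas a sustained change lasting longer than half the window survives the filter.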
We seek to rigorously evaluate the benefit of using a few beams rather than a single beam for a low-cost obstacle avoidance sonar for small AUVs. For a small low-cost AUV, the complexity, cost, and volume required for a multi-beam forward looking sonar are prohibitive. In contrast, a single-beam system is relatively easy to integrate into a small AUV, but does not provide the performance of a multi-beam solution. To better understand this trade-off, we seek to rigorously quantify the improvement with respect to obstacle avoidance performance of adding just a few beams to a single-beam forward looking sonar relative to the performance of the single-beam system. Our work fundamentally supports the goal of using small low-cost AUV systems in cluttered and unstructured environments. Specifically, we investigate the benefit of incorporating a port and starboard beam to a single-beam sonar system for collision avoidance. A methodology for collision avoidance is developed to obtain a fair comparison between a single-beam and multi-beam system, explicitly incorporating the geometry of the beam patterns from forward-looking sonars with large beam angles, and simulated using a high-fidelity
In daily life, subjects often face a social dilemma in two stages. In Stage 1, they recognize the social dilemma structure of the decision problem at hand (a tension between personal interest and collective interest); in Stage 2, they have to choose between gathering additional information to learn the exact payoffs corresponding to each of the two options or making a choice without looking at the payoffs. While previous theoretical research suggests that the mere act of considering one's strategic options in a social dilemma will be met with distrust, no experimental study has tested this hypothesis. What does "looking at payoffs" signal in observers? Do observers' beliefs actually match decision makers' intentions? Experiment 1 shows that the actual action of looking at payoffs signals selfish behavior, but it does not actually mean so. Experiments 2 and 3 show that, when the action of looking at payoffs is replaced by a self-report question asking the extent to which participants look at payoffs in their everyday lives, subjects in high looking mode are indeed more selfish than those in low looking mode, and this is correctly predicted by observers. These results support Rand an
In mathematics it is common to make a mistake that leads to a false conclusion. In each case it is important to recognize the mistake in order to avoid a similar one in the future. Geometric figures provide decisive help in constructing a strict mathematical proof, but they can also easily lead to wrong conclusions when no such proof is given. In this paper, several incorrect conclusions drawn from plausible-looking diagrams are presented, motivated by a well-known faulty model for measuring the length of a segment. Similar models that lead to a contradiction are developed and a model that leads to the correct result is derived. The presented models demonstrate the usefulness of paradoxes and can be used in a classroom to point out to students the significance of a strict mathematical proof as well as of constructing a correct mathematical model. The geometric nature of the problems provides the opportunity to use dynamic geometry software.
In this report, we present the transfer of pretrained video masked autoencoders (VideoMAE) to egocentric tasks for the Ego4D Looking At Me Challenge. VideoMAE is a data-efficient model for self-supervised video pre-training and transfers easily to downstream tasks. We show that the representation transferred from VideoMAE has good spatio-temporal modeling capability and can capture small actions. Starting from a VideoMAE model pretrained on ordinary videos captured from a third-person view, we need only 10 epochs of training on egocentric data to obtain better results than the baseline on the Ego4D Looking At Me Challenge.
With the fast development of Machine Translation (MT) systems, especially the new boost from Neural MT (NMT) models, MT output quality has reached a new level of accuracy. However, many researchers have criticised currently popular evaluation metrics such as BLEU for failing to distinguish quality differences among state-of-the-art NMT systems. In this short paper, we describe the design and implementation of a linguistically motivated human-in-the-loop evaluation metric that looks into idiomatic and terminological Multi-word Expressions (MWEs). MWEs have been a bottleneck in many Natural Language Processing (NLP) tasks, including MT. MWEs can be used as one of the main factors to distinguish different MT systems by looking into their capabilities in recognising and translating MWEs in an accurate and meaning-equivalent manner.
Realistic image synthesis aims to generate an image that is perceptually indistinguishable from an actual image. Generating realistic-looking images with large variations (e.g., large spatial deformations and large pose changes), however, is very challenging. Both handling large variations and preserving appearance need to be taken into account in realistic-looking image generation. In this paper, we propose a novel realistic-looking image synthesis method, designed especially for cases that demand large changes. To this end, we devise generative guiding blocks. The proposed generative guiding block includes a realistic appearance preserving discriminator and a naturalistic variation transforming discriminator. By incorporating the proposed generative guiding blocks into a generative model, the latent features at the layer of the generative model are enhanced to synthesize images that are both realistic-looking and exhibit the target variation. Through qualitative and quantitative evaluation in experiments, we demonstrate the effectiveness of the proposed generative guiding blocks compared to state-of-the-art methods.
Bayesian inference can quantify uncertainty in the predictions of neural networks using posterior distributions for model parameters and network output. By looking at these posterior distributions, one can separate the origin of uncertainty into aleatoric and epistemic contributions. One goal of uncertainty quantification is to inform on prediction accuracy. Here we show that prediction accuracy depends on both epistemic and aleatoric uncertainty in an intricate fashion that cannot be understood in terms of marginalized uncertainty distributions alone. How the accuracy relates to epistemic and aleatoric uncertainties depends not only on the model architecture, but also on the properties of the dataset. We discuss the significance of these results for active learning and introduce a novel acquisition function that outperforms common uncertainty-based methods. To arrive at our results, we approximated the posteriors using deep ensembles, for fully-connected, convolutional and attention-based neural networks.
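For readers who want a concrete picture of the aleatoric/epistemic split with deep ensembles, the sketch below uses one common entropy-based decomposition for classification: total uncertainty is the entropy of the ensemble-averaged prediction, aleatoric uncertainty is the average entropy of the individual members, and epistemic uncertainty is their difference (the mutual information between prediction and model). This particular decomposition and the function names are assumptions for illustration; the abstract does not state which estimator the paper uses.

```python
import numpy as np

def entropy(p, axis=-1, eps=1e-12):
    """Shannon entropy in nats; eps guards against log(0)."""
    return -np.sum(p * np.log(p + eps), axis=axis)

def decompose_uncertainty(member_probs):
    """Split predictive uncertainty for an ensemble.

    member_probs: array of shape (num_members, num_samples, num_classes)
    holding each member's softmax output.
    """
    mean_probs = member_probs.mean(axis=0)           # ensemble-averaged prediction
    total = entropy(mean_probs)                      # entropy of the mean
    aleatoric = entropy(member_probs).mean(axis=0)   # mean entropy of the members
    epistemic = total - aleatoric                    # disagreement between members
    return total, aleatoric, epistemic
```

When all members output the same distribution, the epistemic term vanishes and all uncertainty is attributed to the data; when members are individually confident but disagree, the aleatoric term vanishes and the uncertainty is attributed to the model.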
October 1957, and the successful launch of Sputnik 1 into Earth orbit, marked the dawn of the Space Age. The first of the 'fellow travellers' - humanity's first artificial satellite - orbited for a mere three months before re-entering the Earth's atmosphere, though its mission paved the way for an era of exploration that has endured to the present day. For many, a world without satellites would be a difficult one to imagine. As a society, we have become reliant on them for a vast array of services and applications. With a divine view of large swathes of the Earth's surface, and the ability to relay signals around its curvature, satellites have enabled the fast transfer of data on a global scale, bypassing the challenges associated with ground-based broadcasting, long-distance wiring, and so on. Positioning, Navigation and Timing (PNT) satellites have revolutionised transportation by land, air, and sea, while weather satellites enable scientists to monitor and warn of large-scale phenomena as they develop in near real-time. Satellites have extended the frontiers of observation: looking outwards, astronomers are able to circumvent the Earth's atmosphere to look deeper into the cosmos
We propose Multi-view Pyramid Transformer (MVP), a scalable multi-view transformer architecture that directly reconstructs large 3D scenes from tens to hundreds of images in a single forward pass. Drawing on the idea of "looking broader to see the whole, looking finer to see the details," MVP is built on two core design principles: 1) a local-to-global inter-view hierarchy that gradually broadens the model's perspective from local views to groups and ultimately the full scene, and 2) a fine-to-coarse intra-view hierarchy that starts from detailed spatial representations and progressively aggregates them into compact, information-dense tokens. This dual hierarchy achieves both computational efficiency and representational richness, enabling fast reconstruction of large and complex scenes. We validate MVP on diverse datasets and show that, when coupled with 3D Gaussian Splatting as the underlying 3D representation, it achieves state-of-the-art generalizable reconstruction quality while maintaining high efficiency and scalability across a wide range of view configurations.
On many learning platforms, the optimization criteria guiding model training reflect the priorities of the designer rather than those of the individuals they affect. Consequently, users may act strategically to obtain more favorable outcomes. While past work has studied strategic user behavior on learning platforms, the focus has largely been on strategic responses to a deployed model, without considering the behavior of other users. In contrast, look-ahead reasoning takes into account that user actions are coupled, and -- at scale -- impact future predictions. Within this framework, we first formalize level-k thinking, a concept from behavioral economics, where users aim to outsmart their peers by looking one step ahead. We show that, while convergence to an equilibrium is accelerated, the equilibrium remains the same, providing no benefit of higher-level reasoning for individuals in the long run. Then, we focus on collective reasoning, where users take coordinated actions by optimizing through their joint impact on the model. By contrasting collective with selfish behavior, we characterize the benefits and limits of coordination; a new notion of alignment between the learner's an