Human communication is inherently multimodal, where language is often accompanied by non-verbal cues such as gestures to convey intentions. However, current Vision-Language-Action (VLA) models treat robotic manipulation as a pure text-driven task, overlooking the important role of gestures in Human-Robot Interaction (HRI). This often leads to inaccurate intent grounding and unreliable manipulation when language instructions are ambiguous or underspecified. To address this challenge, we propose GIVE (Gesture Intent via Visual-Semantic Enhancement), an effective approach that enhances pre-trained VLA models with human gesture understanding without architectural modifications. Specifically, GIVE incorporates gesture information through two complementary pathways: a visual pathway that overlays hand skeletons and fingertip rays onto robot observations for explicit object grounding, and a semantic pathway that generates high-level descriptions of human gestures and task instructions for robust intent grounding. By jointly leveraging visual and semantic guidance, GIVE enables VLA policies to better associate gestures with manipulation behaviors and adapt to dynamic interaction intents. I
Existing approaches based on context prompting or reinforcement learning (RL) to improve the reasoning capacities of large language models (LLMs) depend on the LLMs' internal knowledge to produce reliable Chain-Of-Thought (CoT). However, no matter the size of LLMs, certain problems cannot be resolved in a single forward pass. Meanwhile, agent-based reasoning systems require access to a comprehensive nonparametric knowledge base, which is often costly or not feasible for use in scientific and niche domains. We present Graph Inspired Veracity Extrapolation (GIVE), a novel reasoning method that merges parametric and non-parametric memories to improve accurate reasoning with minimal external input. GIVE guides the LLM agent to select the most pertinent expert data (observe), engage in query-specific divergent thinking (reflect), and then synthesize this information to produce the final output (speak). Extensive experiments demonstrated the following benefits of our framework: (1) GIVE boosts the performance of LLMs across various sizes. (2) In some scenarios, GIVE allows smaller LLMs to surpass larger, more sophisticated ones in scientific tasks (GPT3.5T + GIVE > GPT4). (3) GIVE is
Integer Quadratic Programming (IQP), $\min\{x^T Q x + c^T x : Ax \le b,\, x\in\mathbb{Z}^n\}$, is a fundamental problem in combinatorial optimization. While the convex and concave special cases admit polynomial-time algorithms for fixed~$n$, the general indefinite case is considerably harder: it was only recently shown to lie in NP, and the FPT algorithm, due to Lokshtanov, establishes fixed-parameter tractability parameterized by $n$ and the largest coefficient~$L$ without giving an explicit running time. We give the first single-exponential algorithm for IQP, solving it in time $\bigl(n\,L^n_A\,\Delta(A)\,L_Q\bigr)^{O(n)}\cdot\mathrm{poly}(\varphi)$, which is $(nL)^{O(n^2)}\cdot\mathrm{poly}(\varphi)$ in general using the same parameterization. We achieve improvements for structured cases like total unimodularity and further state explicit complexity results for a number of FPT algorithms and optimization problems. The single-exponential bound is achieved via curvature batching: we classify kernel directions by the sign of their quadratic curvature and observe that when no negative-curvature direction exists, all gradient constraints can be imposed simultaneously in a single batch. This rep
This paper takes a critical look at the recommendations OSCE/ODIHR has given for Estonian Internet voting over the 20 years it has been running. We present examples of recommendations that cannot be fulfilled at all, but also examples where fulfilling a recommendation requires a non-trivial trade-off, potentially weakening the system in some other respect. In such cases OSCE/ODIHR should take an explicit position on which trade-off it recommends. We also look at the development of the recommendation to introduce end-to-end verifiability. In this case we expect OSCE/ODIHR to define exactly what it means by this property, as well as to give explicit criteria for determining whether and to what extent end-to-end verifiability has been achieved.
Artificial intelligence (AI) developers are rhetorically flirting with the idea that AI systems might have interests or moral rights. While there has been a large volume of research on whether AI deserves rights, there has been less exploration of what AI rights would mean in practice. This paper explores the institutional dimension of AI rights: what it would take to recognize moral or legal rights for AIs, and the attendant opportunities and dangers. Unlike all other nonhuman entities to which humanity has extended rights, AI systems are in principle capable of acquiring and wielding institutional power without human aid and mediation. AIs with rights would be able to legitimately, and AIs with power able to unpreventably, abridge human interests. Accordingly, giving rights even to rather dumb AI systems would entail binding the fate of humanity to potentially unpredictable nonhumans. I therefore defend the rather grandiose claim that to empower AI to claim or to exercise inherent rights would be a world-historical gamble with human self-determination, which no individual researcher, firm, state, or even international organization has the moral right to authorize.
We show that quantum oracles provide an advantage over classical oracles for answering classical counterfactual questions in causal models, or equivalently, for identifying unknown causal parameters such as distributions over functional dependences. In structural causal models with discrete classical variables, observational data and even ideal interventions generally fail to answer all counterfactual questions, since different causal parameters can reproduce the same observational and interventional data while disagreeing on counterfactuals. Using a simple binary example, we demonstrate that if the classical variables of interest are encoded in quantum systems and the causal dependence among them is encoded in a quantum oracle, coherently querying the oracle enables the identification of all causal parameters -- hence all classical counterfactuals. We generalize this to arbitrary finite cardinalities and prove that coherent probing 1) allows the identification of all two-way joint counterfactuals p(Y_x=y, Y_{x'}=y'), which is not possible with any number of queries to a classical oracle, and 2) provides tighter bounds on higher-order multi-way counterfactuals than with a classical
We prove that weakly elliptic damping gives sharp energy decay for the abstract damped wave semigroup, where the damping is not in the functional calculus. In this case, there is no overdamping. We show applications in linearised water waves and Kelvin--Voigt damping.
The driver's willingness to give (WTG) control in conditionally automated driving is assessed in a virtual-reality-based driving rig, through their choice to give away driving control and through the extent to which automated driving is adopted in a mixed-traffic environment. Within- and across-class unobserved heterogeneity and locus of control variations are taken into account. The choice of giving away control is modelled using the mixed logit (MIXL) and mixed latent class (LCML) models. The significant latent segments of the locus of control are developed into internalizers and externalizers by the latent class model (LCM) based on the taste heterogeneity identified from the MIXL model. Results suggest that drivers choose to give away control of the vehicle when greater concentration/attentiveness is required (e.g., in the nighttime) or when they are interested in performing a non-driving-related task (NDRT). In addition, it is observed that internalizers demonstrate more heterogeneity compared to externalizers in terms of WTG.
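For readers unfamiliar with the mixed logit model, its textbook choice probability can be written as below. This is the standard form, not necessarily the exact specification estimated in the study; the symbols $x_{ni}$ (attributes of alternative $i$ for driver $n$), $\beta$ (random taste coefficients) and $f(\beta \mid \theta)$ (their mixing density) are illustrative assumptions.

```latex
% Textbook mixed logit: a logit probability averaged over random tastes \beta.
% The symbols are illustrative and are not taken from the study itself.
P_{ni} \;=\; \int \frac{\exp\!\bigl(\beta^{\top} x_{ni}\bigr)}
                       {\sum_{j} \exp\!\bigl(\beta^{\top} x_{nj}\bigr)}
             \, f(\beta \mid \theta)\, \mathrm{d}\beta
```

Taste heterogeneity enters through the spread of $f(\beta \mid \theta)$; a latent class model replaces the continuous density with a finite set of segments.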
Multimodal Large Language Models have advanced AI in applications like text-to-video generation and visual question answering. These models rely on visual encoders to convert non-text data into vectors, but current encoders either lack semantic alignment or overlook non-salient objects. We propose the Guiding Visual Encoder to Perceive Overlooked Information (GiVE) approach. GiVE enhances visual representation with an Attention-Guided Adapter (AG-Adapter) module and an Object-focused Visual Semantic Learning module. These incorporate three novel loss terms: Object-focused Image-Text Contrast (OITC) loss, Object-focused Image-Image Contrast (OIIC) loss, and Object-focused Image Discrimination (OID) loss, improving object consideration, retrieval accuracy, and comprehensiveness. Our contributions include dynamic visual focus adjustment, novel loss functions to enhance object retrieval, and the Multi-Object Instruction (MOInst) dataset. Experiments show our approach achieves state-of-the-art performance.
Given an adjoint pair of functors $F,G$, the composite $GF$ naturally gets the structure of a monad. The same monad may arise from many such adjoint pairs of functors, however. Can one describe all of the adjunctions giving rise to a given monad? In this paper we single out a class of adjunctions with especially good properties, and we develop methods for computing all such adjunctions, up to natural equivalence, which give rise to a given monad. To demonstrate these methods, we explicitly compute the finitary homological presentations of the free $A$-module monad on the category of sets, for $A$ a Dedekind domain. We also prove a criterion, reminiscent of Beck's monadicity theorem, for when there is essentially (in a precise sense) only a single adjunction that gives rise to a given monad.
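The monad structure on $GF$ mentioned in the first sentence is the standard one. As a reminder, and assuming the usual convention that $F$ is the left adjoint with unit $\eta$ and counit $\varepsilon$ (the category names $\mathcal{C}$, $\mathcal{D}$ are placeholders):

```latex
% Standard monad induced by an adjunction F -| G,
% with F : C -> D, G : D -> C, unit \eta and counit \varepsilon.
T = GF, \qquad
\eta \colon 1_{\mathcal{C}} \Rightarrow GF, \qquad
\mu = G\varepsilon F \colon GFGF \Rightarrow GF .
% The monad laws follow from the triangle identities and naturality of \varepsilon:
\mu \circ T\mu = \mu \circ \mu T, \qquad
\mu \circ T\eta = 1_T = \mu \circ \eta T .
```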
We consider item allocation to individual agents who have additive valuations, in settings in which there are protected groups, and the allocation needs to give each protected group its "fair" share of the total welfare. Informally, within each protected group we consider the total welfare that the allocation gives the members of the group, and compare it to the maximum possible welfare that an allocation can give to the group members. An allocation is fair towards the group if the ratio between these two values is no worse than the relative size of the group. For divisible items, our formal definition of fairness is based on the proportional share, whereas for indivisible items, it is based on the anyprice share. We present examples in which there are no fair allocations, and not even allocations that approximate the fairness requirement within a constant multiplicative factor. We then attempt to identify sufficient conditions for fair or approximately fair allocations to exist. For example, for indivisible items, when agents have identical valuations and the family of protected groups is laminar, we show that if the items are chores, then an allocation that satisfies every fairne
In this paper, we mainly study linear one-dimensional and two-dimensional elementary cellular automata that generate symmetrical spatio-temporal patterns. For spatio-temporal patterns of cellular automata from the single site seed, we normalize the number of nonzero states of the patterns, take the limits, and give one-variable functions for the limit sets. We can obtain a one-variable function for each limit set and show that the resulting functions are singular functions, which are non-constant, are continuous everywhere, and have a zero derivative almost everywhere. We show that for Rule 90, a one-dimensional elementary cellular automaton (CA), and a two-dimensional elementary CA, the resulting functions are Salem's singular functions. We also discuss two nonlinear elementary CAs, Rule 22, and Rule 126. Although their spatio-temporal patterns are different from that of Rule 90, their resulting functions from the number of nonzero states equal the function of Rule 90.
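As a small illustration of the quantity being counted, the sketch below evolves Rule 90 from a single-site seed and records the number of nonzero cells at each time step. It only reproduces the raw counts; the paper's normalization and limiting procedure are not specified here and are not attempted. The function name `rule90_counts` is hypothetical.

```python
def rule90_counts(steps):
    """Number of nonzero cells at times 0..steps-1 for Rule 90 from a single seed."""
    width = 2 * steps + 1      # wide enough that the pattern never reaches the edges
    cells = [0] * width
    cells[steps] = 1           # single-site seed in the centre
    counts = []
    for _ in range(steps):
        counts.append(sum(cells))
        # Rule 90: each new cell is the XOR of its left and right neighbours
        # (cells beyond the array boundary are treated as 0).
        cells = [
            (cells[i - 1] if i > 0 else 0) ^ (cells[i + 1] if i < width - 1 else 0)
            for i in range(width)
        ]
    return counts


counts = rule90_counts(16)
# Known identity: the count at time t equals 2 ** (number of 1 bits in t).
matches_popcount = all(c == 2 ** bin(t).count("1") for t, c in enumerate(counts))
```

The counts follow the rows of Pascal's triangle modulo 2, which is where the self-similar structure of the pattern comes from.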
We give a simple, direct proof of the easy fact about the Weierstrass Representation, namely, that it always gives a minimal surface. Most presentations include the much harder converse that every simply connected minimal surface is given by the Weierstrass Representation.
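For reference, one common normalization of the Weierstrass Representation is given below; conventions differ between texts by constant factors, so this need not match the form used in the paper. Here $f$ is holomorphic and $g$ is meromorphic on a simply connected domain, with $f g^{2}$ holomorphic.

```latex
% One common normalization of the Weierstrass Representation (conventions vary).
\varphi_1 = \tfrac{1}{2} f\,(1 - g^{2}), \qquad
\varphi_2 = \tfrac{i}{2} f\,(1 + g^{2}), \qquad
\varphi_3 = f g,
\qquad
x_k(z) = \operatorname{Re} \int_{z_0}^{z} \varphi_k(w)\, \mathrm{d}w .
% Key identity: the map is conformal because
\varphi_1^{2} + \varphi_2^{2} + \varphi_3^{2}
  = \tfrac{1}{4} f^{2}\bigl[(1-g^{2})^{2} - (1+g^{2})^{2}\bigr] + f^{2} g^{2} = 0 .
```

Each $x_k$ is harmonic as the real part of a holomorphic function, and a conformal harmonic immersion is minimal, which is the easy direction referred to above.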
We give a rigorous analysis of the statistical behavior of gradients in a randomly initialized fully connected network N with ReLU activations. Our results show that the empirical variance of the squares of the entries in the input-output Jacobian of N is exponential in a simple architecture-dependent constant beta, given by the sum of the reciprocals of the hidden layer widths. When beta is large, the gradients computed by N at initialization vary wildly. Our approach complements the mean field theory analysis of random networks. From this point of view, we rigorously compute finite width corrections to the statistics of gradients at the edge of chaos.
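The constant beta is cheap to compute for any architecture. The following minimal sketch evaluates it for two hypothetical networks; the function name `beta` and the example widths are assumptions chosen for illustration, and the code does not compute the gradient statistics themselves.

```python
def beta(hidden_widths):
    """Sum of the reciprocals of the hidden layer widths."""
    return sum(1.0 / n for n in hidden_widths)


# Two hypothetical architectures with the same total number of hidden units (200):
deep_narrow = [10] * 20      # 20 hidden layers of width 10
shallow_wide = [100] * 2     # 2 hidden layers of width 100

beta_deep = beta(deep_narrow)      # close to 2.0
beta_shallow = beta(shallow_wide)  # close to 0.02
```

With the same budget of hidden units, the deep and narrow network has a beta about a hundred times larger, which by the result above corresponds to much more variable gradients at initialization.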
Both explicit analysis and FEM numerical simulation are used to analyze the field distribution of a line current in the so-called Maxwell's fish eye lens [bounded with a perfect electric conductor (PEC) boundary]. We show that such a 2D Maxwell's fish eye lens cannot give perfect imaging due to the fact that high order modes of the object field can hardly reach the image point in Maxwell's fish eye lens. If only the zeroth order mode is excited, a good image of a sharp object may be achieved in some cases; however, its spot size is larger than the spot size of the initial object field. The image resolution is determined by the field spot size of the image corresponding to the zeroth order component of the object field. Our explicit analysis agrees very well with the FEM results for a fish eye lens. Time-domain simulation is also given to verify our conclusion. Multi-point images for a single object point are also demonstrated.
Distributionally Robust Supervised Learning (DRSL) is necessary for building reliable machine learning systems. When machine learning is deployed in the real world, its performance can be significantly degraded because test data may follow a different distribution from training data. DRSL with f-divergences explicitly considers the worst-case distribution shift by minimizing the adversarially reweighted training loss. In this paper, we analyze this DRSL, focusing on the classification scenario. Since the DRSL is explicitly formulated for a distribution shift scenario, we naturally expect it to give a robust classifier that can aggressively handle shifted distributions. However, surprisingly, we prove that the DRSL just ends up giving a classifier that exactly fits the given training distribution, which is too pessimistic. This pessimism comes from two sources: the particular losses used in classification and the fact that the variety of distributions to which the DRSL tries to be robust is too wide. Motivated by our analysis, we propose a simple DRSL that overcomes this pessimism and empirically demonstrate its effectiveness.
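The adversarially reweighted training loss can be sketched as follows. This is a commonly used formulation of f-divergence DRSL written under assumed notation: $\ell$ is the loss, $g_\theta$ the classifier, $r_i$ the adversarial weight on training example $i$, and $\delta$ the size of the uncertainty set. None of these symbols are taken from the abstract.

```latex
% A common way to write DRSL with an f-divergence ball (notation assumed).
\min_{\theta} \; \sup_{r \in \mathcal{U}_f} \;
  \frac{1}{N} \sum_{i=1}^{N} r_i \, \ell\bigl(g_\theta(x_i), y_i\bigr),
\qquad
\mathcal{U}_f = \Bigl\{ r \in \mathbb{R}^{N} :
  \tfrac{1}{N} \sum_{i=1}^{N} f(r_i) \le \delta, \;
  \tfrac{1}{N} \sum_{i=1}^{N} r_i = 1, \;
  r_i \ge 0 \Bigr\} .
```

Setting every $r_i = 1$ recovers ordinary empirical risk minimization, so the inner supremum measures how much an adversary can gain by shifting weight toward high-loss examples.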
It is commonly claimed that only Hamiltonians with a spectrum unbounded both above and below can give purely exponential decay. Because such Hamiltonians have no ground state, they are considered unphysical. Here we show that Hamiltonians which are bounded below can give purely exponential decay. This is possible when, instead of looking at the global survival probability, one considers a subsystem only. We conclude that purely exponential decay might not be as unphysical as previously thought.
An expert tells an advisee whether to take an action that may be good or bad. He may provide a condition under which to take the action. This condition predicts whether the action is good if and only if the expert is competent. Providing the condition exposes the expert to reputational risk by allowing the advisee to learn about his competence. He trades off the accuracy benefit and reputational risk induced by providing the condition. He prefers not to provide it -- i.e., to give "simple advice" -- when his payoff is sufficiently concave in the posterior belief about his competence.
We present a new approach to the construction of protocols which are proof against communication errors. The construction is based on a generalization of the well-known Ulam's game. We show equivalence between winning strategies in this game and robust protocols for multi-party computation. We do not give a complete theory; rather, we want to describe a fresh idea. We use a tree code defined by Schulman. The tree code is the most important part of the interactive version of Shannon's Coding Theorem proved by Schulman. He uses a probabilistic argument for the existence of a tree code without giving any effective construction. We show another proof yielding a randomized construction which, in contrast to his proof, almost surely gives a good code. Moreover, our construction uses a much smaller alphabet.
[This is the unpublished supplemental information from 1989 to the paper: J.M. Deutsch, "Quantum statistical mechanics in a closed system." Phys. Rev. A, 43(4), 2046 (1991).] A closed quantum mechanical system does not necessarily give time averages in accordance with the microcanonical distribution. This question is investigated for the case in which the number of degrees of freedom N is large. For systems where the different degrees of freedom are uncoupled, experimental situations are discussed that show a violation of the usual statistical mechanical rules. It is shown that by applying a finite but very small perturbation to such systems, the results of quantum statistical mechanics can indeed be recovered. The form of the perturbation is that of a banded random matrix, which has been used previously to describe strongly chaotic systems in the semiclassical limit. The properties of energy eigenfunctions for this perturbed system are also discussed, and deviations from the microcanonical result are shown to become exponentially small in the limit of large N.