Found 20 results
The increasing popularity of egocentric cameras has generated growing interest in studying multi-camera interactions in shared environments. Although large-scale datasets such as Ego4D and Ego-Exo4D have propelled egocentric vision research, interactions between multiple camera wearers remain underexplored, a key gap for applications like immersive learning and collaborative robotics. To bridge this gap, we present TF2025, an expanded dataset with synchronized first- and third-person views. In addition, we introduce a sequence-based method to identify first-person wearers in third-person footage, combining motion cues and person re-identification.
One of the typical purposes of using lower-limb exoskeleton robots is to provide assistance to the wearer by supporting their weight and augmenting their physical capabilities according to a given task and human motion intentions. The generalizability of robots across different wearers in multiple tasks is important to ensure that the robot can provide correct and effective assistance in actual implementation. However, most lower-limb exoskeleton robots exhibit only limited generalizability. Therefore, this paper proposes a human-in-the-loop learning and adaptation framework for exoskeleton robots to improve their performance in various tasks and for different wearers. To suit different wearers, an individualized walking trajectory is generated online using dynamic movement primitives and Bayes optimization. To accommodate various tasks, a task translator is constructed using a neural network to generalize a trajectory to more complex scenarios. These generalization techniques are integrated into a unified variable impedance model, which regulates the exoskeleton to provide assistance while ensuring safety. In addition, an anomaly detection network is developed to quantitatively ev
Explosive Ordnance Disposal (EOD) suits are widely used to protect human operators executing emergency tasks such as bomb disposal and neutralization. Current suit designs still need to be improved in terms of wearer comfort, which can be assessed based on the interaction forces at the human-suit contact regions. This paper introduces a simulation-based modeling framework that computes the interaction loads at the human-suit interface based on a wearer's kinematic movement data. The proposed modeling framework consists of three primary components: a) inertial and geometric modeling of the EOD suit, b) state estimation of the wearer's in-suit movement, and c) inverse dynamics analysis to calculate the human-suit interface forces based on the simulated human-suit model and the estimated human movement data. This simulation-based modeling method could be used to complement experimental testing for improving the time and cost efficiency of EOD suit evaluation. The accuracy of the simulated interface load was experimentally benchmarked during three different human tasks (each with three trials), by comparing the predicted interface forces with those measured by commercial pressure senso
Camera glasses create fundamental privacy tensions between wearers seeking recording functionality and bystanders concerned about unauthorized surveillance. We present a systematic multi-stakeholder evaluation of privacy mechanisms through surveys (N=525) and paired interviews (N=20) in China. Study 1 quantifies expectation-willingness gaps: bystanders consistently demand stronger information transparency and protective measures than wearers will provide, with disparities intensifying in sensitive contexts where 65-90% of bystanders would take defensive action. Study 2 evaluates twelve privacy-enhancing technologies, revealing four fundamental trade-offs that undermine current approaches: visibility versus disruption, empowerment versus burden, protection versus agency, and accountability versus exposure. These gaps reflect structural incompatibilities rather than inadequate goodwill, with context emerging as the primary determinant of privacy acceptability. We propose context-adaptive pathways that dynamically adjust protection strategies: minimal-friction visibility in public spaces, structured negotiation in semi-public environments, and automatic protection in sensitive context
Accurately forecasting human trajectories from an egocentric perspective plays a central role in applications such as humanoid robotics, wearable sensing systems, and assistive navigation. However, progress in this direction remains limited due to the scarcity of egocentric trajectory datasets collected in real-world environments. Addressing this need, we introduce EgoTraj, an egocentric multimodal open dataset recorded using Meta Quest Pro (MQPro). EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers in real-world urban environments. Each recording provides synchronized RGB video along with ground-truth data, including continuous time-synchronized 6-degree-of-freedom head poses, per-frame 3D eye gaze vectors, and scene annotations. To the best of our knowledge, EgoTraj differs from typical egocentric trajectory datasets by capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity. To demonstrate the potential of the dataset, we benchmark several state-of-the-art methods for egocentric trajectory prediction and conduct ablation studies to analyze the contributions of gaze, scene, and motion cues. The
Advances in wearable robotics challenge the traditional definition of human motor systems, as wearable robots redefine body structure, movement capability, and wearers' perception of their own bodies. We measured gait performance and perceived body images via the Selected Coefficient of Perceived Motion (SCoMo) after each training session. Based on human motor learning theory extended to wearer-robot systems, we hypothesized that the perceived body image learned when walking with a robotic leg co-evolves with the actual gait improvement and becomes more certain and closer to the actual motion. Our results confirmed that motor learning improved both the physical and the perceived gait pattern towards normal, indicating that through practice the wearers incorporated the robotic leg into their sensorimotor systems to enable wearer-robot movement coordination. However, a persistent discrepancy between perceived and actual motion remained, likely because wearers lack direct sensation and control of the prosthesis. Additionally, the perceptual overestimation at the later training sessions might limit further motor improvement. These findings suggest that enhancing the human sense of wearable
Active noise control (ANC) has become popular for reducing noise and thus enhancing user comfort in headphones. While feedback control offers an effective way to implement ANC, it is restricted by uncertainty of the controlled system that arises, e.g., from differing wearing situations. Widely used unstructured models that capture these variations tend to overestimate the uncertainty and thus restrict ANC performance. As a remedy, this work explores uncertainty models that provide a more accurate fit to the observed variations in order to improve ANC performance for over-ear and in-ear headphones. We describe the controller optimization based on these models and implement an ANC prototype to compare the performance of the conventional and proposed modeling approaches. Extensive measurements with human wearers confirm the robustness and indicate a performance improvement over conventional methods. The results allow the active attenuation of ANC headphones to be safely increased by several decibels.
We present the results of an in-situ ideation workshop for designing data visualizations on smart wristbands that can show data around the entire wrist of a wearer. Wristbands pose interesting challenges because the visibility of different areas of the band depends on the wearer's arm posture. We focused on four usage scenarios that lead to different postures: office work, leisurely walks, cycling, and driving. As the technology for smart wristbands is not yet commercially available, we conducted a paper-based ideation exercise that showed how spatial layout and visualization design on smart wristbands may need to vary depending on the types of data items of interest and arm postures. Participants expressed a strong preference for responsive visualization designs that could adapt to the movement of wearers' arms. Supplemental material from the study is available here: https://osf.io/4hrca/.
Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons. Such a robot needs to be able to constantly adapt to the surrounding scene based on egocentric vision, and predict the ego motion of the wearer. In this work, we leveraged body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings. To facilitate research in ego-motion prediction, we have collected a comprehensive walking scene navigation dataset centered on the user's perspective. We then present a method to predict human motion conditioning on the surrounding static scene. Our method leverages a diffusion model to produce a distribution of potential future trajectories, taking into account the user's observation of the environment. To that end, we introduce a compact representation to encode the user's visual memory of the surroundings, as well as an efficient sample-generating technique to speed up real-time inference of a diffusion model. We ablate our model and compare it to baselines, and results show that our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
While the rapid proliferation of wearable cameras has raised significant concerns about egocentric video privacy, prior work has largely overlooked the unique privacy threats posed to the camera wearer. This work investigates the core question: How much privacy information about the camera wearer can be inferred from their first-person view videos? We introduce EgoPrivacy, the first large-scale benchmark for the comprehensive evaluation of privacy risks in egocentric vision. EgoPrivacy covers three types of privacy (demographic, individual, and situational), defining seven tasks that aim to recover private information ranging from fine-grained (e.g., wearer's identity) to coarse-grained (e.g., age group). To further emphasize the privacy threats inherent to egocentric vision, we propose Retrieval-Augmented Attack, a novel attack strategy that leverages ego-to-exo retrieval from an external pool of exocentric videos to boost the effectiveness of demographic privacy attacks. An extensive comparison of the different attacks possible under all threat models is presented, showing that private information of the wearer is highly susceptible to leakage. For instance, our findings indicate
The Aria Gen 2 Pilot Dataset (A2PD) is an egocentric multimodal open dataset captured using the state-of-the-art Aria Gen 2 glasses. To facilitate timely access, A2PD is released incrementally with ongoing dataset enhancements. The initial release features Dia'ane, our primary subject, who records her daily activities alongside friends, each equipped with Aria Gen 2 glasses. It encompasses five primary scenarios: cleaning, cooking, eating, playing, and outdoor walking. In each of the scenarios, we provide comprehensive raw sensor data and output data from various machine perception algorithms. These data illustrate the device's ability to perceive the wearer, the surrounding environment, and interactions between the wearer and the environment, while maintaining robust performance across diverse users and conditions. The A2PD is publicly available at projectaria.com, with open-source tools and usage examples provided in Project Aria Tools.
With the growing adoption of wearable devices such as smart glasses for AI assistants, wearer speech recognition (WSR) is becoming increasingly critical to next-generation human-computer interfaces. However, in real environments, interference from side-talk speech remains a significant challenge to WSR and may cause accumulated errors for downstream tasks such as natural language processing. In this work, we introduce a novel multi-channel differential automatic speech recognition (ASR) method for robust WSR on smart glasses. The proposed system takes differential inputs from different frontends that complement each other to improve the robustness of WSR, including a beamformer, microphone selection, and a lightweight side-talk detection model. Evaluations on both simulated and real datasets demonstrate that the proposed system outperforms the traditional approach, achieving up to an 18.0% relative reduction in word error rate.
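The 18.0% figure above is a relative reduction in word error rate (WER). As a minimal sketch of how such a figure is typically computed, assuming the standard word-level edit-distance definition of WER (the abstract does not describe its scoring setup), the Python below computes WER for one sentence pair and the relative reduction between two WER values. The function names and all example values are hypothetical and are not taken from the paper.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance: substitutions, deletions, insertions."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER = edit distance divided by the number of reference words."""
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def relative_reduction(baseline_wer, system_wer):
    """Fraction by which the system WER is lower than the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - system_wer) / baseline_wer
```

For illustration only: a hypothetical baseline WER of 10.0% and a system WER of 8.2% give `relative_reduction(0.10, 0.082)`, which is approximately 0.18, i.e. an 18% relative reduction even though the absolute difference is 1.8 percentage points.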
We present results of a replication study on smartwatch visualizations with adults aged 65 and older. The older adult population is rising globally, coinciding with their increasing interest in using small wearable devices, such as smartwatches, to track and view data. Smartwatches, however, pose challenges to this population: fonts and visualizations are often small and meant to be seen at a glance. How concise design on smartwatches interacts with aging-related changes in perception and cognition, however, is not well understood. We replicate a study that investigated how visualization type and number of data points affect glanceable perception. We observe strong evidence of differences for participants aged 75 and older, sparking interesting questions regarding the study of visualization and older adults. We discuss first steps toward better understanding and supporting an older population of smartwatch wearers and reflect on our experiences working with this population. Supplementary materials are available at https://osf.io/7x4hq/.
We present a systematic review and design space for visualizations on smartwatches and the context in which these visualizations are displayed--smartwatch faces. A smartwatch face is the main smartwatch screen that wearers see when checking the time. Smartwatch faces are small data dashboards that present a variety of data to wearers in a compact form. Yet, the usage context and form factor of smartwatch faces pose unique design challenges for visualization. In this paper, we present an in-depth review and analysis of visualization designs for popular premium smartwatch faces based on their design styles, amount and types of data, as well as visualization styles and encodings they included. From our analysis we derive a design space to provide an overview of the important considerations for new data displays for smartwatch faces and other small displays. Our design space can also serve as inspiration for design choices and grounding of empirical work on smartwatch visualization design. We end with a research agenda that points to open opportunities in this nascent research direction. Supplementary material is available at: https://osf.io/nwy2r/.
Accurately estimating the 3D pose of the camera wearer in egocentric video sequences is crucial to modeling human behavior in virtual and augmented reality applications. The task presents unique challenges due to the limited visibility of the user's body caused by the front-facing camera mounted on their head. Recent research has explored the utilization of the scene and ego-motion, but it has overlooked humans' interactive nature. We propose a novel framework for Social Egocentric Estimation of body MEshes (SEE-ME). Our approach is the first to estimate the wearer's mesh using only a latent probabilistic diffusion model, which we condition on the scene and, for the first time, on the social wearer-interactee interactions. Our in-depth study sheds light on when social interaction matters most for ego-mesh estimation; it quantifies the impact of interpersonal distance and gaze direction. Overall, SEE-ME surpasses the current best technique, reducing the pose estimation error (MPJPE) by 53%. The code is available at https://github.com/L-Scofano/SEEME.
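MPJPE (mean per-joint position error) is commonly defined as the average Euclidean distance between predicted and ground-truth joint positions. The sketch below assumes that plain definition, with no alignment step, for a single pose; the abstract does not state which variant or protocol SEE-ME uses, and the function name and joint coordinates are hypothetical.

```python
import math


def mpjpe(predicted_joints, ground_truth_joints):
    """Mean Euclidean distance between corresponding 3D joints of one pose.

    Both arguments are sequences of (x, y, z) tuples in the same units
    and in the same joint order.
    """
    if not predicted_joints or len(predicted_joints) != len(ground_truth_joints):
        raise ValueError("joint lists must be non-empty and of equal length")
    total = sum(
        math.dist(pred, truth)
        for pred, truth in zip(predicted_joints, ground_truth_joints)
    )
    return total / len(predicted_joints)
```

With two hypothetical joints whose errors are 3 and 4 units, for example `mpjpe([(0, 0, 0), (1, 0, 0)], [(0, 0, 3), (1, 4, 0)])`, the result is 3.5. A 53% reduction, as reported above, would mean the new error is 0.47 times the previous best error under the same protocol.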
We present the Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each recording contains multimodal sensor data recorded through the Project Aria glasses. In addition, AEA provides machine perception data including high-frequency globally aligned 3D trajectories, scene point clouds, per-frame 3D eye gaze vectors, and time-aligned speech transcription. In this paper, we demonstrate a few exemplar research applications enabled by this dataset, including neural scene reconstruction and prompted segmentation. AEA is an open-source dataset that can be downloaded from https://www.projectaria.com/datasets/aea/. We also provide open-source implementations and examples of how to use the dataset in Project Aria Tools https://github.com/facebookresearch/projectaria_tools.
In this study, we discuss the impacts of assortative mixing by mask-wearing on the effectiveness of mask use in suppressing the propagation of epidemics. We employ the mask model, which is an epidemic model involving mask wearers and non-mask wearers. We derive the occurrence probability and mean size of large outbreaks, epidemic threshold, and average epidemic size for the mask model in an assortatively mixed random network that follows an arbitrary degree distribution. Applying our analysis to the Poisson random networks, we find that the assortative (disassortative) mixing by mask-wearing decreases (increases) the epidemic threshold. Assortative mixing, the tendency for (non-)mask wearers to prefer to connect with (non-)mask wearers, is not effective in containing epidemics in that the transmissibility required for large outbreaks to occur is small. On the other hand, in high-transmissibility cases, mask use is most effective in decreasing the occurrence probability and mean size of large outbreaks, as well as the average epidemic size, when the mixing pattern is strongly assortative. Strongly assortative mixing, resulting in the separation of mask wearers and non-mask wearers,
Since wearable linkage mechanisms could control the moment transmission from actuator(s) to wearers, they can help ensure that even low-cost wearable systems provide advanced functionality tailored to users' needs. For example, if a hip mechanism transforms an input torque into a spatially-varying moment, a wearer can get effective assistance both in the sagittal and frontal planes during walking, even with an affordable single-actuator system. However, due to the combinatorial nature of the linkage mechanism design space, the topologies of such nonlinear-moment-generating mechanisms are challenging to determine, even with significant computational resources and numerical data. Furthermore, on-premise production development and interactive design are nearly impossible in conventional synthesis approaches. Here, we propose an innovative autonomous computational approach for synthesizing such wearable robot mechanisms, eliminating the need for exhaustive searches or numerous data sets. Our method transforms the synthesis problem into a gradient-based optimization problem with sophisticated objective and constraint functions while ensuring the desired degree of freedom, range of motio
The next generation of e-textiles foresees an era of smart wearable garments where seamlessly embedded intelligence provides the ability to sense, process and perform. Core to this vision is embedded textile functionality enabling dynamic configuration. In this paper we detail a methodology, design and implementation of a dynamic field-programmable, logic-driven fabric soft exosuit. Dynamic field programmability allows the soft exosuit to alter its functionality and adapt to specific exercise programs depending on the wearer's needs. The dynamic field programmability is enabled through motion-based control: arm movements of the soft exosuit trigger momentary sensors embedded in the fabric exosuit at specific joint placement points (right arm: wrist, elbow). The embedded circuitry in the fabric exosuit is implemented using a layered and interchangeable approach. This includes logic gate patches (AND, OR, NOT) and a layered textile interconnection panel. This modular and interchangeable design enhances the soft exosuit's flexibility and adaptability. A truth table aligning to a rehabilitation healthcare use case was utilised. Tests were completed validating the field programmability of the s
Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person view. Likewise, machines are advanced to approach human intelligence by learning with multisensory inputs from an egocentric perspective. In this paper, we explore the challenging egocentric audio-visual object localization task and observe that 1) egomotion commonly exists in first-person recordings, even within a short duration; and 2) out-of-view sound components can be created while wearers shift their attention. To address the first problem, we propose a geometry-aware temporal aggregation module to handle the egomotion explicitly. The effect of egomotion is mitigated by estimating the temporal geometry transformation and exploiting it to update visual representations. Moreover, we propose a cascaded feature enhancement module to tackle the second issue. It improves cross-modal localization robustness by disentangling visually-indicated audio representation. During training, we take advantage of the naturally available audio-visual temporal synchronization as "free" self-supervision to avoid costly labeling. We also annotate and create the Epic Sounding Object dataset for evaluatio