共找到 20 条结果
The increasing popularity of egocentric cameras has generated growing interest in studying multi-camera interactions in shared environments. Although large-scale datasets such as Ego4D and Ego-Exo4D have propelled egocentric vision research, interactions between multiple camera wearers remain underexplored-a key gap for applications like immersive learning and collaborative robotics. To bridge this, we present TF2025, an expanded dataset with synchronized first- and third-person views. In addition, we introduce a sequence-based method to identify first-person wearers in third-person footage, combining motion cues and person re-identification.
One of the typical purposes of using lower-limb exoskeleton robots is to provide assistance to the wearer by supporting their weight and augmenting their physical capabilities according to a given task and human motion intentions. The generalizability of robots across different wearers in multiple tasks is important to ensure that the robot can provide correct and effective assistance in actual implementation. However, most lower-limb exoskeleton robots exhibit only limited generalizability. Therefore, this paper proposes a human-in-the-loop learning and adaptation framework for exoskeleton robots to improve their performance in various tasks and for different wearers. To suit different wearers, an individualized walking trajectory is generated online using dynamic movement primitives and Bayes optimization. To accommodate various tasks, a task translator is constructed using a neural network to generalize a trajectory to more complex scenarios. These generalization techniques are integrated into a unified variable impedance model, which regulates the exoskeleton to provide assistance while ensuring safety. In addition, an anomaly detection network is developed to quantitatively ev
Explosive Ordnance Disposal (EOD) suits are widely used to protect human operators to execute emergency tasks such as bomb disposal and neutralization. Current suit designs still need to be improved in terms of wearer comfort, which can be assessed based on the interaction forces at the human-suit contact regions. This paper introduces a simulation-based modeling framework that computes the interaction loads at the human-suit interface based on a wearer's kinematic movement data. The proposed modeling framework consists of three primary components: a) inertial and geometric modeling of the EOD suit, b) state estimation of the wearer's in-suit movement, and c) inverse dynamics analysis to calculate the human-suit interface forces based on the simulated human-suit model and the estimated human movement data. This simulation-based modeling method could be used to complement experimental testing for improving the time and cost efficiency of EOD suit evaluation. The accuracy of the simulated interface load was experimentally benchmarked during three different human tasks (each with three trials), by comparing the predicted interface forces with that measured by commercial pressure senso
We consider scenarios in which we wish to perform joint scene understanding, object tracking, activity recognition, and other tasks in environments in which multiple people are wearing body-worn cameras while a third-person static camera also captures the scene. To do this, we need to establish person-level correspondences across first- and third-person videos, which is challenging because the camera wearer is not visible from his/her own egocentric video, preventing the use of direct feature matching. In this paper, we propose a new semi-Siamese Convolutional Neural Network architecture to address this novel challenge. We formulate the problem as learning a joint embedding space for first- and third-person videos that considers both spatial- and motion-domain cues. A new triplet loss function is designed to minimize the distance between correct first- and third-person matches while maximizing the distance between incorrect ones. This end-to-end approach performs significantly better than several baselines, in part by learning the first- and third-person features optimized for matching jointly with the distance measure itself.
Camera glasses create fundamental privacy tensions between wearers seeking recording functionality and bystanders concerned about unauthorized surveillance. We present a systematic multi-stakeholder evaluation of privacy mechanisms through surveys (N=525) and paired interviews (N=20) in China. Study 1 quantifies expectation-willingness gaps: bystanders consistently demand stronger information transparency and protective measures than wearers will provide, with disparities intensifying in sensitive contexts where 65-90% of bystanders would take defensive action. Study 2 evaluates twelve privacy-enhancing technologies, revealing four fundamental trade-offs that undermine current approaches: visibility versus disruption, empowerment versus burden, protection versus agency, and accountability versus exposure. These gaps reflect structural incompatibilities rather than inadequate goodwill, with context emerging as the primary determinant of privacy acceptability. We propose context-adaptive pathways that dynamically adjust protection strategies: minimal-friction visibility in public spaces, structured negotiation in semi-public environments, and automatic protection in sensitive context
Accurately forecasting human trajectories from an egocentric perspective plays a central role in applications such as humanoid robotics, wearable sensing systems, and assistive navigation. However, progress in this direction remains limited due to the scarcity of egocentric trajectory datasets collected in real-world environments. Addressing this need, we introduce EgoTraj, an egocentric multimodal open dataset recorded using Meta Quest Pro (MQPro). EgoTraj contains 75 sequences of human navigation collected from multiple MQPro wearers in real-world urban environments. Each recording provides synchronized RGB video along with ground-truth data, including continuous time-synchronized 6-degree-of-freedom head poses, per-frame 3D eye gaze vectors, scene annotations. To the best of our knowledge, EgoTraj differs from typical egocentric trajectory datasets by capturing long-horizon, self-directed navigation across diverse urban routes with broad participant diversity. To demonstrate the potential of the dataset, we benchmark several state-of-the-art methods for egocentric trajectory prediction and conduct ablation studies to analyze the contributions of gaze, scene, and motion cues. The
Advances in wearable robotics challenge the traditional definition of human motor systems, as wearable robots redefine body structure, movement capability, and perception of their own bodies. We measured gait performance and perceived body images via Selected Coefficient of Perceived Motion, SCoMo, after each training session. Based on human motor learning theory extended to wearer-robot systems, we hypothesized that learning the perceived body image when walking with a robotic leg co-evolves with the actual gait improvement and becomes more certain and more accurate to the actual motion. Our result confirmed that motor learning improved both physical and perceived gait pattern towards normal, indicating that via practice the wearers incorporated the robotic leg into their sensorimotor systems to enable wearer-robot movement coordination. However, a persistent discrepancy between perceived and actual motion remained, likely due to the absence of direct sensation and control of the prosthesis from wearers. Additionally, the perceptual overestimation at the later training sessions might limit further motor improvement. These findings suggest that enhancing the human sense of wearable
We present the results of an in-situ ideation workshop for designing data visualizations on smart wristbands that can show data around the entire wrist of a wearer. Wristbands pose interesting challenges because the visibility of different areas of the band depends on the wearer's arm posture. We focused on four usage scenarios that lead to different postures: office work, leisurely walks, cycling, and driving. As the technology for smart wristbands is not yet commercially available, we conducted a paper-based ideation exercise that showed how spatial layout and visualization design on smart wristbands may need to vary depending on the types of data items of interest and arm postures. Participants expressed a strong preference for responsive visualization designs that could adapt to the movement of wearers' arms. Supplemental material from the study is available here: https://osf.io/4hrca/.
With the growing adoption of wearable devices such as smart glasses for AI assistants, wearer speech recognition (WSR) is becoming increasingly critical to next-generation human-computer interfaces. However, in real environments, interference from side-talk speech remains a significant challenge to WSR and may cause accumulated errors for downstream tasks such as natural language processing. In this work, we introduce a novel multi-channel differential automatic speech recognition (ASR) method for robust WSR on smart glasses. The proposed system takes differential inputs from different frontends that complement each other to improve the robustness of WSR, including a beamformer, microphone selection, and a lightweight side-talk detection model. Evaluations on both simulated and real datasets demonstrate that the proposed system outperforms the traditional approach, achieving up to an 18.0% relative reduction in word error rate.
While the rapid proliferation of wearable cameras has raised significant concerns about egocentric video privacy, prior work has largely overlooked the unique privacy threats posed to the camera wearer. This work investigates the core question: How much privacy information about the camera wearer can be inferred from their first-person view videos? We introduce EgoPrivacy, the first large-scale benchmark for the comprehensive evaluation of privacy risks in egocentric vision. EgoPrivacy covers three types of privacy (demographic, individual, and situational), defining seven tasks that aim to recover private information ranging from fine-grained (e.g., wearer's identity) to coarse-grained (e.g., age group). To further emphasize the privacy threats inherent to egocentric vision, we propose Retrieval-Augmented Attack, a novel attack strategy that leverages ego-to-exo retrieval from an external pool of exocentric videos to boost the effectiveness of demographic privacy attacks. An extensive comparison of the different attacks possible under all threat models is presented, showing that private information of the wearer is highly susceptible to leakage. For instance, our findings indicate
Wearable collaborative robots stand to assist human wearers who need fall prevention assistance or wear exoskeletons. Such a robot needs to be able to constantly adapt to the surrounding scene based on egocentric vision, and predict the ego motion of the wearer. In this work, we leveraged body-mounted cameras and sensors to anticipate the trajectory of human wearers through complex surroundings. To facilitate research in ego-motion prediction, we have collected a comprehensive walking scene navigation dataset centered on the user's perspective. We then present a method to predict human motion conditioning on the surrounding static scene. Our method leverages a diffusion model to produce a distribution of potential future trajectories, taking into account the user's observation of the environment. To that end, we introduce a compact representation to encode the user's visual memory of the surroundings, as well as an efficient sample-generating technique to speed up real-time inference of a diffusion model. We ablate our model and compare it to baselines, and results show that our model outperforms existing methods on key metrics of collision avoidance and trajectory mode coverage.
We present results of a replication study on smartwatch visualizations with adults aged 65 and older. The older adult population is rising globally, coinciding with their increasing interest in using small wearable devices, such as smartwatches, to track and view data. Smartwatches, however, pose challenges to this population: fonts and visualizations are often small and meant to be seen at a glance. How concise design on smartwatches interacts with aging-related changes in perception and cognition, however, is not well understood. We replicate a study that investigated how visualization type and number of data points affect glanceable perception. We observe strong evidence of differences for participants aged 75 and older, sparking interesting questions regarding the study of visualization and older adults. We discuss first steps toward better understanding and supporting an older population of smartwatch wearers and reflect on our experiences working with this population. Supplementary materials are available at https://osf.io/7x4hq/.
Active noise control (ANC) has become popular for reducing noise and thus enhancing user comfort in headphones. While feedback control offers an effective way to implement ANC, it is restricted by uncertainty of the controlled system that arises, e.g., from differing wearing situations. Widely used unstructured models which capture these variations tend to overestimate the uncertainty and thus restrict ANC performance. As a remedy, this work explores uncertainty models that provide a more accurate fit to the observed variations in order to improve ANC performance for over-ear and in-ear headphones. We describe the controller optimization based on these models and implement an ANC prototype to compare the performances associated with conventional and proposed modeling approaches. Extensive measurements with human wearers confirm the robustness and indicate a performance improvement over conventional methods. The results allow to safely increase the active attenuation of ANC headphones by several decibels.
The Aria Gen 2 Pilot Dataset (A2PD) is an egocentric multimodal open dataset captured using the state-of-the-art Aria Gen 2 glasses. To facilitate timely access, A2PD is released incrementally with ongoing dataset enhancements. The initial release features Dia'ane, our primary subject, who records her daily activities alongside friends, each equipped with Aria Gen 2 glasses. It encompasses five primary scenarios: cleaning, cooking, eating, playing, and outdoor walking. In each of the scenarios, we provide comprehensive raw sensor data and output data from various machine perception algorithms. These data illustrate the device's ability to perceive the wearer, the surrounding environment, and interactions between the wearer and the environment, while maintaining robust performance across diverse users and conditions. The A2PD is publicly available at projectaria.com, with open-source tools and usage examples provided in Project Aria Tools.
We present Aria Everyday Activities (AEA) Dataset, an egocentric multimodal open dataset recorded using Project Aria glasses. AEA contains 143 daily activity sequences recorded by multiple wearers in five geographically diverse indoor locations. Each of the recording contains multimodal sensor data recorded through the Project Aria glasses. In addition, AEA provides machine perception data including high frequency globally aligned 3D trajectories, scene point cloud, per-frame 3D eye gaze vector and time aligned speech transcription. In this paper, we demonstrate a few exemplar research applications enabled by this dataset, including neural scene reconstruction and prompted segmentation. AEA is an open source dataset that can be downloaded from https://www.projectaria.com/datasets/aea/. We are also providing open-source implementations and examples of how to use the dataset in Project Aria Tools https://github.com/facebookresearch/projectaria_tools.
Accurately estimating the 3D pose of the camera wearer in egocentric video sequences is crucial to modeling human behavior in virtual and augmented reality applications. The task presents unique challenges due to the limited visibility of the user's body caused by the front-facing camera mounted on their head. Recent research has explored the utilization of the scene and ego-motion, but it has overlooked humans' interactive nature. We propose a novel framework for Social Egocentric Estimation of body MEshes (SEE-ME). Our approach is the first to estimate the wearer's mesh using only a latent probabilistic diffusion model, which we condition on the scene and, for the first time, on the social wearer-interactee interactions. Our in-depth study sheds light on when social interaction matters most for ego-mesh estimation; it quantifies the impact of interpersonal distance and gaze direction. Overall, SEE-ME surpasses the current best technique, reducing the pose estimation error (MPJPE) by 53%. The code is available at https://github.com/L-Scofano/SEEME.
We present a systematic review and design space for visualizations on smartwatches and the context in which these visualizations are displayed--smartwatch faces. A smartwatch face is the main smartwatch screen that wearers see when checking the time. Smartwatch faces are small data dashboards that present a variety of data to wearers in a compact form. Yet, the usage context and form factor of smartwatch faces pose unique design challenges for visualization. In this paper, we present an in-depth review and analysis of visualization designs for popular premium smartwatch faces based on their design styles, amount and types of data, as well as visualization styles and encodings they included. From our analysis we derive a design space to provide an overview of the important considerations for new data displays for smartwatch faces and other small displays. Our design space can also serve as inspiration for design choices and grounding of empirical work on smartwatch visualization design. We end with a research agenda that points to open opportunities in this nascent research direction. Supplementary material is available at: https://osf.io/nwy2r/.
In this study, we discuss the impacts of assortative mixing by mask-wearing on the effectiveness of mask use in suppressing the propagation of epidemics. We employ the mask model, which is an epidemic model involving mask wearers and non-mask wearers. We derive the occurrence probability and mean size of large outbreaks, epidemic threshold, and average epidemic size for the mask model in an assortatively mixed random network that follows an arbitrary degree distribution. Applying our analysis to the Poisson random networks, we find that the assortative (disassortative) mixing by mask-wearing decreases (increases) the epidemic threshold. Assortative mixing, the tendency for (non-)mask wearers to prefer to connect with (non-)mask wearers, is not effective in containing epidemics in that the transmissibility required for large outbreaks to occur is small. On the other hand, in high-transmissibility cases, mask use is most effective in decreasing the occurrence probability and mean size of large outbreaks, as well as the average epidemic size, when the mixing pattern is strongly assortative. Strongly assortative mixing, resulting in the separation of mask wearers and non-mask wearers,
The next generation of etextiles foresees an era of smart wearable garments where embedded seamless intelligence provides the ability to sense, process and perform. Core to this vision is embedded textile functionality enabling dynamic configuration. In this paper we detail a methodology, design and implementation of a dynamic field programmable logic-driven fabric soft exosuit. Dynamic field programmability allows the soft exosuit to alter its functionality and adapt to specific exercise programs depending on the wearers need. The dynamic field programmability is enabled through motion based control arm movements of the soft exosuit triggering momentary sensors embedded in the fabric exosuit at specific joint placement points (right arm: wrist, elbow).The embedded circuitry in the fabric exosuit is implemented using a layered and interchangeable approach. This includes logic gate patches (AND,OR,NOT) and a layered textile interconnection panel. This modular and interchangeable design enhances the soft exosuits flexibility and adaptability. A truth table aligning to a rehabilitation healthcare use case was utilised. Tests were completed validating the field programmability of the s