Neurosurgery increasingly uses Mixed Reality (MR) technologies for intraoperative assistance. The greatest challenge in this area is mentally reconstructing complex 3D anatomical structures from 2D slices with millimetric precision, which is required in procedures like External Ventricular Drain (EVD) placement. MR technologies have shown great potential in improving surgical performance; however, their limited availability in clinical settings underscores the need for training systems that foster skill retention in unaided conditions. In this paper, we introduce NeuroMix, an MR-based simulator for EVD placement. We conduct a study with 48 participants to assess the impact of 2D and 3D visual aids on usability, cognitive load, technology acceptance, and procedure precision and execution time. Three training modalities are compared: one without visual aids, one with 2D aids only, and one combining both 2D and 3D aids. The training phase takes place entirely on digital objects, followed by a freehand EVD placement testing phase performed with a physical catheter and a physical phantom without MR aids. We then compare the participants' performance with that of a control group that does
This work investigates the integration of generative visual aids in human-robot task communication. We developed GenComUI, a system powered by large language models that dynamically generates contextual visual aids (such as map annotations, path indicators, and animations) to support verbal task communication and facilitate the generation of customized task programs for the robot. This system was informed by a formative study that examined how humans use external visual tools to assist verbal communication in spatial tasks. To evaluate its effectiveness, we conducted a user experiment (n = 20) comparing GenComUI with a voice-only baseline. Qualitative and quantitative analyses both show that generative visual aids enhance verbal task communication by providing continuous visual feedback, thus promoting natural and effective human-robot communication. Additionally, the study offers a set of design implications, emphasizing how dynamically generated visual aids can serve as an effective communication medium in human-robot interaction. These findings underscore the potential of generative visual aids to inform the design of more intuitive and effective hum
Head-worn augmented reality (AR) is a hotly pursued and increasingly feasible contender paradigm for replacing or complementing smartphones and watches for continual information consumption. Here, we compare three different AR navigation aids (on-screen compass, on-screen radar and in-world vertical arrows) in a wide-area outdoor user study (n=24) where participants search for hidden virtual target items amongst physical and virtual objects. We analyzed participants' search task performance, movements, eye-gaze, survey responses and object recall. There were two key findings. First, all navigational aids enhanced search performance relative to a control condition, with some benefit and strongest user preference for in-world arrows. Second, users recalled fewer physical objects than virtual objects in the environment, suggesting reduced awareness of the physical environment. Together, these findings suggest that while navigational aids presented in AR can enhance search task performance, users may pay less attention to the physical environment, which could have undesirable side-effects.
The prevalence of hearing aids is increasing. However, optimizing the amplification processes of hearing aids remains challenging due to the complexity of integrating multiple modular components in traditional methods. To address this challenge, we present NeuroAMP, a novel deep neural network designed for end-to-end, personalized amplification in hearing aids. NeuroAMP leverages both spectral features and the listener's audiogram as inputs, and we investigate four architectures: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Convolutional Recurrent Neural Network (CRNN), and Transformer. We also introduce Denoising NeuroAMP, an extension that integrates noise reduction along with amplification capabilities for improved performance in real-world scenarios. To enhance generalization, a comprehensive data augmentation strategy was employed during training on diverse speech (TIMIT and TMHINT) and music (Cadenza Challenge MUSIC) datasets. Evaluation using the Hearing Aid Speech Perception Index (HASPI), Hearing Aid Speech Quality Index (HASQI), and Hearing Aid Audio Quality Index (HAAQI) demonstrates that the Transformer architecture within NeuroAMP achieves the bes
Conventional hearing aids rely on fixed, frequency-dependent amplification and compression to manage reduced sensitivity, which often fails to provide sufficient listening support in complex environments, such as situations with multiple speakers (the ``cocktail party'' problem). To more comprehensively address the underlying encoding dysfunctions of hearing loss, we introduce the Differentiable Auditory Loop (DAL), a new open-source framework for personalized hearing aid design and fitting. Our first implementation of DAL incorporates CARFAC, a differentiable model of human cochlear function, which we ported to JAX, to optimize a deep neural network to match impaired auditory neural activity patterns with a normal-hearing reference. To build a hearing aid with the fine-grained spectro-temporal signal processing required, we adopt SEANet, a waveform-to-waveform fully convolutional UNet generator. We fine-tune the network by comparing the outputs of a CARFAC model fitted to normal hearing with those of a CARFAC model fitted to match each subject's individual hearing impairment. The comparison is done using loss functions derived from the respective CARFAC neural activity pattern (NAP
Augmented reality (AR) allows virtual information to be presented in the real world, providing support for numerous tasks including search and navigation. Allowing users access to multiple navigation aids may help leverage the benefits of different navigational guidance methods, but may also have negative perceptual and cognitive impacts. In this study, users performed searches for virtual gems within a large-scale augmented environment while choosing to deploy two different navigation aids either independently or simultaneously: world-locked arrows and an on-screen radar. After completing the search, participants were asked to recall objects that may or may not have been present in the scene. The use of navigation aids impacted object recall, with impaired recall of objects in the environment when an aid was switched on. The results point at possible impact factors of object awareness in mobile AR and underscore the potential for adaptable interfaces to support users navigating the physical world.
Audio-visual feature synchronization for real-time speech enhancement (SE) in hearing aids represents a progressive approach to improving speech intelligibility and user experience, particularly in very noisy backgrounds. This approach integrates auditory signals with visual cues, utilizing the complementary description of these modalities to improve speech intelligibility. Audio-visual feature synchronization for real-time SE in hearing aids can be further optimized using an efficient feature alignment module. In this study, a lightweight cross-attentional model learns robust audio-visual representations by exploiting large-scale data and a simple architecture. By incorporating the lightweight cross-attentional model in an AVSE framework, the neural system dynamically emphasizes critical features across audio and visual modalities, enabling defined synchronization and improved speech intelligibility. The proposed AVSE model not only ensures high performance in noise suppression and feature alignment but also achieves real-time processing with minimal latency (36 ms) and energy consumption. Evaluations on the AVSEC3 dataset show the efficiency of the model, achieving significant gains ov
This paper presents a simulation-based approach to own voice detection (OVD) in hearing aids using a single microphone. While OVD can significantly improve user comfort and speech intelligibility, existing solutions often rely on multiple microphones or additional sensors, increasing device complexity and cost. To enable ML-based OVD without requiring costly transfer-function measurements, we propose a data augmentation strategy based on simulated acoustic transfer functions (ATFs) that expose the model to a wide range of spatial propagation conditions. A transformer-based classifier is first trained on analytically generated ATFs and then progressively fine-tuned using numerically simulated ATFs, transitioning from a rigid-sphere model to a detailed head-and-torso representation. This hierarchical adaptation enabled the model to refine its spatial understanding while maintaining generalization. Experimental results show 95.52% accuracy on simulated head-and-torso test data. Under short-duration conditions, the model maintained 90.02% accuracy with one-second utterances. On real hearing aid recordings, the model achieved 80% accuracy without fine-tuning, aided by lightweight test-t
The provision of information can improve individual judgments but also fail to make group decisions more accurate; if individuals choose to attend to the same information in the same manner, the predictive diversity that enables crowd wisdom may be lost. Decision support systems, from search engines to business intelligence platforms, present individuals with decision aids -- relevant information, interpretative frames, or heuristics -- to enhance the quality and speed of decision-making but potentially influence judgments through the selective presentation of information and interpretative frames. We describe decision-making as often containing two decisions: the choice of decision aids followed by the primary decision, and define \textit{metawisdom of the crowd} as any pattern by which individuals' choice of aids leads to higher crowd accuracy than equal assignment to the same aids, a comparison that accounts for the information content of the aids. The theoretical model accounting for aid bias and variance shows that an optimal distribution of aid usage can produce metawisdom based on the characteristics of aids within a collection. Three studies -- two estimation tasks (N=900,
This paper proposes an end-to-end system for the ICASSP 2023 Clarity Challenge. In this work, we introduce four major novelties: (1) a novel multi-stage system in both the magnitude and complex domains to better utilize phase information; (2) an asymmetric window pair to achieve higher frequency resolution under the 5 ms latency constraint; (3) the integration of head rotation information and the mixture signals to achieve better enhancement; (4) a post-processing module that achieves higher hearing aid speech perception index (HASPI) scores with the hearing aid amplification stage provided by the baseline system.
Personalization of the amplification function of hearing aids has been shown in previous studies to benefit hearing aid users. Several machine learning-based personalization approaches have been introduced in the literature. This paper presents a machine learning personalization approach based on paired comparisons whose training efficiency makes it practical and field deployable. The training efficiency of this approach results from treating frequency bands independently of one another and carrying out Bayesian machine learning simultaneously in all of the frequency bands. Simulation results indicate that this approach leads to an estimated hearing preference function close to the true hearing preference function in fewer paired comparisons than previous machine learning approaches. In addition, a clinical experiment conducted on eight subjects with hearing impairment indicates that this training-efficient personalization approach provides personalized gain settings that are on average six times more preferred than the standard prescriptive gain settings.
In credence goods markets such as health care or repair services, consumers rely on experts with superior information to adequately diagnose and treat them. Experts, however, are constrained in their diagnostic abilities, which hurts market efficiency and consumer welfare. Technological breakthroughs that substitute or complement expert judgments have the potential to alleviate consumer mistreatment. This article studies how competitive experts adopt novel diagnostic technologies when skills are heterogeneously distributed and obfuscated to consumers. We differentiate between novel technologies that increase expert abilities, and algorithmic decision aids that complement expert judgments, but do not affect an expert's personal diagnostic precision. When consumers build up beliefs about an expert's type through repeated interactions, we show that high-ability experts may strategically forego the decision aid in order to escape a pooling equilibrium by differentiating themselves from low-ability experts. Without future visits, signaling concerns cause all experts to randomize their investment choice, leading to under-utilization from low-ability experts and over-utilization from high
Robotic mobility aids for blind and low-vision (BLV) individuals rely heavily on deep learning-based vision models specialized for various navigational tasks. However, the performance of these models is often constrained by the availability and diversity of real-world datasets, which are challenging to collect in sufficient quantities for different tasks. In this study, we investigate the effectiveness of synthetic data, generated using Unreal Engine 4, for training robust vision models for this safety-critical application. Our findings demonstrate that synthetic data can enhance model performance across multiple tasks, showcasing both its potential and its limitations when compared to real-world data. We offer valuable insights into optimizing synthetic data generation for developing robotic mobility aids. Additionally, we publicly release our generated synthetic dataset to support ongoing research in assistive technologies for BLV individuals, available at https://hchlhwang.github.io/SToP.
The DeepFilterNet (DFN) architecture was recently proposed as a deep learning model suited for hearing aid devices. Despite its competitive performance on numerous benchmarks, it still follows a `one-size-fits-all' approach, which aims to train a single, monolithic architecture that generalises across different noises and environments. However, its limited size and computation budget can hamper its generalisability. Recent work has shown that in-context adaptation can mitigate this by conditioning the denoising process on additional information extracted from background recordings. These recordings can be offloaded outside the hearing aid, thus improving performance while adding minimal computational overhead. We introduce these principles to the DFN model, thus proposing the DFingerNet (DFiN) model, which shows superior performance on various benchmarks inspired by the DNS Challenge.
The spread of HIV/AIDS depends upon complex patterns of interaction among various subsets emerging at the population level. This added complexity makes it difficult to study and model AIDS and its dynamics. AIDS is therefore a natural candidate for agent-based modeling, a paradigm well known for modeling Complex Adaptive Systems (CAS). While agent-based models are known to model CAS effectively, the models are often ambiguous, and the use of purely text-based specifications (such as ODD) can make them difficult to replicate. Previous work has shown how formal specification may be used in conjunction with agent-based modeling to develop models of various CAS. However, to the best of our knowledge, no such model has been developed for AIDS. In this paper, we present a Formal Agent-Based Simulation modeling framework (FABS-AIDS) for an AIDS-based CAS. FABS-AIDS uses a formal specification model in conjunction with an agent-based model to reduce ambiguity and improve clarity in the model definition. The proposed model demonstrates the effectiveness of using formal specification in conjunction with agent-based simula
Human decision-making is plagued by many systematic errors. Many of these errors can be avoided by providing decision aids that guide decision-makers to attend to the important information and integrate it according to a rational decision strategy. Designing such decision aids used to be a tedious manual process. Advances in cognitive science might make it possible to automate this process in the future. We recently introduced machine learning methods for discovering optimal strategies for human decision-making automatically and an automatic method for explaining those strategies to people. Decision aids constructed by this method were able to improve human decision-making. However, following the descriptions generated by this method is very tedious. We hypothesized that this problem can be overcome by conveying the automatically discovered decision strategy as a series of natural language instructions for how to reach a decision. Experiment 1 showed that people do indeed understand such procedural instructions more easily than the decision aids generated by our previous method. Encouraged by this finding, we developed an algorithm for translating the output of our previous method
We establish a stochastic HIV/AIDS model for individuals with protection awareness and reveal the important role that protection awareness plays in the control of AIDS. We first show that there exists a global positive solution for the stochastic model. By constructing Lyapunov functions, we obtain the ergodic stationary distribution when $R_{0}^{s}>1$ and extinction when $R_{0}^{e}<1$ for the stochastic model. A number of numerical simulations using the positivity-preserving truncated Euler-Maruyama method (PPTEM) are performed to illustrate the theoretical results. Our new results show that detailed publicity has a greater impact on the control of AIDS than extensive publicity, while continuous antiretroviral therapy (ART) is helpful in the control of HIV/AIDS.
This paper reports on the design and results of the 2024 ICASSP SP Cadenza Challenge: Music Demixing/Remixing for Hearing Aids. The Cadenza project is working to enhance the audio quality of music for those with a hearing loss. The scenario for the challenge was listening to stereo reproduction over loudspeakers via hearing aids. The task was to decompose pop/rock music into vocal, drums, bass and other (VDBO), rebalance the different tracks with specified gains, and then remix back to stereo. End-to-end approaches were also accepted. Eleven teams submitted 17 systems. Causal systems performed worse than non-causal approaches. Nine systems beat the baseline. A common approach was to fine-tune pretrained demixing models. The best approach used an ensemble of models.
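The rebalance-and-remix step described in this abstract can be illustrated with a minimal Python sketch. It assumes the demixing stage has already produced one stereo signal per VDBO track, and that gains are given in decibels; the function names `db_to_linear` and `remix`, the stem labels, and the list-of-pairs signal layout are hypothetical choices for illustration, not part of the challenge software.

```python
from typing import Dict, List, Tuple

# Assumed layout: one (left, right) sample pair per time step.
Stereo = List[Tuple[float, float]]


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def remix(stems: Dict[str, Stereo], gains_db: Dict[str, float]) -> Stereo:
    """Scale each separated stem by its gain and sum the stems back to stereo."""
    lengths = {len(stem) for stem in stems.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one stem, all with the same length")
    out: Stereo = [(0.0, 0.0)] * lengths.pop()
    for name, stem in stems.items():
        # A stem without a specified gain is left unchanged (0 dB).
        g = db_to_linear(gains_db.get(name, 0.0))
        out = [(ol + g * l, orr + g * r) for (ol, orr), (l, r) in zip(out, stem)]
    return out


# Hypothetical one-sample example: attenuate vocals by 20 dB, keep drums.
mixed = remix(
    {"vocals": [(1.0, 1.0)], "drums": [(0.5, -0.5)]},
    {"vocals": -20.0, "drums": 0.0},
)
```

In this sketch the quality of the remix depends entirely on the separation stage, since any leakage between stems is scaled along with the stem it leaked into.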
Novices need to overcome initial barriers when programming the behavior of cyber-physical systems, such as coding quadcopter missions, and should thus be supported by an adequately designed programming environment. Using multiple representations by including graphical previews is a common approach to ease coding and program understanding. However, novices struggle to map information between the code and the graphical previews. Previous studies imply that mapping aids in a live programming environment might support novices while programming and foster a deeper understanding of the content. To implement these mapping aids in a domain-independent way, Source Location Tracking based on run-time information can be used. In our study, we tested N=82 participants while they interacted and learned in an online programming environment. Using a 2x2 between-subject design, we investigated the effects of two mapping aids, highlighting and dynamic linking, on coding correctness (including typical errors) and learning outcomes. Based on process data, successful strategies were analyzed. Combining both mapping aids resulted in higher performance than using one aid. While highlights were more helpful for impleme
This paper introduces our system submission for the Cadenza ICASSP 2024 Grand Challenge, which presents the problem of remixing and enhancing music for hearing aid users. Our system placed first in the challenge, achieving the best average Hearing-Aid Audio Quality Index (HAAQI) score on the evaluation data set. We describe the system, which uses an ensemble of deep learning music source separators that are fine-tuned on the challenge data. We demonstrate the effectiveness of our system through the challenge results and analyze the importance of different system aspects through ablation studies.
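The abstract does not say how the ensemble members are combined. One common choice, shown here purely as an assumption, is to average the separators' estimates of the same stem sample by sample. The function name `ensemble_average` and the flat list-of-samples layout are hypothetical.

```python
from typing import List, Sequence


def ensemble_average(estimates: Sequence[Sequence[float]]) -> List[float]:
    """Average several models' estimates of the same stem, sample by sample.

    Assumption: plain averaging; the described system may combine its
    separators differently.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    n = len(estimates[0])
    if any(len(e) != n for e in estimates):
        raise ValueError("all estimates must have the same length")
    return [sum(samples) / len(estimates) for samples in zip(*estimates)]


# Two hypothetical separators estimating the same two-sample vocal stem.
vocals = ensemble_average([[1.0, 2.0], [3.0, 4.0]])
```

Averaging tends to cancel errors that the individual separators do not share, which is one reason it is a frequent baseline for ensembling.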