Steering models (such as the generalized two-point model) predict human steering behavior well when the human is in direct control of a vehicle. In vehicles under autonomous control, human control inputs are not used; rather, an autonomous controller applies steering and acceleration commands to the vehicle. For example, human steering input may be used for state estimation rather than direct control. We show that human steering behavior changes when the human no longer directly controls the vehicle and the two are instead working in a shared autonomy paradigm. Thus, when a vehicle is not under direct human control, steering models like the generalized two-point model do not predict human steering behavior. We also show that the error between predicted human steering behavior and actual human steering behavior reflects a fundamental difference when the human directly controls the vehicle compared to when the vehicle is autonomously controlled. Moreover, we show that a single distribution describes the error between predicted human steering behavior and actual human steering behavior when the human's steering inputs are used for state estimation and the vehicle is autonomously contr
In this short paper, we address issues related to building multimodal AI systems for human performance support in manufacturing domains. We make two contributions: first, we identify challenges in the participatory design and training of such systems; second, to address these challenges, we propose the ACE paradigm: "Action and Control via Explanations". Specifically, we suggest that LLMs can be used to produce explanations in the form of human-interpretable "semantic frames", which in turn enable end users to provide the data the AI system needs to align its multimodal models and representations, including computer vision, automatic speech recognition, and document inputs. By using LLMs to "explain" via semantic frames, ACE will help the human and the AI system collaborate, together building a more accurate model of human activities and behaviors, and ultimately producing more accurate predictive outputs for better task support and better outcomes for human users performing manual tasks.
The study of human-robot interaction is fundamental to the design and use of robotics in real-world applications. Robots will need to predict and adapt to the actions of human collaborators in order to achieve good performance and improve safety and end-user adoption. This paper evaluates a human-robot collaboration scheme that combines the task allocation and motion levels of reasoning: the robotic agent uses Bayesian inference to predict the next goal of its human partner from his or her ongoing motion, and re-plans its own actions in real time. This anticipative adaptation is desirable in many practical scenarios, where humans are unable or unwilling to take on the cognitive overhead required to explicitly communicate their intent to the robot. A behavioral experiment indicates that the combination of goal inference and dynamic task planning significantly improves both objective and perceived performance of the human-robot team. Participants were highly sensitive to the differences between robot behaviors, preferring to work with a robot that adapted to their actions over one that did not.
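To make the idea of Bayesian goal prediction from ongoing motion concrete, the following is a minimal sketch, not the method evaluated in the paper. It assumes a 2D workspace, a fixed set of candidate goals, and a hypothetical likelihood in which a motion step is more probable the better it points toward a goal (weighted by an assumed rationality parameter `beta`). The function names `update_goal_posterior` and `infer_goal` are illustrative only.

```python
import math


def update_goal_posterior(prior, goals, prev_pos, curr_pos, beta=5.0):
    """Apply one Bayesian update of P(goal | motion) from a single observed step.

    Assumed likelihood (hypothetical): P(step | goal) is proportional to
    exp(beta * cos(angle between the step and the direction to the goal)).
    """
    step = (curr_pos[0] - prev_pos[0], curr_pos[1] - prev_pos[1])
    step_norm = math.hypot(*step)
    if step_norm == 0.0:
        return list(prior)  # no motion carries no evidence

    unnormalized = []
    for p, goal in zip(prior, goals):
        to_goal = (goal[0] - prev_pos[0], goal[1] - prev_pos[1])
        goal_norm = math.hypot(*to_goal)
        if goal_norm == 0.0:
            cos_angle = 0.0  # already at the goal: direction is undefined
        else:
            dot = step[0] * to_goal[0] + step[1] * to_goal[1]
            cos_angle = dot / (step_norm * goal_norm)
        unnormalized.append(p * math.exp(beta * cos_angle))

    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def infer_goal(goals, trajectory, beta=5.0):
    """Start from a uniform prior and fold in every step of the trajectory."""
    posterior = [1.0 / len(goals)] * len(goals)
    for prev_pos, curr_pos in zip(trajectory, trajectory[1:]):
        posterior = update_goal_posterior(posterior, goals, prev_pos, curr_pos, beta)
    return posterior


# Illustrative data: three candidate goals and a hand path heading right.
goals = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0)]
trajectory = [(0.0, 0.0), (0.1, 0.01), (0.2, 0.02), (0.3, 0.02)]
posterior = infer_goal(goals, trajectory)
predicted = max(range(len(goals)), key=lambda i: posterior[i])
```

In a scheme like the one described, a robot could re-plan its own task allocation whenever the most probable goal changes. The planner itself is outside the scope of this sketch.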
Current discourse on Artificial Intelligence (AI) ethics, dominated by "trustworthy" and "responsible" AI, overlooks a more fundamental human-computer interaction (HCI) crisis: the erosion of human agency. This paper argues that the primary challenge of high-stakes AI systems is not trust, but the preservation of human causal control. We posit that "bad AI" will function as "bad UI," a metaphor for catastrophic interface failures that misrepresent system state and lead to human error. Applying Marshall McLuhan's media theory, AI can be framed as a technology of "augmentation" that simultaneously "amputates" the user's direct perception of causality. This places the interface as the critical locus where a "double uncertainty"--that of the human user and that of the probabilistic model--must be mediated. We critique current Explainable AI (XAI) for its correlational focus and failure to represent uncertainty. We conclude by proposing a rigorous, nested Causal-Agency Framework (CAF) that integrates causal models, uncertainty quantification, and human-centered evaluation to restore agency at the interface.
As AI technology continues to advance, the importance of human-AI collaboration becomes increasingly evident, with numerous studies exploring its potential in various fields. One vital field is data science, including feature engineering (FE), where both human ingenuity and AI capabilities play pivotal roles. Despite the existence of AI-generated recommendations for FE, there remains a limited understanding of how to effectively integrate and utilize humans' and AI's knowledge. To address this gap, we design a readily usable prototype, human&AI-assisted FE in Jupyter notebooks. It harnesses the strengths of humans and AI to provide feature suggestions to users, seamlessly integrating these recommendations into practical workflows. Using the prototype as a research probe, we conducted an exploratory study to gain valuable insights into data science practitioners' perceptions, usage patterns, and their potential needs when presented with feature suggestions from both humans and AI. Through qualitative analysis, we discovered that the Creator of the feature (i.e., AI or human) significantly influences users' feature selection, and the semantic clarity of the suggested feature gre
There are many unknowns regarding the characteristics and dynamics of human-AI teams, including a lack of understanding of how certain human-human teaming concepts may or may not apply to human-AI teams and how this composition affects team performance. This paper outlines an experimental research study that investigates essential aspects of human-AI teaming, such as team performance, team situation awareness, and perceived team cognition, across various team compositions (human-only, human-human-AI, human-AI-AI, and AI-only) through a simulated emergency response management scenario. Results indicate dichotomous outcomes regarding perceived team cognition and performance metrics, as perceived team cognition was not predictive of performance. Performance metrics like team situation awareness and team score showed that teams composed of all human participants performed at a lower level than mixed human-AI teams, with the AI-only teams attaining the highest performance. Perceived team cognition was highest in human-only teams, with mixed composition teams reporting perceived team cognition 58% below the all-human teams. These results inform future mixed teams of the potential perfo
With the increasing interest in deploying Artificial Intelligence in medicine, we previously introduced HAIM (Holistic AI in Medicine), a framework that fuses multimodal data to solve downstream clinical tasks. However, HAIM uses data in a task-agnostic manner and lacks explainability. To address these limitations, we introduce xHAIM (Explainable HAIM), a novel framework leveraging Generative AI to enhance both prediction and explainability through four structured steps: (1) automatically identifying task-relevant patient data across modalities, (2) generating comprehensive patient summaries, (3) using these summaries for improved predictive modeling, and (4) providing clinical explanations by linking predictions to patient-specific medical knowledge. Evaluated on the HAIM-MIMIC-MM dataset, xHAIM improves average AUC from 79.9% to 90.3% across chest pathology and operative tasks. Importantly, xHAIM transforms AI from a black-box predictor into an explainable decision support system, enabling clinicians to interactively trace predictions back to relevant patient data, bridging AI advancements with clinical utility.
Our ability to build autonomous agents that leverage Generative AI continues to increase by the day. As builders and users of such agents, we do not yet know what parameters we need to align on before the agents start performing tasks on our behalf. To discover these parameters, we ran a qualitative empirical research study about designing agents that can negotiate during a fictional yet relatable task of selling a camera online. We found that for an agent to perform the task successfully, humans/users and agents need to align over six dimensions: 1) Knowledge Schema Alignment, 2) Autonomy and Agency Alignment, 3) Operational Alignment and Training, 4) Reputational Heuristics Alignment, 5) Ethics Alignment, and 6) Human Engagement Alignment. These empirical findings expand previous work related to process and specification alignment and the need for values and safety in Human-AI interactions. Subsequently, we discuss three design directions for designers who are imagining a world filled with Human-Agent collaborations.
Model Medicine is the science of understanding, diagnosing, treating, and preventing disorders in AI models, grounded in the principle that AI models -- like biological organisms -- have internal structures, dynamic processes, heritable traits, observable symptoms, classifiable conditions, and treatable states. This paper introduces Model Medicine as a research program, bridging the gap between current AI interpretability research (anatomical observation) and the systematic clinical practice that complex AI systems increasingly require. We present five contributions: (1) a discipline taxonomy organizing 15 subdisciplines across four divisions -- Basic Model Sciences, Clinical Model Sciences, Model Public Health, and Model Architectural Medicine; (2) the Four Shell Model (v3.3), a behavioral genetics framework empirically grounded in 720 agents and 24,923 decisions from the Agora-12 program, explaining how model behavior emerges from Core--Shell interaction; (3) Neural MRI (Model Resonance Imaging), a working open-source diagnostic tool mapping five medical neuroimaging modalities to AI interpretability techniques, validated through four clinical cases demonstrating imaging, compari
This chapter focuses on the evolution of Human-Centered Design (HCD) in aerospace systems over the last forty years. Human Factors and Ergonomics first shifted from the study of physical and medical issues to cognitive issues circa the 1980s. The advent of computers brought with it the development of human-computer interaction (HCI), which then expanded into the field of digital interaction design and User Experience (UX). We ended up with the concept of interactive cockpits, not because pilots interacted with mechanical things, but because they interacted using pointing devices on computer displays. Since the early 2000s, complexity and organizational issues gained prominence to the point that complex systems design and management found itself center stage, with the spotlight on the role of the human element and organizational setups. Today, Human Systems Integration (HSI) is no longer only a single-agent problem, but a multi-agent research field. Systems are systems of systems, considered as representations of people and machines. They are made of statically and dynamically articulated structures and functions. When they are at work, they are living organisms that generate emergi
Human-robot collaboration in surgery represents a significant area of research, driven by the increasing capability of autonomous robotic systems to assist surgeons in complex procedures. This systematic review examines the advancements and persistent challenges in the development of autonomous surgical robotic assistants (ASARs), focusing specifically on scenarios where robots provide meaningful and active support to human surgeons. Adhering to the PRISMA guidelines, a comprehensive literature search was conducted across the IEEE Xplore, Scopus, and Web of Science databases, resulting in the selection of 32 studies for detailed analysis. Two primary collaborative setups were identified: teleoperation-based assistance and direct hands-on interaction. The findings reveal a growing research emphasis on ASARs, with predominant applications currently in endoscope guidance, alongside emerging progress in autonomous tool manipulation. Several key challenges hinder wider adoption, including the alignment of robotic actions with human surgeon preferences, the necessity for procedural awareness within autonomous systems, the establishment of seamless human-robot information exchange, and th
Precision Medicine (PM) transforms the traditional "one-drug-fits-all" paradigm by customising treatments based on individual characteristics, and is an emerging topic for HCI research on digital health. A key element of PM, the Polygenic Risk Score (PRS), uses genetic data to predict an individual's disease risk. Despite its potential, PRS faces barriers to adoption, such as data inclusivity, psychological impact, and public trust. We conducted a mixed-methods study, comprising surveys (n=254) and interviews (n=11) with UK-based participants, to explore how people perceive PRS. The interviews were supplemented by interactive storyboards with the ContraVision technique to provoke deeper reflection and discussion. We identified ten key barriers and five themes to PRS adoption and proposed design implications for a responsible PRS framework. To address the complexities of PRS and enhance broader PM practices, we introduce the term Human-Precision Medicine Interaction (HPMI), which integrates, adapts, and extends HCI approaches to better meet these challenges.
Human-robot collaboration has significant potential in recycling due to the wide variation in the composition of recyclable products. Six participants performed a recyclable item sorting task in collaboration with a robot arm equipped with a vision system. The effect of three levels of human involvement or assistance to the robot (Level 1: occlusion removal; Level 2: optimal spacing; Level 3: optimal grip) on performance metrics such as robot accuracy, task time, and subjective fluency was assessed. Results showed that the robot's accuracy increased with the level of human involvement: mean accuracy was 33.3% for Level 1, 69% for Level 2, and 100% for Level 3. The results imply that for sorting processes involving diverse materials that vary in size, shape, and composition, human assistance could improve the robot's accuracy to a significant extent while also being cost-effective.
The Human Cell Atlas (HCA) will be made up of comprehensive reference maps of all human cells - the fundamental units of life - as a basis for understanding fundamental human biological processes and diagnosing, monitoring, and treating disease. It will help scientists understand how genetic variants impact disease risk, define drug toxicities, discover better therapies, and advance regenerative medicine. A resource of such ambition and scale should be built in stages, increasing in size, breadth, and resolution as technologies develop and understanding deepens. We will therefore pursue Phase 1 as a suite of flagship projects in key tissues, systems, and organs. We will bring together experts in biology, medicine, genomics, technology development and computation (including data analysis, software engineering, and visualization). We will also need standardized experimental and computational methods that will allow us to compare diverse cell and tissue types - and samples across human communities - in consistent ways, ensuring that the resulting resource is truly global. This document, the first version of the HCA White Paper, was written by experts in the field with feedback and sug
Saliency maps can explain how deep neural networks classify images. But are they actually useful for humans? The present systematic review of 68 user studies found that while saliency maps can enhance human performance, null effects or even costs are quite common. To investigate what modulates these effects, the empirical outcomes were organised along several factors related to the human tasks, AI performance, XAI methods, images to be classified, human participants and comparison conditions. In image-focused tasks, benefits were less common than in AI-focused tasks, but the effects depended on the specific cognitive requirements. Moreover, benefits were usually restricted to incorrect AI predictions in AI-focused tasks but to correct ones in image-focused tasks. XAI-related factors had surprisingly little impact. The evidence was limited for image- and human-related factors and the effects were highly dependent on the comparison conditions. These findings may support the design of future user studies.
The interaction and collaboration between humans and multiple robots represent a novel field of research known as human multi-robot systems. Adequately designed systems within this field allow teams composed of both humans and robots to work together effectively on tasks such as monitoring, exploration, and search and rescue operations. This paper presents a deep reinforcement learning-based affective workload allocation controller specifically for multi-human multi-robot teams. The proposed controller can dynamically reallocate workloads based on the performance of the operators during collaborative missions with multi-robot systems. The operators' performances are evaluated through the scores of a self-reported questionnaire (i.e., subjective measurement) and the results of a deep learning-based cognitive workload prediction algorithm that uses physiological and behavioral data (i.e., objective measurement). To evaluate the effectiveness of the proposed controller, we use a multi-human multi-robot CCTV monitoring task as an example and carry out comprehensive real-world experiments with 32 human subjects for both quantitative measurement and qualitative analysis. Our results demo
With the rapid growth in virtual reality technologies, object interaction is becoming increasingly immersive, elucidating human perception and leading to promising directions for evaluating human performance under different settings. This spike in technological growth has sharply increased the need for a human performance metric in 3D space. Fitts' law is perhaps the most widely used human prediction model in HCI history, attempting to capture human movement in lower dimensions. Despite the collective effort towards deriving an advanced extension of a 3D human performance model based on Fitts' law, a standardized metric is still missing. Moreover, most of the extensions to date assume or limit their findings to certain settings, effectively disregarding important variables that are fundamental to 3D object interaction. In this review, we investigate and analyze the most prominent extensions of Fitts' law and compare their characteristics, pinpointing potentially important aspects for deriving a higher-dimensional performance model. Lastly, we mention the complexities, frontiers, and potential challenges that may lie ahead.
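As a point of reference for the lower-dimensional baseline that such extensions build on, the sketch below computes the widely used Shannon formulation of Fitts' law, ID = log2(D/W + 1) and MT = a + b * ID, for a one-dimensional pointing task. It does not reproduce any of the 3D extensions surveyed in the review. The intercept `a` and slope `b` are empirically fitted constants, and the values used here are placeholders chosen purely for illustration.

```python
import math


def index_of_difficulty(distance, width):
    """Shannon formulation of the index of difficulty, in bits."""
    if distance < 0 or width <= 0:
        raise ValueError("distance must be non-negative and width positive")
    return math.log2(distance / width + 1.0)


def movement_time(distance, width, a, b):
    """Predicted movement time MT = a + b * ID (units follow a and b)."""
    return a + b * index_of_difficulty(distance, width)


def throughput(distance, width, observed_time):
    """Throughput in bits per second for one observed movement."""
    if observed_time <= 0:
        raise ValueError("observed_time must be positive")
    return index_of_difficulty(distance, width) / observed_time


# Illustrative values only: a and b are hypothetical regression constants.
target_id = index_of_difficulty(distance=7.0, width=1.0)
predicted_mt = movement_time(distance=7.0, width=1.0, a=0.1, b=0.2)
```

A higher-dimensional model would need additional terms, for example for target depth or movement direction, which is where the extensions differ.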
Heuristic evaluation is a widely used method in Human-Computer Interaction (HCI) to inspect interfaces and identify issues based on heuristics. Recently, Large Language Models (LLMs), such as GPT-4o, have been applied in HCI to assist in persona creation, the ideation process, and the analysis of semi-structured interviews. Given the need to understand heuristics and the high degree of abstraction required to evaluate them, LLMs may have difficulty conducting heuristic evaluation. However, prior research has not investigated GPT-4o's performance in heuristic evaluation compared to HCI experts in web-based systems. In this context, this study aims to compare the results of a heuristic evaluation performed by GPT-4o and human experts. To this end, we selected a set of screenshots from a web system and asked GPT-4o to perform a heuristic evaluation based on Nielsen's Heuristics from a literature-grounded prompt. Our results indicate that only 21.2% of the issues identified by human experts were also identified by GPT-4o, although it found 27 new issues. We also found that GPT-4o performed better for heuristics related to aesthetic and minimalist design and match between
While generative artificial intelligence (Gen AI) increasingly transforms academic environments, a critical gap exists in understanding and mitigating human biases in AI interactions, such as anchoring and confirmation bias. This position paper advocates for metacognitive AI literacy interventions to help university students critically engage with AI and address biases across the Human-AI interaction workflows. The paper presents the importance of considering (1) metacognitive support with deliberate friction focusing on human bias; (2) bi-directional Human-AI interaction intervention addressing both input formulation and output interpretation; and (3) adaptive scaffolding that responds to diverse user engagement patterns. These frameworks are illustrated through ongoing work on "DeBiasMe," AIED (AI in Education) interventions designed to enhance awareness of cognitive biases while empowering user agency in AI interactions. The paper invites multiple stakeholders to engage in discussions on design and evaluation methods for scaffolding mechanisms, bias visualization, and analysis frameworks. This position contributes to the emerging field of AI-augmented learning by emphasizing the
The Oxford English Dictionary defines precision medicine as "medical care designed to optimize efficiency or therapeutic benefit for particular groups of patients, especially by using genetic or molecular profiling." It is not an entirely new idea: physicians from ancient times have recognized that medical treatment needs to consider individual variations in patient characteristics. However, the modern precision medicine movement has been enabled by a confluence of events: scientific advances in fields such as genetics and pharmacology, technological advances in mobile devices and wearable sensors, and methodological advances in computing and data sciences. This chapter is about bandit algorithms: an area of data science of special relevance to precision medicine. With their roots in the seminal work of Bellman, Robbins, Lai and others, bandit algorithms have come to occupy a central place in modern data science (Lattimore and Szepesvari, 2020). Bandit algorithms can be used in any situation where treatment decisions need to be made to optimize some health outcome. Since precision medicine focuses on the use of patient characteristics to guide treatment, contextual bandit algorith