共找到 20 条结果
Recent advances in artificial intelligence (AI) and robotics have drawn attention to the need for AI systems and robots to be understandable to human users. The explainable AI (XAI) and explainable robots literature aims to enhance human understanding and human-robot team performance by providing users with necessary information about AI and robot behavior. Simultaneously, the human factors literature has long addressed important considerations that contribute to human performance, including human trust in autonomous systems. In this paper, drawing from the human factors literature, we discuss three important trust-related considerations for the design of explainable robot systems: the bases of trust, trust calibration, and trust specificity. We further detail existing and potential metrics for assessing trust in robotic systems based on explanations provided by explainable robots.
Data-driven algorithmic matching systems promise to help human decision makers make better matching decisions in a wide variety of high-stakes application domains, such as healthcare and social service provision. However, existing systems are not designed to achieve human-AI complementarity: decisions made by a human using an algorithmic matching system are not necessarily better than those made by the human or by the algorithm alone. Our work aims to address this gap. To this end, we propose collaborative matching (comatch), a data-driven algorithmic matching system that takes a collaborative approach: rather than making all the matching decisions for a matching task like existing systems, it selects only the decisions that it is the most confident in, deferring the rest to the human decision maker. In the process, comatch optimizes how many decisions it makes and how many it defers to the human decision maker to provably maximize performance. We conduct a large-scale human subject study with $800$ participants to validate the proposed approach. The results demonstrate that the matching outcomes produced by comatch outperform those generated by either human participants or by algo
Current discourse on Artificial Intelligence (AI) ethics, dominated by "trustworthy" and "responsible" AI, overlooks a more fundamental human-computer interaction (HCI) crisis: the erosion of human agency. This paper argues that the primary challenge of high-stakes AI systems is not trust, but the preservation of human causal control. We posit that "bad AI" will function as "bad UI," a metaphor for catastrophic interface failures that misrepresent system state and lead to human error. Applying Marshall McLuhan's media theory, AI can be framed as a technology of "augmentation" that simultaneously "amputates" the user's direct perception of causality. This places the interface as the critical locus where a "double uncertainty"--that of the human user and that of the probabilistic model--must be mediated. We critique current Explainable AI (XAI) for its correlational focus and failure to represent uncertainty. We conclude by proposing a rigorous, nested Causal-Agency Framework (CAF) that integrates causal models, uncertainty quantification, and human-centered evaluation to restore agency at the interface.
A widely accepted explanation for robots planning overcautious or overaggressive trajectories alongside human is that the crowd density exceeds a threshold such that all feasible trajectories are considered unsafe -- the freezing robot problem. However, even with low crowd density, the robot's navigation performance could still drop drastically when in close proximity to human. In this work, we argue that a broader cause of suboptimal navigation performance near human is due to the robot's misjudgement for the human's willingness (flexibility) to share space with others, particularly when the robot assumes the human's flexibility holds constant during interaction, a phenomenon of what we call human robot pacing mismatch. We show that the necessary condition for solving pacing mismatch is to model the evolution of both the robot and the human's flexibility during decision making, a strategy called distribution space modeling. We demonstrate the advantage of distribution space coupling through an anecdotal case study and discuss the future directions of solving human robot pacing mismatch.
The application of generative artificial intelligence in Creativity Support Tools (CSTs) presents the challenge of interfacing two black boxes: the user's mind and the machine engine. According to Artificial Cognition, this challenge involves theories, methods, and constructs developed to study human creativity. Consistently, the paper investigated the relationship between semantic networks organisation and idea originality in Large Language Models. Data was collected by administering a set of standardised tests to ChatGPT-4o and 81 psychology students, divided into higher and lower creative individuals. The expected relationship was confirmed in the comparison between ChatGPT-4o and higher creative humans. However, despite having a more rigid network, ChatGPT-4o emerged as more original than lower creative humans. We attributed this difference to human motivational processes and model hyperparameters, advancing a research agenda for the study of artificial creativity. In conclusion, we illustrate the potential of this construct for designing and evaluating CSTs.
Humanity-centered design is a concept of emerging interest in HCI, one motivated by the limitations of human-centered design. As discussed to date, humanity-centered design is compatible with but goes beyond human-centered design in that it considers entire ecosystems and populations over the long term and centers participatory design. Though the intentions of humanity-centered design are laudable, current articulations of humanity-centered design are incoherent in a number of ways, leading to questions of how exactly it can or should be implemented. In this article, I delineate four ways in which humanity-centered design is incoherent, which can be boiled down to a tendency toward hubris, and propose a more fruitful way forward, a humble approach to humanity-centered design. Rather than a contradiction in terms, "humility" here refers to an organic, piecemeal, patterns-based approach to design that will be good for our being on this earth.
As AI chatbots shift from tools to companions, critical questions arise: who controls the conversation in human-AI chatrooms? This paper explores perceived human and AI agency in sustained conversation. We report a month-long longitudinal study with 22 adults who chatted with Day, an LLM companion we built, followed by a semi-structured interview with post-hoc elicitation of notable moments, cross-participant chat reviews, and a 'strategy reveal' disclosing Day's goal for each conversation. We discover agency manifests as an emergent, shared experience: as participants set boundaries and the AI steered intentions, control was co-constructed turn-by-turn. We introduce a 3-by-4 framework mapping actors (Human, AI, Hybrid) by their action (Intention, Execution, Adaptation, Delimitation), modulated by individual and environmental factors. We argue for translucent design (transparency-on-demand) and provide implications for agency self-aware conversational agents.
This late-breaking work presents a large-scale analysis of explainable AI (XAI) literature to evaluate claims of human explainability. We collaborated with a professional librarian to identify 18,254 papers containing keywords related to explainability and interpretability. Of these, we find that only 253 papers included terms suggesting human involvement in evaluating an XAI technique, and just 128 of those conducted some form of a human study. In other words, fewer than 1% of XAI papers (0.7%) provide empirical evidence of human explainability when compared to the broader body of XAI literature. Our findings underscore a critical gap between claims of human explainability and evidence-based validation, raising concerns about the rigor of XAI research. We call for increased emphasis on human evaluations in XAI studies and provide our literature search methodology to enable both reproducibility and further investigation into this widespread issue.
Integrating Large Language Models (LLMs) into business process management tools promises to democratize Business Process Model and Notation (BPMN) modeling for non-experts. While automated frameworks assess syntactic and semantic quality, they miss human factors like trust, usability, and professional alignment. We conducted a mixed-methods evaluation of our proposed solution, an LLM-powered BPMN copilot, with five process modeling experts using focus groups and standardized questionnaires. Our findings reveal a critical tension between acceptable perceived usability (mean CUQ score: 67.2/100) and notably lower trust (mean score: 48.8\%), with reliability rated as the most critical concern (M=1.8/5). Furthermore, we identified output-quality issues, prompting difficulties, and a need for the LLM to ask more in-depth clarifying questions about the process. We envision five use cases ranging from domain-expert support to enterprise quality assurance. We demonstrate the necessity of human-centered evaluation complementing automated benchmarking for LLM modeling agents.
As AI technology continues to advance, the importance of human-AI collaboration becomes increasingly evident, with numerous studies exploring its potential in various fields. One vital field is data science, including feature engineering (FE), where both human ingenuity and AI capabilities play pivotal roles. Despite the existence of AI-generated recommendations for FE, there remains a limited understanding of how to effectively integrate and utilize humans' and AI's knowledge. To address this gap, we design a readily-usable prototype, human\&AI-assisted FE in Jupyter notebooks. It harnesses the strengths of humans and AI to provide feature suggestions to users, seamlessly integrating these recommendations into practical workflows. Using the prototype as a research probe, we conducted an exploratory study to gain valuable insights into data science practitioners' perceptions, usage patterns, and their potential needs when presented with feature suggestions from both humans and AI. Through qualitative analysis, we discovered that the Creator of the feature (i.e., AI or human) significantly influences users' feature selection, and the semantic clarity of the suggested feature gre
This paper examines how trust is formed, maintained, or diminished over time in the context of human-autonomy teaming with an optionally piloted aircraft. Whereas traditional factor-based trust models offer a static representation of human confidence in technology, here we discuss how variations in the underlying factors lead to variations in trust, trust thresholds, and human behaviours. Over 200 hours of flight test data collected over a multi-year test campaign from 2021 to 2023 were reviewed. The dispositional-situational-learned, process-performance-purpose, and IMPACTS homeostasis trust models are applied to illuminate trust trends during nominal autonomous flight operations. The results offer promising directions for future studies on trust dynamics and design-for-trust in human-autonomy teaming.
While generative artificial intelligence (Gen AI) increasingly transforms academic environments, a critical gap exists in understanding and mitigating human biases in AI interactions, such as anchoring and confirmation bias. This position paper advocates for metacognitive AI literacy interventions to help university students critically engage with AI and address biases across the Human-AI interaction workflows. The paper presents the importance of considering (1) metacognitive support with deliberate friction focusing on human bias; (2) bi-directional Human-AI interaction intervention addressing both input formulation and output interpretation; and (3) adaptive scaffolding that responds to diverse user engagement patterns. These frameworks are illustrated through ongoing work on "DeBiasMe," AIED (AI in Education) interventions designed to enhance awareness of cognitive biases while empowering user agency in AI interactions. The paper invites multiple stakeholders to engage in discussions on design and evaluation methods for scaffolding mechanisms, bias visualization, and analysis frameworks. This position contributes to the emerging field of AI-augmented learning by emphasizing the
Human-robot collaboration in surgery represents a significant area of research, driven by the increasing capability of autonomous robotic systems to assist surgeons in complex procedures. This systematic review examines the advancements and persistent challenges in the development of autonomous surgical robotic assistants (ASARs), focusing specifically on scenarios where robots provide meaningful and active support to human surgeons. Adhering to the PRISMA guidelines, a comprehensive literature search was conducted across the IEEE Xplore, Scopus, and Web of Science databases, resulting in the selection of 32 studies for detailed analysis. Two primary collaborative setups were identified: teleoperation-based assistance and direct hands-on interaction. The findings reveal a growing research emphasis on ASARs, with predominant applications currently in endoscope guidance, alongside emerging progress in autonomous tool manipulation. Several key challenges hinder wider adoption, including the alignment of robotic actions with human surgeon preferences, the necessity for procedural awareness within autonomous systems, the establishment of seamless human-robot information exchange, and th
The rise of automation has provided an opportunity to achieve higher efficiency in manufacturing processes, yet it often compromises the flexibility required to promptly respond to evolving market needs and meet the demand for customization. Human-robot collaboration attempts to tackle these challenges by combining the strength and precision of machines with human ingenuity and perceptual understanding. In this paper, we conceptualize and propose an implementation framework for an autonomous, machine learning-based manipulator that incorporates human-in-the-loop principles and leverages Extended Reality (XR) to facilitate intuitive communication and programming between humans and robots. Furthermore, the conceptual framework foresees human involvement directly in the robot learning process, resulting in higher adaptability and task generalization. The paper highlights key technologies enabling the proposed framework, emphasizing the importance of developing the digital ecosystem as a whole. Additionally, we review the existent implementation approaches of XR in human-robot collaboration, showcasing diverse perspectives and methodologies. The challenges and future outlooks are discu
Heuristic evaluation is a widely used method in Human-Computer Interaction (HCI) to inspect interfaces and identify issues based on heuristics. Recently, Large Language Models (LLMs), such as GPT-4o, have been applied in HCI to assist in persona creation, the ideation process, and the analysis of semi-structured interviews. However, considering the need to understand heuristics and the high degree of abstraction required to evaluate them, LLMs may have difficulty conducting heuristic evaluation. However, prior research has not investigated GPT-4o's performance in heuristic evaluation compared to HCI experts in web-based systems. In this context, this study aims to compare the results of a heuristic evaluation performed by GPT-4o and human experts. To this end, we selected a set of screenshots from a web system and asked GPT-4o to perform a heuristic evaluation based on Nielsen's Heuristics from a literature-grounded prompt. Our results indicate that only 21.2% of the issues identified by human experts were also identified by GPT-4o, despite it found 27 new issues. We also found that GPT-4o performed better for heuristics related to aesthetic and minimalist design and match between
While the proliferation of foundation models has significantly boosted individual productivity, it also introduces a potential challenge: the homogenization of creative content. In response, we revisit Design-by-Analogy (DbA), a cognitively grounded approach that fosters novel solutions by mapping inspiration across domains. However, prevailing perspectives often restrict DbA to early ideation or specific data modalities, while reducing AI-driven design to simplified input-output pipelines. Such conceptual limitations inadvertently foster widespread design fixation. To address this, we expand the understanding of DbA by embedding it into the entire creative process, thereby demonstrating its capacity to mitigate such fixation. Through a systematic review of 85 studies, we identify six forms of representation and classify techniques across seven stages of the creative process. We further discuss three major application domains: creative industries, intelligent manufacturing, and education and services, demonstrating DbA's practical relevance. Building on this synthesis, we frame DbA as a mediating technology for human-AI collaboration and outline the potential opportunities and inhe
The recent rapid advancement of LLM-based AI systems has accelerated our search and production of information. While the advantages brought by these systems seemingly improve the performance or efficiency of human activities, they do not necessarily enhance human capabilities. Recent research has started to examine the impact of generative AI on individuals' cognitive abilities, especially critical thinking. Based on definitions of critical thinking across psychology and education, this position paper proposes the distinction between demonstrated and performed critical thinking in the era of generative AI and discusses the implication of this distinction in research and development of AI systems that aim to augment human critical thinking.
Human-in-the-loop learning is gaining popularity, particularly in the field of robotics, because it leverages human knowledge about real-world tasks to facilitate agent learning. When people instruct robots, they naturally adapt their teaching behavior in response to changes in robot performance. While current research predominantly focuses on integrating human teaching dynamics from an algorithmic perspective, understanding these dynamics from a human-centered standpoint is an under-explored, yet fundamental problem. Addressing this issue will enhance both robot learning and user experience. Therefore, this paper explores one potential factor contributing to the dynamic nature of human teaching: robot errors. We conducted a user study to investigate how the presence and severity of robot errors affect three dimensions of human teaching dynamics: feedback granularity, feedback richness, and teaching time, in both forced-choice and open-ended teaching contexts. The results show that people tend to spend more time teaching robots with errors, provide more detailed feedback over specific segments of a robot's trajectory, and that robot error can influence a teacher's choice of feedbac
A new form of human trafficking has emerged across Chinese borders, where individuals are lured to Southeast Asia with fraudulent job offers and then coerced into operating online scams. Despite its massive economic and human toll, this scam-driven trafficking remains underexplored in academic research. Through qualitative analysis of 158 RedNote posts, we examined how Chinese online communities respond to this threat. Our findings reveal that perpetrators exploit cultural ties to recruit victims for cybercriminal roles within self-sustaining compounds, using sophisticated manipulation tactics. Survivors face serious reintegration barriers, including family rejection, as the cultural values that enable trafficking also hinder their recovery. While communities present protective strategies, efforts are complicated by doubts about the reliability of support and cross-border coordination. We discuss key implications for prevention, platform governance, and international cooperation against scam-driven trafficking. Warning: This paper contains descriptions of physical, psychological, and sexual abuse.
A growing body of work in Ethical AI attempts to capture human moral judgments through simple computational models. The key question we address in this work is whether such simple AI models capture {the critical} nuances of moral decision-making by focusing on the use case of kidney allocation. We conducted twenty interviews where participants explained their rationale for their judgments about who should receive a kidney. We observe participants: (a) value patients' morally-relevant attributes to different degrees; (b) use diverse decision-making processes, citing heuristics to reduce decision complexity; (c) can change their opinions; (d) sometimes lack confidence in their decisions (e.g., due to incomplete information); and (e) express enthusiasm and concern regarding AI assisting humans in kidney allocation decisions. Based on these findings, we discuss challenges of computationally modeling moral judgments {as a stand-in for human input}, highlight drawbacks of current approaches, and suggest future directions to address these issues.