Partner selection is crucial for cooperation and hinges on communication. As artificial agents, especially those powered by large language models (LLMs), become more autonomous, intelligent, and persuasive, they compete with humans for partnerships. Yet little is known about how humans select between human and AI partners and adapt under AI-induced competitive pressure. We constructed a communication-based partner selection game and examined the dynamics in hybrid mini-societies comprising humans and bots powered by a state-of-the-art LLM. Through three experiments (N = 975), we found that bots, though more prosocial than humans and linguistically distinguishable, were not selected preferentially when their identity was hidden. Instead, humans misattributed bots' behaviour to humans and vice versa. Disclosing bots' identity had a dual effect: it reduced bots' initial chances of being selected but allowed them to gradually outcompete humans by facilitating human learning about the behaviour of each partner type. These findings show how AI can reshape social interaction in mixed societies and inform the design of more effective and cooperative hybrid systems.
Large language models (LLMs) have shown remarkable promise in communicating with humans. Their potential use as artificial partners to humans in sociological experiments involving conversation is an exciting prospect. But how viable is it? Here, we rigorously test the limits of LLM-powered debate agents in a preregistered study that runs multiple debate-based opinion consensus games. Each game starts with six humans, six agents, or three humans and three agents. We found that agents can blend in and concentrate on a debate's topic better than humans, improving the productivity of all players. Yet humans perceive agents as less convincing and less confident than other humans, and several behavioral metrics we collected for humans and agents deviate measurably from each other. Agents are already decent debaters, but their behavior generates a pattern distinctly different from the human-generated data.
'Digital humans' are digital reproductions of humans powered by artificial intelligence (AI) and capable of communicating and forming emotional bonds. The value creation potential of digital humans is overlooked due to the limitations of digital human technologies. This article explores the value creation potential and the value realisation limitations of digital humans. The analysis is based on a review of 62 articles retrieved from the Web of Science database. The analysis suggests that digital humans have the potential to alleviate labour and skill shortages, reduce the natural human element in high-risk tasks, avoid design errors, improve the ergonomics of products and workplaces, and provide guidance and emotional support, all of which will benefit natural humans in the workplace. However, technical limits, the evolving understanding of digital humans, the social significance and acceptance of digital humans, ethical considerations, and the adjustment of legal traditions limit value realisation. This review suggests that digital humans' perceived usefulness and ease of development determine organisations' willingness to utilise this technology.
This paper compares agency in humans with potential agency in AI programs. Human agency takes many years to develop, emerging as the frontal lobe matures. Early attempts to endow LLMs with agency have met serious obstacles. Progress requires a new architecture in which actions and plans are formulated jointly with the human actors in each real-world setting.
Voice-based AI development faces unique challenges in processing both linguistic and paralinguistic information. This study compares how large audio-language models (LALMs) and humans integrate speaker characteristics during speech comprehension, asking whether LALMs process speaker-contextualized language in ways that parallel human cognitive mechanisms. We compared two LALMs' (Qwen2-Audio and Ultravox 0.5) processing patterns with human EEG responses. Using surprisal and entropy metrics from the models, we analyzed their sensitivity to speaker-content incongruency across social stereotype violations (e.g., a man claiming to regularly get manicures) and biological knowledge violations (e.g., a man claiming to be pregnant). Results revealed that Qwen2-Audio exhibited increased surprisal for speaker-incongruent content and its surprisal values significantly predicted human N400 responses, while Ultravox 0.5 showed limited sensitivity to speaker characteristics. Importantly, neither model replicated the human-like processing distinction between social violations (eliciting N400 effects) and biological violations (eliciting P600 effects). These findings reveal both the potential and limitations of current LALMs.
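To make the surprisal and entropy metrics in the abstract above concrete, here is a minimal sketch that computes both from a causal language model. GPT-2 is used as a stand-in for the audio-language models, and the bits-per-token convention and preprocessing are assumptions, not the study's actual pipeline:

```python
# Per-token surprisal and predictive entropy from a causal LM.
# GPT-2 stands in for Qwen2-Audio / Ultravox; the metric definitions
# are the standard information-theoretic ones.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
LN2 = torch.log(torch.tensor(2.0))

def surprisal_and_entropy(text: str):
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits                 # (1, seq_len, vocab)
    log_p = F.log_softmax(logits[:, :-1], dim=-1)  # predicts token t+1
    target = ids[:, 1:].unsqueeze(-1)
    surprisal = -log_p.gather(-1, target).squeeze(-1) / LN2   # bits
    entropy = -(log_p.exp() * log_p).sum(-1) / LN2            # bits
    return surprisal.squeeze(0), entropy.squeeze(0)

# Speaker-incongruent content (e.g., a stereotype violation) should show
# elevated surprisal at the critical word if the model is sensitive to it.
s, h = surprisal_and_entropy("He says he gets a manicure every week.")
```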
From fake social media accounts and generative artificial intelligence chatbots to trading algorithms and self-driving vehicles, robots, bots and algorithms are proliferating and permeating our communication channels, social interactions, economic transactions and transportation arteries. Networks of multiple interdependent and interacting humans and intelligent machines constitute complex social systems for which the collective outcomes cannot be deduced from either human or machine behaviour alone. Under this paradigm, we review recent research and identify general dynamics and patterns in situations of competition, coordination, cooperation, contagion and collective decision-making, with context-rich examples from high-frequency trading markets, a social media platform, an open collaboration community and a discussion forum. To ensure more robust and resilient human-machine communities, we require a new sociology of humans and machines. Researchers should study these communities using complex system methods; engineers should explicitly design artificial intelligence for human-machine and machine-machine interactions; and regulators should govern the ecological diversity and social impact of human-machine communities.
Current approaches to human-aware or social robot navigation address the humans that are visible to the robot. However, it is also important to anticipate humans who may suddenly appear, both to avoid startling them and to prevent erratic behavior by the robot planner. In this paper, we propose a novel approach to detect and address these sudden appearances, which we call 'invisible humans'. We determine the places from which a human who is currently not visible to the robot can appear suddenly, and we adapt the robot's path and speed in anticipation of potential collisions, while still accounting for and adapting to the humans in the robot's field of view. We also show how this detection can be exploited to identify and handle doorways and narrow passages. Finally, the effectiveness of the proposed methodology is shown through several simulated and real-world experiments.
Social interactions promote well-being, yet barriers like geographic distance, time limitations, and mental health conditions can limit face-to-face interaction. Emotionally responsive AI systems, such as chatbots, offer new opportunities for social and emotional support but raise critical questions about how empathy is perceived and experienced in human-AI interactions. This study examines how empathy is evaluated in AI-generated versus human responses. Using personal narratives, we explored how persona attributes (e.g., gender, empathic traits, shared experiences) and story qualities affect empathy ratings. We compared responses from standard and fine-tuned AI models with human judgments. Results show that while humans are highly sensitive to emotional vividness and shared experience, AI responses are less influenced by these cues and often lack nuance in empathic expression. These findings highlight the challenges of designing emotionally intelligent systems that respond meaningfully across diverse users and contexts, and they inform the design of ethically aware tools to support social connection and well-being.
We investigate the potential of large language models (LLMs) to disentangle text variables, that is, to remove the textual traces of a forbidden variable, a task sometimes known as text distillation that is closely related to the fairness-in-AI and causal inference literatures. We employ a range of LLM approaches that attempt to disentangle text by identifying and removing information about a target variable while preserving other relevant signals. We show that in the strong test of removing sentiment, the statistical association between the processed text and sentiment is still detectable by machine learning classifiers after LLM disentanglement. Furthermore, we find that human annotators also struggle to remove sentiment while preserving other semantic content. This suggests there may be limited separability between concept variables in some text contexts, highlighting the limitations of methods that rely on text-level transformations and also raising questions about the robustness of disentanglement methods that achieve statistical independence in representation space.
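The abstract's key test, whether sentiment remains statistically detectable after LLM rewriting, can be sketched as a simple leakage check. The rewriting step is left as a caller-supplied function since the paper's prompts are not given here; everything else uses standard scikit-learn calls:

```python
# Leakage test: after an LLM "distills" sentiment out of each text, can a
# classifier still recover the sentiment label? Accuracy near chance would
# indicate successful disentanglement; above-chance accuracy means textual
# traces of sentiment survive the rewrite.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sentiment_leakage(texts, sentiment_labels, rewrite):
    # rewrite: callable str -> str, e.g. an LLM prompted to
    # "rewrite with identical content but no trace of sentiment"
    distilled = [rewrite(t) for t in texts]
    X = TfidfVectorizer(min_df=2).fit_transform(distilled)
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, X, sentiment_labels, cv=5).mean()
```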
This Chapter examines the dynamics of conflict and collaboration in human-machine systems, with a particular focus on large-scale, internet-based collaborative platforms. While these platforms represent successful examples of collective knowledge production, they are also sites of significant conflict, as diverse participants with differing intentions and perspectives interact. The analysis identifies recurring patterns of interaction, including serial attacks, reciprocal revenge, and third-party interventions. These microstructures reveal the role of experience, cultural differences, and topic sensitivity in shaping human-human, human-machine, and machine-machine interactions. The chapter further investigates the role of algorithmic agents and bots, highlighting their dual nature: they enhance collaboration by automating tasks but can also contribute to persistent conflicts with both humans and other machines. We conclude with policy recommendations that emphasize transparency, balance, cultural sensitivity, and governance to maximize the benefits of human-machine synergy while minimizing potential detriments.
Building machines that can reason about physical events and their causal relationships is crucial for flexible interaction with the physical world. However, most existing physical and causal reasoning benchmarks are based exclusively on synthetically generated events and synthetic natural language descriptions of causal relationships. This design raises two issues: first, there is a lack of diversity in both event types and natural language descriptions; second, causal relationships based on manually defined heuristics differ from human judgments. To address both shortcomings, we present the CLEVRER-Humans benchmark, a video reasoning dataset for causal judgment of physical events with human labels. We employ two techniques to improve data collection efficiency: first, a novel iterative event cloze task to elicit a new representation of events in videos, which we term Causal Event Graphs (CEGs); second, a data augmentation technique based on neural language generative models. We convert the collected CEGs into questions and answers to be consistent with prior work. Finally, we study a collection of baseline approaches for CLEVRER-Humans question answering, highlighting the challenges the benchmark poses.
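As a rough illustration of the Causal Event Graph representation named above, the sketch below models events as nodes and human-labeled causal relations as directed edges. The field names are illustrative, not the dataset's actual schema:

```python
# Minimal Causal Event Graph (CEG) sketch: nodes are video events,
# directed edges are human-labeled cause -> effect relations with a
# responsibility score. Illustrative only.
from dataclasses import dataclass, field

@dataclass
class Event:
    event_id: int
    description: str  # e.g., "the red sphere hits the cube"

@dataclass
class CausalEventGraph:
    events: list[Event] = field(default_factory=list)
    edges: list[tuple[int, int, float]] = field(default_factory=list)  # (cause, effect, score)

    def causes_of(self, effect_id: int) -> list[int]:
        return [c for c, e, _ in self.edges if e == effect_id]
```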
Real-time crowd-powered systems such as Chorus/Evorus, VizWiz, and Apparition have shown how incorporating humans into automated systems can supplement automatic solutions where they fall short. However, one unspoken bottleneck in applying such architectures to more scenarios is the longer latency of including humans in the loop. For applications with hard constraints on turnaround time, the longer latency and large speed variation of human-operated components appear to be deal breakers. This paper explicates and quantifies these limitations using a human-powered, text-based backend that holds conversations with users through a voice-only smart speaker. Smart speakers must respond to users' requests within seconds, so the workers behind the scenes have only a few seconds to compose answers. We measured end-to-end system latency and conversation quality with eight pairs of participants, showing both the challenges and the advantages of such systems.
There is a recent trend of applying multi-agent reinforcement learning (MARL) to train agents that can cooperate with humans in a zero-shot fashion, without using any human data. The typical workflow is to first repeatedly run self-play (SP) to build a policy pool and then train the final adaptive policy against this pool. A crucial limitation of this framework is that every policy in the pool is optimized with respect to the environment reward function, which implicitly assumes that the testing partners of the adaptive policy will be optimizing precisely the same reward function. However, human objectives are often substantially biased by their own preferences, which can differ greatly from the environment reward. We propose a more general framework, Hidden-Utility Self-Play (HSP), which explicitly models human biases as hidden reward functions in the self-play objective. By approximating the reward space with linear functions, HSP adopts an effective technique to generate an augmented policy pool containing biased policies. We evaluate HSP on the Overcooked benchmark. Empirical results show that HSP produces higher rewards than baselines when cooperating with learned human models and real humans.
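The linear reward approximation at the heart of HSP can be sketched compactly: a hidden preference is a weight vector over event features, and each sampled weight vector defines a biased reward that drives one self-play run. The feature map and sampling scheme below are illustrative assumptions, not the paper's exact construction:

```python
# Sketch of a biased hidden reward r_w(s, a) = w . phi(s, a).
# phi maps a transition to event features (e.g., onions picked up,
# soups delivered, steps idle in Overcooked); w encodes a hidden
# human preference over those events.
import numpy as np

def sample_biased_reward(phi, dim: int, scale: float = 1.0, seed=None):
    rng = np.random.default_rng(seed)
    w = rng.uniform(-scale, scale, size=dim)  # hidden preference weights
    def reward(state, action):
        return float(w @ phi(state, action))
    return reward

# Each sampled reward would train one self-play partner; the resulting pool
# of preference-biased policies is what the final adaptive policy trains against.
```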
Human-AI collaboration faces growing challenges as AI systems increasingly outperform humans on complex tasks while humans remain responsible for orchestration, validation, and decision oversight. To address this imbalance, we introduce Human Tool, an MCP-style interface abstraction building on recent Model Context Protocol designs that exposes humans as callable tools within AI-led, proactive workflows. Here, "tool" denotes a coordination abstraction, not a reduction of human authority or responsibility. Building on LLM-based agent architectures, we operationalize Human Tool by modeling human contributions through structured tool schemas of capabilities, information, and authority. These schemas enable agents to dynamically invoke human input based on relative strengths and to reintegrate it through efficient, natural interaction protocols. We validate the framework through controlled studies in both decision-making and creative tasks, demonstrating improved task performance, reduced human workload, and more balanced collaboration dynamics compared to baseline systems. Finally, we discuss implications for human-centered AI design, highlighting how MCP-style human tools enable stronger, more balanced forms of human-AI collaboration.
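A tool schema of the kind described, covering capabilities, information, and authority, might look like the following sketch. The field names and values are illustrative assumptions, not the paper's actual schema:

```python
# Hypothetical "human tool" descriptor an agent could use to decide when
# to route a step to a person, mirroring MCP-style tool declarations.
human_tool = {
    "name": "ask_domain_expert",
    "description": "Route a question to a human reviewer for judgment.",
    "capabilities": ["domain_judgment", "creative_feedback"],
    "information": ["project_context", "prior_decisions"],
    "authority": {"can_approve": True, "can_veto": True},
    "input_schema": {
        "type": "object",
        "properties": {"question": {"type": "string"}},
        "required": ["question"],
    },
}
```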
Existing game AI research mainly focuses on enhancing agents' abilities to win games, but this does not inherently give humans a better experience when collaborating with these agents. For example, agents may dominate the collaboration and exhibit unintended or detrimental behaviors, leading to poor experiences for their human partners. In other words, most game AI agents are modeled in a "self-centered" manner. In this paper, we propose a "human-centered" modeling scheme for collaborative agents that aims to enhance the experience of humans. Specifically, we model the experience of humans as the goals they expect to achieve during the task. We expect agents to learn to enhance the extent to which humans achieve these goals while maintaining the agents' original abilities (e.g., winning games). To achieve this, we propose the Reinforcement Learning from Human Gain (RLHG) approach. RLHG introduces a "baseline", corresponding to the extent to which humans achieve their goals unaided, and encourages agents to learn behaviors that effectively help humans achieve their goals beyond that baseline. We evaluate the RLHG agent in a popular Multi-player Online Battle Arena (MOBA) game.
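The baseline-and-gain idea can be written down in a few lines. This is a minimal sketch under the assumption that the gain is the positive part of the improvement over the baseline and enters the reward through a trade-off weight; the paper's exact formulation may differ:

```python
# "Human gain" sketch: reward the agent only for improvements in the human
# partner's goal achievement beyond a no-enhancement baseline, on top of
# the agent's original environment reward.
def human_gain(goal_value: float, baseline_value: float) -> float:
    # goal_value: how fully the human achieved their goals this episode
    # baseline_value: expected achievement without the agent's help
    return max(goal_value - baseline_value, 0.0)

def total_reward(env_reward: float, goal_value: float,
                 baseline_value: float, beta: float = 0.5) -> float:
    # beta trades off winning (env_reward) against enhancing the human
    return env_reward + beta * human_gain(goal_value, baseline_value)
```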
This position paper looks briefly at the way we attempt to program robotic AI systems. Many AI systems are built around improving the performance of a single system beyond so-called human baselines. However, these systems often handle one-shot, one-way decisions, whereas the real world is continuous and interactive. Humans, by contrast, are often able to recover from and learn from errors, enabling a much higher rate of success. We look at the challenges of building a system that can detect and recover from its own errors, using robotic nuclear gloveboxes as a use case to illustrate examples. We then go on to discuss simple starting designs.
In the intelligent era, interaction between humans and intelligent systems fundamentally involves collaboration with autonomous intelligent agents. Human-AI Collaboration (HAC) represents a novel type of human-machine relationship facilitated by autonomous intelligent machines equipped with AI technologies. In this paradigm, AI agents serve not only as auxiliary tools but also as active teammates, partnering with humans to accomplish tasks collaboratively. Human-centered AI (HCAI) emphasizes that humans play critical leadership roles in the collaboration. This human-led collaboration imparts new dimensions to the human-machine relationship, necessitating innovative research perspectives, paradigms, and agendas to address the unique challenges posed by HAC. This chapter delves into the essence of HAC from the human-centered perspective, outlining its core concepts and distinguishing features. It reviews current research methodologies and the research agenda within the HAC field from the HCAI perspective, highlighting advancements and ongoing studies. Furthermore, a framework for human-centered HAC (HCHAC) is proposed by integrating these reviews and analyses.
Real-time collaboration with humans is challenging because human behavior patterns differ widely, partly as a result of diverse physical constraints. Existing works typically focus on learning safety constraints for collaboration, or on how to divide and distribute subtasks between the participating agents to carry out the main task. In contrast, we propose to learn a human constraints model that additionally accounts for the diverse behaviors of different human operators. We consider collaboration in a shared-autonomy fashion, where a human operator and an assistive robot act simultaneously in the same task space and affect each other's actions. The assistive agent's task is to augment the human's skill in performing the shared task by supporting the human as much as possible, both by reducing the workload and by minimizing discomfort for the human operator. To this end, we propose an augmentative assistant agent capable of learning and adapting to human physical constraints, aligning its actions with the ergonomic preferences and limitations of the human operator.
Humans are constantly exposed to sequences of events in the environment. Those sequences frequently evince statistical regularities, such as the probabilities with which one event transitions to another. Collectively, inter-event transition probabilities can be modeled as a graph or network. Many real-world networks are organized hierarchically, and understanding how humans learn these networks is an ongoing aim of current investigations. While much is known about how humans learn basic transition graph topology, whether and to what degree humans can learn hierarchical structure in such graphs remains unknown. We investigate how humans learn hierarchical graphs of the Sierpiński family using computer simulations and behavioral laboratory experiments. We probe mental estimates of transition probabilities via the surprisal effect: a phenomenon in which humans react more slowly to less expected transitions, such as those between communities or modules in the network. Using mean-field predictions and numerical simulations, we show that surprisal effects are stronger for finer-level than for coarser-level hierarchical transitions.
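The surprisal effect named above falls out of a memory-discounted learner of the walk's transition structure. The sketch below uses a generic modular graph rather than the Sierpiński family, and the discounted-learning rule P_hat = (1 - eta) P (I - eta P)^{-1} is one standard model of human graph learning, assumed here rather than taken from the paper:

```python
# Surprisal under a memory-discounted learner of a random walk on a
# modular graph. Multi-step integration smears probability within
# modules, so cross-module transitions look less probable and carry
# higher surprisal (-log2 p), mirroring slower human reactions.
import numpy as np
import networkx as nx

g = nx.planted_partition_graph(l=3, k=10, p_in=0.9, p_out=0.05, seed=1)
A = nx.to_numpy_array(g)
P = A / A.sum(axis=1, keepdims=True)       # true walk transition matrix

eta = 0.8                                  # memory-discount parameter
n = len(P)
P_hat = (1 - eta) * P @ np.linalg.inv(np.eye(n) - eta * P)

def surprisal(i, j):
    return -np.log2(P_hat[i, j])

module = lambda v: v // 10                 # modules: nodes 0-9, 10-19, 20-29
within = [surprisal(i, j) for i, j in g.edges if module(i) == module(j)]
between = [surprisal(i, j) for i, j in g.edges if module(i) != module(j)]
print(np.mean(within), np.mean(between))   # between-module surprisal is higher
```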
The Augmented Human vision broadly seeks to improve or expand baseline human functioning through the restoration or extension of physical, intellectual, and social capabilities. However, given the rapid pace of technology development, we ask: what exactly does Augmented Human research involve, what are its core themes, and how has the Augmented Human(s) conference series evolved over time? To answer this, we conducted a scientometric analysis of the past 15 years of the Augmented Human(s) conference (N = 735 papers), focusing on geographical aspects, submission and citation timelines, author frequency and popularity, and topic modeling. We find that: (a) the number of papers in the conference exhibits a bimodal distribution, peaking in 2015 and 2025, with periods of stagnant growth in between; (b) key topics over time include Haptics, Wearable Sensing, Vision & Eye Tracking, Embodied Interaction, and Sports/Motion; (c) some seminal papers on AH are published not in AH(s) itself but at related venues (e.g., CHI); (d) the conference has an active Japanese HCI community despite the historical dominance of European locations. We contribute a closer look at the trajectory of the AH(s) field.