The integration of Human-Computer Interaction (HCI) and healthcare technologies is transforming the landscape of mental health interventions. Despite the growing adoption of mental health apps, current evaluation methods often neglect the interplay between interface design, personalization, emotional resonance, privacy, and community engagement. These gaps limit the capacity of digital tools to meet therapeutic goals while maintaining user trust and long-term engagement. This study examined the role of Xin Dao Diary, an AI-assisted platform, in enhancing emotional well-being through innovative design and user engagement strategies. Using a mixed-methods approach, including walkthrough methods, diary studies, and sentiment analysis of user feedback, we explored how digital interfaces can facilitate effective mental health care. Our findings reveal that intuitive interface design and personalized AI interventions improve user satisfaction and emotional health outcomes. However, challenges remain in data privacy, algorithmic transparency, and the authenticity of emotional responses, which may undermine user trust and limit long-term engagement. We propose a Holistic AI Care Design framework that emphasizes integrating user needs, AI personalization, privacy, and community building in app design, and that incorporates usability, user engagement, and ethical considerations into the evaluation of AI-assisted mental health apps. This research underscores the importance of interdisciplinary approaches in advancing digital health solutions, offering valuable insights for developers and healthcare practitioners aiming to optimize user experience and therapeutic efficacy.
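The abstract does not name the sentiment-analysis tooling; as a purely illustrative sketch, an off-the-shelf classifier could score user-feedback entries as below (the model choice and the sample feedback strings are assumptions, not the study's materials).

```python
from transformers import pipeline

# Generic off-the-shelf sentiment classifier (default English model);
# a hypothetical stand-in for whatever the study actually used.
analyzer = pipeline("sentiment-analysis")

feedback = [
    "The daily prompts feel genuinely supportive.",
    "I'm worried about where my diary entries are stored.",
]
for entry, result in zip(feedback, analyzer(feedback)):
    # Each result is a dict with a 'label' and a confidence 'score'.
    print(f"{result['label']:>8} ({result['score']:.2f})  {entry}")
```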
The metaverse refers to a digital environment that enables real-time user interaction through immersive technologies. Recent advances in deep learning have greatly strengthened the capabilities of Natural Language Processing (NLP) and Large Language Models (LLMs), making human-computer interactions in the metaverse more natural and responsive, especially those between users and Non-Player Characters (NPCs). Many virtual platforms rely on LLM-powered Application Programming Interfaces (APIs) to drive these interactions, but these often produce long, semantically irrelevant responses that weaken the user's immersive experience. Unlike open-domain conversational agents, dialogue systems for metaverse-based NPCs operate under strict real-time and contextual constraints: NPC interactions require concise, task-oriented, and context-aware responses, as overly long or irrelevant outputs disrupt immersion and degrade user experience. Although recent advances in LLMs have improved dialogue generation, most existing studies focus on open-ended conversation or general-purpose question answering. This study addresses this gap by systematically investigating fine-tuning and Retrieval-Augmented Generation (RAG) strategies within a metaverse-focused dialogue domain. We propose a comparative evaluation of fine-tuned and RAG-based systems built on decoder-only models (GPT-2, LLaMA, Qwen) and encoder-decoder models (mBART, mT5). The trained models were evaluated using a combination of standard evaluation metrics and semantic-based criteria, and all scores were normalized using the Technique for Order Preference by Similarity to Ideal Solution (TOPSIS) to ensure objective comparability between models. The findings indicate that RAG provides more balanced performance, particularly with encoder-decoder models such as mBART (~0.652) and mT5 (~0.555), even when trained on relatively small datasets. Additionally, this paper presents a speech-based interaction framework, structured as Speech-to-Text (STT) → LLM → Text-to-Speech (TTS), designed to enable personalized, real-time communication in metaverse environments. This architecture improves interaction quality by enabling coherent and realistic speech-based communication.
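Since TOPSIS is the step that makes heterogeneous metric scores comparable across models, a compact sketch may help; the weights, criterion directions, and example score matrix below are illustrative assumptions, not values from the study.

```python
import numpy as np

def topsis(scores, weights=None, benefit=None):
    """Rank alternatives (rows) over criteria (columns) with TOPSIS.

    scores:  (n_models, n_metrics) raw metric matrix.
    weights: per-metric weights (default: equal).
    benefit: boolean mask, True where higher is better (default: all True).
    Returns closeness coefficients in [0, 1]; higher means a better rank.
    """
    X = np.asarray(scores, dtype=float)
    n, m = X.shape
    w = np.full(m, 1.0 / m) if weights is None else np.asarray(weights, float)
    b = np.ones(m, dtype=bool) if benefit is None else np.asarray(benefit, bool)

    # Vector-normalize each criterion column, then apply the weights.
    V = w * X / np.linalg.norm(X, axis=0)

    # Ideal and anti-ideal points depend on the criterion direction.
    ideal = np.where(b, V.max(axis=0), V.min(axis=0))
    anti = np.where(b, V.min(axis=0), V.max(axis=0))

    d_pos = np.linalg.norm(V - ideal, axis=1)
    d_neg = np.linalg.norm(V - anti, axis=1)
    return d_neg / (d_pos + d_neg)

# Hypothetical metric matrix: rows = models, columns = e.g. BLEU, ROUGE-L, BERTScore.
scores = [[0.21, 0.43, 0.78],   # encoder-decoder + RAG
          [0.17, 0.39, 0.74],   # encoder-decoder, fine-tuned
          [0.12, 0.31, 0.69]]   # decoder-only, fine-tuned
print(topsis(scores))  # higher closeness = better overall rank
```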
Combination therapy is an essential strategy for treating complex diseases. However, unintended drug-drug interactions (DDIs) can compromise therapeutic efficacy or even cause severe adverse reactions, posing significant challenges to clinical safety and drug development. Accurate DDI prediction is therefore crucial for ensuring drug safety and guiding rational drug use. Although deep learning-based models have made significant progress in this field, most existing methods face one of two major limitations: some account for the directionality of DDIs but overlook the diversity of interaction mechanisms, while others emphasize interaction diversity yet ignore directional effects. Such partial modeling fails to comprehensively capture pharmacological relationships and limits prediction accuracy. To overcome these challenges, we introduce DisenKGE-DDI, a novel framework based on a disentangled graph attention network that enhances DDI prediction through both micro-disentanglement and macro-disentanglement mechanisms. At the micro level, a factor-aware, relation-based message aggregation method uses a relation-aware guided routing strategy to select the neighbor subsets relevant to the current semantics, and a dual-layer attention mechanism to learn embedding representations for the different latent factors (components) of drug entities, precisely capturing intricate local semantic features. At the macro level, mutual information regularization enforces independence among distinct semantic components so that their representations remain non-interfering. Together, these mechanisms yield more adaptive drug embeddings that comprehensively capture the diverse interaction characteristics between drugs. Experimental results on public benchmark datasets show that DisenKGE-DDI outperforms state-of-the-art methods in DDI prediction. Source code is available at https://github.com/HENU406/DisenKGE-DDI.
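The macro-disentanglement idea of keeping component representations non-interfering can be illustrated with a small regularizer. The sketch below is not the paper's mutual-information estimator; it uses a simpler pairwise-decorrelation proxy that plays the same independence-encouraging role (the tensor layout and loss weighting are assumptions).

```python
import torch
import torch.nn.functional as F

def component_independence_penalty(z):
    """Encourage K latent components of a drug embedding to stay non-redundant.

    z: (batch, K, d) tensor of component embeddings.
    The paper regularizes mutual information between components; here we use
    a decorrelation proxy: the mean squared cosine similarity between all
    distinct component pairs (0 when components are mutually orthogonal).
    """
    z = F.normalize(z, dim=-1)                       # unit-norm each component
    sim = torch.einsum('bkd,bqd->bkq', z, z)         # pairwise cosine sims (K x K)
    K = z.size(1)
    off_diag = sim - torch.eye(K, device=z.device)   # remove self-similarity
    return off_diag.pow(2).mean()

# Hypothetical usage inside a training loop:
# total_loss = ddi_loss + lambda_indep * component_independence_penalty(drug_components)
```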
The identification of lncRNA-miRNA interactions (LMIs) is crucial for deciphering post-transcriptional regulatory networks and their roles in development and disease. While computational methods have been developed to predict LMIs, existing approaches are often limited by an inability to effectively integrate multimodal biological data and to handle the severe class imbalance inherent to biological networks. To overcome these limitations, we present LMI-MHGAT, a novel deep learning framework for LMI prediction based on a Multilayer Heterogeneous Graph Attention network. Our model integrates diverse data, including RNA sequences, expression profiles, and known molecular interactions, into a unified graph representation. A key innovation is a graph attention mechanism that dynamically learns to weight information from different relational layers, enabling the model to learn robust embeddings for lncRNAs and miRNAs. LMI-MHGAT significantly outperforms 14 existing methods on human LMI data, demonstrating exceptional robustness under severe class imbalance (positive-to-negative ratio of 1:60). The model generalizes effectively, achieving state-of-the-art performance on rat and plant datasets. Case studies confirm its ability to recover disease-associated regulatory axes and predict novel, biologically plausible interactions. By simultaneously addressing key limitations in data utilization and integration, LMI-MHGAT provides a more powerful and robust framework for LMI prediction. The tool is freely accessible at https://github.com/Zhenpm/LMI-MHGAT.
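A minimal sketch of the attention-over-relational-layers idea, assuming node embeddings have already been computed per layer (sequence similarity, co-expression, known interactions); this is a generic formulation, not the authors' exact LMI-MHGAT module, and all dimensions are placeholders.

```python
import torch
import torch.nn as nn

class LayerAttentionFusion(nn.Module):
    """Fuse node embeddings from several relational layers with learned attention.

    Each layer's embedding is scored against a learnable context vector, and
    the layers are combined by their softmax weights, so informative relation
    types contribute more to the final node representation.
    """
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.context = nn.Parameter(torch.randn(dim))

    def forward(self, layer_embeds):                      # (n_layers, n_nodes, dim)
        h = torch.tanh(self.proj(layer_embeds))
        scores = h @ self.context                         # (n_layers, n_nodes)
        alpha = torch.softmax(scores.mean(dim=1), dim=0)  # one weight per layer
        return torch.einsum('l,lnd->nd', alpha, layer_embeds)

# Toy usage: fuse 3 relational layers of 100 nodes with 64-dim embeddings.
fused = LayerAttentionFusion(64)(torch.randn(3, 100, 64))  # -> (100, 64)
```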
Skeleton-based human action recognition has attracted increasing attention in recent years. However, most existing methods focus on single-person scenarios and struggle with complex behaviors in multi-person groups. In particular, they lack the capability to automatically identify and model the core person. To address these challenges, this paper proposes a star-shaped group interaction model for skeleton-based action recognition. First, a character-importance scoring system analyzes both individual and group aspects: it evaluates each person's individual importance based on motion intensity and motion complexity, and assesses their significance within the group using centrality and interactivity, enabling accurate identification of the core person in the video. Second, a core-star interaction graph is constructed with the core person as the center node and the other individuals as peripheral nodes; the relationships among individuals are categorized into self-connections, centripetal connections, and centrifugal connections, and for each connection type we design differentiated data augmentation strategies to fully exploit diverse action and interaction features. Finally, the structured skeleton data is fed into a star-shaped spatio-temporal graph convolutional network for efficient feature extraction and action classification. Experiments on several public benchmark datasets demonstrate that our method achieves state-of-the-art performance, with accuracies of 79.1%, 96.1%, and 93.1% on the NBA, Volleyball, and Volleyball-weak datasets, respectively.
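As an illustration of the character-importance scoring, here is a hedged sketch combining the four cues named in the abstract; the concrete formulas, weights, and tensor layout are assumptions, since the paper's definitions are not reproduced here.

```python
import numpy as np

def importance_scores(joints, w=(0.3, 0.2, 0.3, 0.2)):
    """Score each person's importance from 2-D skeleton sequences.

    joints: (n_people, n_frames, n_joints, 2) joint coordinates.
    Combines four hand-rolled cues inspired by the abstract: motion
    intensity, motion complexity, spatial centrality, and interactivity.
    """
    disp = np.diff(joints, axis=1)                       # frame-to-frame motion
    intensity = np.abs(disp).mean(axis=(1, 2, 3))        # how much each person moves
    complexity = disp.std(axis=(1, 2, 3))                # how irregular the motion is

    centers = joints.mean(axis=(1, 2))                   # each person's mean position
    group_center = centers.mean(axis=0)
    centrality = -np.linalg.norm(centers - group_center, axis=1)  # closer = higher

    dists = np.linalg.norm(centers[:, None] - centers[None, :], axis=-1)
    interactivity = -np.sort(dists, axis=1)[:, 1]        # nearest-neighbor proximity

    def z(x):                                            # standardize each cue
        return (x - x.mean()) / (x.std() + 1e-8)

    cues = np.stack([z(intensity), z(complexity), z(centrality), z(interactivity)])
    return np.asarray(w) @ cues                          # weighted sum per person

# The highest-scoring person becomes the center node of the core-star graph.
core_person = int(np.argmax(importance_scores(np.random.rand(5, 30, 17, 2))))
```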
Follicle-stimulating hormone (FSH) is a glycoprotein involved in oogenesis and subsequent oocyte maturation. An inadequate level of this hormone critically disrupts pre-vitellogenic oocyte progression, a major bottleneck in the captive maturation of the Asian catfish Clarias magur. The current study investigates the molecular interaction of recombinant human follicle-stimulating hormone (r-hFSH) with the FSH receptor (FSHR) of Clarias magur through comprehensive in silico analysis. Subsequent in vivo research examined the progression of pre-vitellogenic oocytes in C. magur under r-hFSH induction and shower simulation, comprising three treatments and one control: C (control), S (shower), H (hormone), and SH (shower and hormone). The in silico evaluation revealed that r-hFSH interacted with the C. magur FSHR more strongly than native FSHβ; the predicted affinity values confirm the trend observed in our docking scores and provide quantitative support for the conclusion that the recombinant human ligand forms a highly stable complex with the C. magur receptor. In the subsequent 60-day in vivo experiment, weekly r-hFSH administration significantly elevated serum biomarker levels. Hormone stimulation, with and without shower, effectively advanced oocytes from the pre-vitellogenic stage to spawning and yielded superior fecundity, fertilization, and hatching rates, demonstrating its potential as a reliable strategy to induce oocyte development and improve captive breeding success in C. magur. These findings highlight a novel opportunity for regulating oogenesis and advancing pre-vitellogenic oocytes to maturation in captive-farmed C. magur using r-hFSH alone.
Deciphering how genes interact within human cells is essential for understanding their functional wiring and for developing targeted therapeutic strategies. In this study, we present a genome-scale map of genetic interactions in the human haploid cell line HAP1, based on CRISPR-based perturbation of ∼4 million gene pairs. The resulting network comprises ∼89,000 high-confidence gene-gene interactions, organizing genes into hierarchical modules corresponding to protein complexes and pathways, biological processes, and cellular compartments, mirroring principles observed in yeast and highlighting the functional architecture of a human cell. This large-scale genetic network complements the DepMap gene co-essentiality network by capturing unique functional information, uncovering roles of previously uncharacterized genes, and identifying molecular determinants of cancer-cell-line-specific genetic dependencies. This study presents a general data-driven strategy for systematically exploring the roles of genes and their functional connections in human cell lines.
Operational robots have demonstrated significant potential in complex scenarios such as live-line maintenance and medical surgery. However, existing research on Mixed Reality (MR) and Digital Twin (DT) systems has primarily focused on unidirectional data visualization and passive state monitoring, acting as "open-loop" observation tools that fail to address low operational precision and inefficient human-robot synergy in dynamic, high-risk environments. To address these challenges, we integrate, for the first time, an MR-based closed-loop digital twin operating system for human-robot collaborative operation into the task execution of live-line operation equipment. Moving beyond simple visualization, the proposed framework establishes an integrated operational paradigm that bridges the gap between immersive perception and real-time interventional control. It comprises three integral components: (1) the construction of a high-fidelity virtual digital twin; (2) the development of a human-computer interaction paradigm based on MR technology; and (3) the establishment of an MR-based human-machine collaborative operation mode. Building on this framework, a system was implemented for live-line working robots. Experimental results indicate that, compared with traditional control methods, the proposed system reduces the completion time of live-line equipment tasks by 14.3% on average, verifying the feasibility and effectiveness of this pioneering application of a closed-loop digital twin operating system to live-line operation equipment.
Artificial sensing systems have broad application potential in areas such as health monitoring, human-computer interaction, and rehabilitation medicine. However, most existing systems are limited to one-way acquisition and transmission of electrical signals and lack intuitive, real-time feedback for interactive use. This unidirectional operation limits the availability of direct, human-interpretable output cues, restricting effectiveness in scenarios that require real-time guidance and dynamic interaction, such as rehabilitation training and interactive learning. Introducing a feedback mechanism can overcome this limitation by providing intuitive visual output and enabling an interactive "perception-feedback-adjustment" pathway, which may improve both the efficiency and precision of human-machine interaction. To address this challenge, we developed a novel artificial sensing system that integrates highly sensitive motion detection with real-time multicolor optical feedback. The stretchable triboelectric nanogenerator (TENG) used as a self-powered motion sensor exhibited sensitivities of 0.145 kPa⁻¹ in the low-pressure region (<8 kPa) and 0.019 kPa⁻¹ in the high-pressure region (8-30 kPa). The proposed system, integrating the TENG with a quantum dot light-emitting diode (QLED)-based synaptic device, achieved an overall motion-state recognition accuracy of 98.12%. Compared with conventional electrical feedback, optical feedback in the form of directly observable visual output offers intuitive visualization, strong resistance to electromagnetic interference, and support for multichannel parallel information transmission, making it particularly suitable for delivering clear, unambiguous status indications in complex environments. The synergistic integration of TENG-based mechanical perception and QLED-based optoelectronic feedback demonstrated in this work offers a promising design paradigm for constructing simple, efficient, and intuitive artificial sensory systems.
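The two reported sensitivities imply a piecewise-linear pressure response; the toy model below reconstructs that curve for illustration only (the continuity offset at 8 kPa is an assumption, not the authors' calibration).

```python
def teng_response(pressure_kpa):
    """Piecewise-linear response model for the TENG pressure sensor.

    Slopes are the sensitivities reported in the abstract (0.145 kPa^-1 below
    8 kPa, 0.019 kPa^-1 from 8-30 kPa); the offset keeping the curve
    continuous at the knee is an illustrative assumption. Output is a
    dimensionless relative signal change, not the authors' calibration.
    """
    S_LOW, S_HIGH, KNEE = 0.145, 0.019, 8.0
    if pressure_kpa <= KNEE:
        return S_LOW * pressure_kpa
    return S_LOW * KNEE + S_HIGH * (pressure_kpa - KNEE)

print(teng_response(5.0))   # low-pressure regime: 0.725
print(teng_response(20.0))  # high-pressure regime: 1.16 + 0.228 = 1.388
```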
Accurate identification of cognitive styles is important for optimizing personalized learning environments and designing human-computer interaction systems. Because traditional self-report measures suffer from subjectivity bias, this study developed a machine learning classification model based on objective physiological data. Focusing on the distinction between verbal and representational cognitive styles, we collected eye-movement data from 85 participants performing a standardized cognitive task via eye-tracking technology. We extracted multidimensional eye-movement features and systematically evaluated the classification performance of six machine learning algorithms: decision tree (DT), k-nearest neighbors (KNN), naive Bayes (NB), support vector machine (SVM), logistic regression (LR), and an ensemble learning model (EL). Experimental results show that all algorithms can effectively exploit eye-movement features for cognitive style classification, with SVM performing best: after hyperparameter tuning via grid search, it achieved 82.1% classification accuracy (F1 = 0.715). The proposed method offers a new approach to non-invasive assessment of cognitive styles that can be applied in real-time adaptive learning systems. The results provide important insights for personalizing educational technology, adaptively designing learning interfaces, and building cognitive-perceptual computing systems, and offer a valuable reference for educational psychology and human-computer interaction research.
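A minimal, runnable sketch of the grid-searched SVM classifier, assuming scikit-learn; the synthetic features and parameter grid are placeholders for the study's eye-movement features and search space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the study's data: 85 participants, a dozen
# eye-movement features (e.g. fixation duration, saccade amplitude;
# the real feature set is not reproduced here), binary style labels.
X, y = make_classification(n_samples=85, n_features=12, n_informative=6,
                           random_state=0)

grid = GridSearchCV(
    make_pipeline(StandardScaler(), SVC()),   # scale features, then SVM
    param_grid={"svc__C": [0.1, 1, 10, 100],
                "svc__gamma": ["scale", 0.01, 0.1],
                "svc__kernel": ["rbf", "poly"]},
    scoring="f1",
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, round(grid.best_score_, 3))
```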
Emotion representation is a critical aspect of artificial intelligence, particularly in human-computer interaction and affective computing. Emotion recognition from multi-modal data remains challenging due to the complex semantic relationships between textual, audio, and visual features. This study proposes a hybrid model combining an Enhanced Graph Attention Network (E-GAT) and Bidirectional Long Short-Term Memory (Bi-LSTM) to address this challenge. First, the E-GAT captures structural dependencies between emotional features by constructing a semantic graph from text embeddings. Second, the Bi-LSTM models the temporal dynamics of sequential data, enabling effective integration of contextual information. We evaluated the model on three benchmark datasets: SemEval-2018 (text-only), RAVDESS (audio-visual), and CMU-MOSEI (multi-modal). Experimental results show that the proposed model achieves state-of-the-art performance: 58.5% accuracy and a 68.7% F1-score on SemEval-2018, outperforming baseline models. On the multi-modal datasets, it achieves 78.9% accuracy (RAVDESS) and 82.3% accuracy (CMU-MOSEI), demonstrating robust cross-modal generalization. This work advances emotion recognition by providing a unified framework for both text-only and multi-modal scenarios, with applications in human-computer interaction and mental health monitoring.
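A compact sketch of the hybrid architecture, with standard PyTorch self-attention standing in for the paper's E-GAT over a fully connected semantic graph; the dimensions, residual connection, and pooling head are assumptions rather than the published design.

```python
import torch
import torch.nn as nn

class GATBiLSTM(nn.Module):
    """Minimal sketch of the E-GAT + Bi-LSTM hybrid described in the abstract.

    Self-attention over token embeddings approximates graph attention on a
    fully connected semantic graph; a Bi-LSTM then models temporal dynamics,
    and a linear head produces emotion logits.
    """
    def __init__(self, embed_dim=300, hidden=128, n_classes=7, heads=4):
        super().__init__()
        self.graph_attn = nn.MultiheadAttention(embed_dim, heads, batch_first=True)
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):                      # x: (batch, seq_len, embed_dim)
        g, _ = self.graph_attn(x, x, x)        # structural dependencies
        h, _ = self.bilstm(g + x)              # temporal dynamics (residual input)
        return self.head(h.mean(dim=1))        # mean-pooled emotion logits

logits = GATBiLSTM()(torch.randn(2, 20, 300))  # two dummy utterances -> (2, 7)
```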
Facial Emotion Recognition (FER) has emerged as an important research topic in human-computer interaction (HCI) with advances in machine learning and deep learning. FER is used across various fields, including healthcare, marketing, gaming, education, security, and real-time human-robot interaction; one real-time application is a movie recommendation system that suggests movies based on users' emotions. In this paper, we use an optimised compound-scaling neural architecture combined with polynomial and radial basis function (RBF) SVM kernels for facial emotion recognition. We evaluate this approach on four diversified facial emotion datasets, each subjected to pre-processing, feature extraction, and classification. Each dataset is classified with different SVM kernel configurations, yielding different accuracy levels. Visualization techniques such as t-SNE and Grad-CAM are used to analyse feature-space separability and improve model interpretability. The maximum accuracy of the proposed model is 89.68%. Using our customised CNN model, we can predict seven facial emotions: happy, sad, angry, fearful, surprised, neutral, and disgusted. The proposed model can be used in various settings where real-time facial emotion recognition plays a vital role.
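To make the polynomial-vs-RBF kernel comparison concrete, here is a hedged scikit-learn sketch on synthetic stand-in features; the CNN backbone, the four datasets, and the hyperparameters of the paper are not reproduced.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in for CNN-extracted facial features; 7 classes for the 7 emotions.
X, y = make_classification(n_samples=700, n_features=128, n_informative=40,
                           n_classes=7, random_state=0)

for kernel, params in [("rbf", {"gamma": "scale"}),
                       ("poly", {"degree": 3, "coef0": 1.0})]:
    clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=10, **params))
    acc = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold accuracy per kernel
    print(f"{kernel}: mean CV accuracy = {acc:.3f}")
```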
Formative evaluation is widely used in implementation science to anticipate barriers and facilitators prior to the deployment of health technologies, typically relying on stakeholders' reported beliefs collected before real-world exposure. This approach has proven informative for many digital health tools; however, its application to immersive and embodied technologies such as extended reality (XR) warrants closer scrutiny. Most XR interventions in health care are delivered through head-mounted displays, which depend on spatial perception and sensorimotor engagement. Several implementation-relevant properties, including comfort, perceived intrusiveness, safety, and workflow disruption, often become apparent only through direct interaction. At the same time, large segments of the health care workforce remain XR-naive, such that preuse judgments are frequently shaped by anticipation rather than experience. Drawing on concepts from implementation science, grounded cognition, and human-computer interaction, this Viewpoint examines a plausible interpretive problem in XR adoption and argues that perception-based formative evaluation, when applied through frameworks developed for screen-based technologies, may misclassify barriers and facilitators. Rather than questioning formative evaluation as a methodological approach, we identify a boundary condition for its interpretability in experience-dependent technologies and propose a pragmatic refinement: incorporating brief experiential familiarization before eliciting stakeholder perceptions to strengthen early-stage assessment and improve alignment with real-world implementation decisions.
As generative artificial intelligence (GenAI) becomes increasingly integrated into educational contexts, learners' interactions with AI systems have emerged as a crucial determinant of learning effectiveness. Yet, prior studies have primarily emphasized performance outcomes or technology acceptance, overlooking the underlying cognitive and psychological mechanisms. Grounded in Human-Computer Interaction (HCI) and metacognitive theory, this study develops and validates a comprehensive model that positions learners' perceived interactivity with AI as the antecedent, positive and negative AI metacognition as dual mediators, and cognitive flexibility as a moderator influencing interdisciplinary learning self-efficacy. Survey data were collected from 820 university students. Structural modeling results revealed that perceived interactivity positively predicted interdisciplinary learning self-efficacy, partially through AI metacognition. Moreover, cognitive flexibility negatively moderated the link between interactivity and negative AI metacognition while strengthening the positive effects of interactivity on both positive AI metacognition and learning self-efficacy. Gender differences further indicated that females benefited more strongly from interactive AI experiences than males. These findings extend the theoretical integration of HCI and metacognitive frameworks within AI-enhanced learning environments and provide practical implications for designing interactive learning systems that promote adaptive metacognitive regulation and cognitive flexibility training.
Virtual surgical planning can support fracture reduction, but its routine use remains limited by the time and interaction burden of manually manipulating fragments in six degrees of freedom. This work presents a CBCT-based workflow designed to minimise user interaction by replacing continuous manual manipulation with a small number of point-based initialisation steps. Fragments are aligned through a sequential rigid registration strategy based on the iterative closest point (ICP) algorithm: the largest fragment is initialised manually, and each subsequent fragment inherits the transformation estimated in the previous step and is refined automatically through ICP, although additional initialisation may be required in unfavourable configurations. Reduction accuracy was assessed using surface-to-surface distance analysis. The workflow is implemented entirely with open-source tools, ensuring reproducibility and accessibility without proprietary dependencies. It was evaluated on experimentally induced bovine femur fractures comprising two to four fragments by users with heterogeneous backgrounds. Compared with manual alignment, the proposed approach significantly reduced reduction time (Wilcoxon signed-rank paired test, p = 0.022), corresponding to an average time saving of approximately 63%. Surface-to-surface analysis showed mean alignment errors of approximately 1.5-1.8 mm, with consistent results across users. A usability-driven workflow that constrains interaction to a limited number of discrete steps can substantially improve the efficiency of virtual fracture reduction while maintaining user control. The open-source pipeline provides an accessible solution for preoperative fracture reduction planning and a foundation for further developments in usability-driven virtual fracture reduction.
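A minimal Open3D sketch of one step of the sequential ICP alignment; the file paths, sampling density, and correspondence threshold are placeholders, and point-to-plane estimation is one reasonable choice rather than the paper's confirmed setting.

```python
import numpy as np
import open3d as o3d

def align_fragment(source_mesh_path, target_mesh_path, init=np.eye(4),
                   max_dist=5.0, n_points=20000):
    """Rigidly register one fracture fragment to the reference surface.

    Samples points from both meshes, estimates normals, and runs
    point-to-plane ICP starting from `init` (e.g. the transformation
    inherited from the previous fragment in the sequential strategy).
    Returns the refined 4x4 rigid transform.
    """
    src = o3d.io.read_triangle_mesh(source_mesh_path).sample_points_uniformly(n_points)
    tgt = o3d.io.read_triangle_mesh(target_mesh_path).sample_points_uniformly(n_points)
    for pc in (src, tgt):
        pc.estimate_normals()   # needed for point-to-plane residuals

    result = o3d.pipelines.registration.registration_icp(
        src, tgt, max_dist, init,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return result.transformation  # feed this into the next fragment's init
```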
Artificial intelligence (AI) systems now challenge or surpass human experts in many computer games [1,2]. Physical and real-time sports such as table tennis, however, remain a major open challenge because of their requirements for fast, precise and adversarial interactions near obstacles and at the edge of human reaction time [3]. Here we present Ace, to our knowledge the first real-world autonomous system competitive with elite human table tennis players. Ace addresses the challenges of physical real-time interaction through a new high-speed perception system using event-based vision sensors [4] and a new control system based on model-free reinforcement learning, as well as state-of-the-art high-speed robot hardware. Evaluated in matches against elite and professional players under official competition rules, Ace achieved several victories and demonstrated consistent returns of high-speed, high-spin shots. These results highlight the potential of physical AI agents to perform complex, real-time interactive tasks, suggesting broader applications in domains requiring fast, precise human-robot interaction.
Gastric cancer is marked by profound molecular and microenvironmental heterogeneity that limits therapeutic progress. Here, we present a 15-layer multi-omics atlas that integrates genomics, epigenomics, transcriptomics, proteomics, multiple post-translational modifications (PTMs), protein-protein interactions, metabolomics, and microbiome profiles from 159 primary gastric adenocarcinomas and 30 matched normal adjacent tissues. Using cell-state deconvolution, we define tumor ecotypes that refine genomic and histological subtypes by capturing distinct tumor microenvironment architectures linked to clinical outcomes and potential associations with immunotherapy response. Multi-omics integration prioritizes genomic and epigenomic aberrations and their associated vulnerabilities; defines ecotype-specific transcriptional programs, signaling pathways, PTMs, protein interaction networks, and metabolic regulation; and identifies microbiome features linked to ecotypes and resistance pathways. We further prioritize ecotype-, genomic subtype-, and cell type-specific targetable proteins using proteomic and PTM analyses within a tumor microenvironment context. This comprehensive atlas provides a systems-level blueprint for decoding gastric cancer heterogeneity and advancing precision oncology.
Conversational AI offers scalable mental health support, with large language models (LLMs) enabling personalized interactions. Human-centered design is critical in this domain, yet a comprehensive synthesis from this perspective is lacking. This review maps conversational AI research in mental health across the patient journey and develops a human-centered taxonomy to guide future design. Following PRISMA guidelines, we conducted a comprehensive search across fifteen multidisciplinary databases. We systematically analyzed the literature across six dimensions: research foci, mental disorder types, target populations, AI technologies, data sources, and evaluation metrics. A consensual taxonomy research method was employed to develop a human-centered design framework. Of 10,293 identified records, 677 studies met the inclusion criteria. Analysis reveals a marked increase in publications since 2020, predominantly from computer science (449 studies), followed by medicine (148) and social sciences (80). Research is skewed toward detection (23%) and intervention (66%) stages, with prevention (8%) and maintenance (3%) receiving less attention. Mood, anxiety, and stress-related disorders are the most investigated conditions. LLMs have emerged as the predominant AI technology, particularly within intervention and maintenance stages. Data sources continue to rely heavily on text-based inputs, with multimodal approaches still limited in adoption. Evaluation metrics vary significantly by discipline, reflecting limited cross-disciplinary integration. Through thematic synthesis, we developed a human-centered taxonomy comprising four primary dimensions: Emotional Sensitivity to Users, User-Centric Interaction Design, Human-AI Collaboration and Capability Enhancement, and Ethics and Accountability, with a total of thirteen sub-dimensions. This review provides a comprehensive, human-centered mapping of conversational AI research in mental health across the patient journey. Critical gaps remain in stage coverage, disorder diversity, population inclusivity, multimodal data integration, and interdisciplinary evaluation. The proposed taxonomy offers a structured framework to align AI development with human-centered principles, fostering empathetic, ethical, effective, and equitable mental health support.
Animal behavior recognition is an important research area that provides insights into topics such as neural functions, gene mutations, and drug efficacy. The manual coding of behaviors from video recordings is labor-intensive and prone to inconsistencies and human error, and machine learning approaches have been used to automate the analysis of animal behavior with promising results. Our work builds on existing developments in animal behavior analysis and state-of-the-art computer vision approaches to identify rodent social behaviors. Specifically, our proposed approach, Vision Transformer for Rat Social Interactions (ViT-RSI), leverages the existing Global Context Vision Transformer (GC-ViT) architecture to identify rat social interactions. Experimental results on five behaviors of the publicly available Rat Social Interaction (RatSI) dataset show that ViT-RSI can accurately identify rat social interaction behaviors. Compared with prior results from the literature, ViT-RSI achieves the best results for four of the five behaviors ("Approaching", "Following", "Moving away", and "Solitary"), with F1 scores of 0.81, 0.81, 0.86, and 0.94, respectively.
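A hedged fine-tuning sketch using timm's GC-ViT implementation; the 'gcvit_tiny' variant, the 224-pixel input resolution, and the training step below are illustrative assumptions rather than the authors' exact recipe.

```python
import timm
import torch

# Fine-tune a Global Context ViT backbone for the 5 RatSI behavior classes.
# 'gcvit_tiny' is one of timm's GC-ViT variants; pretrained=True downloads
# ImageNet weights, which the paper may or may not have used.
model = timm.create_model("gcvit_tiny", pretrained=True, num_classes=5)

frames = torch.randn(8, 3, 224, 224)        # a batch of sampled video frames
logits = model(frames)                      # (8, 5) per-frame behavior scores
loss = torch.nn.functional.cross_entropy(logits, torch.randint(0, 5, (8,)))
loss.backward()                             # one standard fine-tuning step
```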