Understanding how early mother-child interactions are linked to children's social-cognitive processes requires methods capable of capturing the temporal structure of naturalistic behavior. This study introduces a computational framework based on Bayesian Network modeling to identify sequential dependencies among nonverbal behaviors (smiles, gaze, and social touch) exchanged during free play in mother-child dyads (n = 38; age 3 years). From each network, we derived the Order of Sequential Interaction (OSI), a compact index of interaction complexity. We then examined its associations with behavioral, physiological, and neural measures relevant to cognitive development. Although OSI was not associated with language or executive-function scores, analyses revealed links between OSI and prosocial behavior, facial EMG, and neural responses (rTPJ, lIFG) during prosocial-scene viewing. These findings suggest that OSI may capture aspects of interaction structure specifically connected to children's social and affective responsiveness. Building on this, the present framework demonstrates how probabilistic graphical models can structure complex interaction data and support future investigations into multimodal processes in early social cognition.

SUMMARY: A Bayesian-network framework is proposed to model multivariate sequential dependencies in naturalistic mother-child interaction. The order of sequential interaction (OSI) quantifies interaction complexity from behavioral time-series data. Higher OSI is associated with greater prosocial behavior and with neural (rTPJ, lIFG) and physiological (facial EMG) responses during social processing. Interaction complexity is not associated with general cognitive or language measures, suggesting that it reflects a distinct dimension of social behavior. The proposed framework provides a basis for studying social-cognitive development from naturalistic interaction data.
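A rough, illustrative sketch of the kind of sequential-dependency screening described above: it tests lag-1 dependencies between binary behavior channels and summarizes network density as a toy complexity index. The channel names, the chi-square screening, and the density index are all assumptions for illustration; the abstract does not specify how OSI is computed, and this is not the authors' method.

```python
# Hypothetical sketch: screen lag-1 dependencies between dyadic behavior channels
# (e.g., mother_smile -> child_gaze) and summarize network density as a toy
# complexity index. This is NOT the authors' OSI definition; channel names and
# the chi-square screening are assumptions.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)
channels = ["mother_smile", "mother_gaze", "mother_touch",
            "child_smile", "child_gaze", "child_touch"]
# Binary time series: one frame per second over a 5-minute free-play episode (toy data).
data = pd.DataFrame(rng.integers(0, 2, size=(300, len(channels))), columns=channels)

edges = []
for src in channels:
    for dst in channels:
        if src == dst:
            continue
        # Cross-tabulate the source behavior at time t against the target at time t+1.
        table = pd.crosstab(data[src].iloc[:-1].values, data[dst].iloc[1:].values)
        _, p, _, _ = chi2_contingency(table)
        if p < 0.01:                      # keep only strong sequential dependencies
            edges.append((src, dst))

osi_proxy = len(edges) / (len(channels) * (len(channels) - 1))  # normalized edge density
print(f"detected edges: {len(edges)}, toy complexity index: {osi_proxy:.2f}")
```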
The identification of lncRNA-miRNA interactions (LMIs) is crucial for deciphering post-transcriptional regulatory networks and their roles in development and disease. While computational methods have been developed to predict LMIs, existing approaches are often limited by an inability to effectively integrate multimodal biological data and to handle the severe class imbalance inherent to biological networks. To overcome these limitations, we present LMI-MHGAT, a novel deep learning framework for LMI prediction based on a Multilayer Heterogeneous Graph Attention network. Our model integrates diverse data, including RNA sequences, expression profiles, and known molecular interactions, into a unified graph representation. A key innovation is the use of a graph attention mechanism that dynamically learns to weight information from different relational layers, enabling the model to learn robust embeddings for lncRNAs and miRNAs. LMI-MHGAT significantly outperforms 14 existing methods on human LMI data, demonstrating exceptional robustness under severe class imbalance (positive-to-negative ratio 1:60). The model generalizes effectively, achieving state-of-the-art performance on rat and plant datasets. Case studies confirm its ability to recover disease-associated regulatory axes and predict novel, biologically plausible interactions. LMI-MHGAT provides a more powerful and robust framework for LMI prediction by simultaneously addressing key limitations in data utilization and integration. The tool is freely accessible at https://github.com/Zhenpm/LMI-MHGAT.
Deciphering how genes interact within human cells is essential for understanding their functional wiring and for developing targeted therapeutic strategies. In this study, we present a genome-scale map of genetic interactions in the human haploid cell line HAP1, based on CRISPR-based perturbation of ∼4 million gene pairs. The resulting network comprises ∼89,000 high-confidence gene-gene interactions, organizing genes into hierarchical modules corresponding to protein complexes and pathways, biological processes, and cellular compartments, mirroring principles observed in yeast and highlighting the functional architecture of a human cell. This large-scale genetic network complements the DepMap gene co-essentiality network by capturing unique functional information, uncovering roles of previously uncharacterized genes, and identifying molecular determinants of cancer-cell-line-specific genetic dependencies. This study presents a general data-driven strategy for systematically exploring the roles of genes and their functional connections in human cell lines.
Follicle-stimulating hormone (FSH) is a glycoprotein involved in oogenesis and subsequent oocyte maturation. An inadequate level of this hormone critically disrupts pre-vitellogenic oocyte progression, a major bottleneck in the captive maturation of the Asian catfish Clarias magur. The current study investigates the potential molecular interaction between recombinant human follicle-stimulating hormone (r-hFSH) and the FSH receptor (FSHR) of C. magur through comprehensive in silico analysis. A subsequent in vivo experiment examined the progression of pre-vitellogenic oocytes in C. magur under r-hFSH induction and shower simulation, comprising three treatments and one control: C (control), S (shower), H (hormone), and SH (shower and hormone). The in silico evaluation revealed that r-hFSH interacted more favorably with the C. magur FSHR than native FSHβ. The predicted affinity values corroborate the trend observed in the docking scores and provide quantitative support for the conclusion that the recombinant human ligand forms a highly stable complex with the C. magur receptor. In the subsequent 60-day in vivo experiment, weekly r-hFSH administration significantly elevated serum biomarker levels. Hormone stimulation, with and without shower, effectively advanced oocytes from the pre-vitellogenic stage to spawning and yielded superior fecundity, fertilization, and hatching rates, demonstrating its potential as a reliable strategy to induce oocyte development and improve captive breeding success in C. magur. These findings highlight a novel opportunity for regulating oogenesis and advancing pre-vitellogenic oocytes to maturation in captive-farmed C. magur using r-hFSH alone.
Operational robots have demonstrated significant potential in complex scenarios such as live-line maintenance and medical surgery. Existing research on Mixed Reality (MR) and Digital Twin (DT) systems has primarily focused on unidirectional data visualization and passive state monitoring, acting as "open-loop" observation tools that fail to address low operational precision and inefficient human-robot synergy in dynamic, high-risk environments. For the first time, we integrate an MR-based closed-loop digital twin operating system for human-robot collaborative operation into the task execution of live-line operation equipment to address the above challenges. Moving beyond simple visualization, the proposed framework establishes an integrated operational paradigm that bridges the gap between immersive perception and real-time interventional control. This framework comprises three integral components: (1) the construction of a high-fidelity virtual digital twin; (2) the development of a human-computer interaction paradigm based on MR technology; and (3) the establishment of an MR-based human-machine collaborative operation mode. Building upon this framework, a system was implemented for live-line working robots. Experimental results indicate that, compared with traditional control methods, the proposed system reduces the task completion time of live-line equipment tasks by 14.3% on average, verifying the feasibility and effectiveness of the pioneering application of the closed-loop digital twin operating system in live-line operation equipment.
Artificial sensing systems have broad application potential in areas such as health monitoring, human-computer interaction, and rehabilitation medicine. However, most existing systems are limited to one-way acquisition and transmission of electrical signals and lack intuitive, real-time feedback for interactive use. This unidirectional operation limits the availability of direct, human-interpretable output cues, thereby restricting their effectiveness in scenarios that require real-time guidance and dynamic interaction, such as rehabilitation training and interactive learning. Introducing a feedback mechanism can effectively overcome this limitation by providing intuitive visual output and enabling a more interactive "perception-feedback-adjustment" pathway, which may improve both the efficiency and precision of human-machine interaction. To address this challenge, we developed a novel artificial sensing system that integrates highly sensitive motion detection with real-time multicolor optical feedback. The stretchable triboelectric nanogenerator (TENG) used as a self-powered motion sensor exhibited sensitivities of 0.145 kPa⁻¹ in the low-pressure region (<8 kPa) and 0.019 kPa⁻¹ in the high-pressure region (8-30 kPa). The proposed artificial sensing system, integrating the TENG with a quantum dot light-emitting diode (QLED)-based synaptic device, achieved an overall motion-state recognition accuracy of 98.12%. Compared with conventional electrical feedback, optical feedback in the form of directly observable visual output provides intuitive visualization, strong resistance to electromagnetic interference, and the ability to support multichannel parallel information transmission, making it particularly suitable for delivering clear and unambiguous status indications in complex environments. The synergistic integration of TENG-based mechanical perception and QLED-based optoelectronic feedback demonstrated in this work offers a promising design paradigm for constructing simple, efficient, and intuitive artificial sensory systems.
Accurate identification of cognitive styles is important for personalized learning environment optimization and human-computer interaction system design. Traditional self-report measures suffer from subjectivity bias, so this study developed a machine learning classification model based on objective physiological data. Focusing on the distinction between verbal and representational cognitive styles, the study collected eye-movement data from 85 participants performing a standardized cognitive task via eye-tracking technology. We extracted multidimensional eye-movement features and systematically evaluated the classification performance of six machine learning algorithms: decision tree (DT), K-nearest neighbors (KNN), naive Bayes (NB), support vector machine (SVM), logistic regression (LR), and an ensemble learning model (EL). Experimental results show that all algorithms can effectively utilize eye-movement features for cognitive style classification, with SVM performing best: after parameter tuning via grid search, it achieved 82.1% classification accuracy (F1 = 0.715). The proposed method provides a new approach for noninvasive assessment of cognitive styles that can be applied in real-time adaptive learning systems. The results offer important insights for personalized educational technology, adaptive design of learning interfaces, and cognitive-perceptual computing systems, and provide a valuable reference for educational psychology and human-computer interaction research.
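A minimal sketch of the kind of pipeline described above, assuming synthetic eye-movement features and toy labels: an RBF-kernel SVM tuned by grid search with cross-validated F1 scoring. The feature names and grid values are illustrative, not those of the study.

```python
# Grid-searched SVM for verbal vs. representational cognitive-style classification
# from eye-movement features (synthetic data; assumed feature dimensions).
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(42)
X = rng.normal(size=(85, 12))            # e.g., fixation counts, durations, saccade amplitudes
y = rng.integers(0, 2, size=85)          # 0 = verbal, 1 = representational (toy labels)

param_grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": ["scale", 0.01, 0.1]}
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
search = GridSearchCV(pipe, param_grid, scoring="f1", cv=StratifiedKFold(5))
search.fit(X, y)
print("best params:", search.best_params_, "CV F1:", round(search.best_score_, 3))
```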
Emotion representation is a critical aspect of artificial intelligence, particularly in human-computer interaction and affective computing. Emotion recognition from multi-modal data remains challenging due to the complex semantic relationships between textual, audio, and visual features. This study proposes a hybrid model combining an Enhanced Graph Attention Network (E-GAT) and Bidirectional Long Short-Term Memory (Bi-LSTM) to address this challenge. First, E-GAT captures structural dependencies between emotional features by constructing a semantic graph from text embeddings. Second, Bi-LSTM models the temporal dynamics of sequential data, enabling effective integration of contextual information. We evaluated the model on three benchmark datasets: SemEval-2018 (text-only), RAVDESS (audio-visual), and CMU-MOSEI (multi-modal). Experimental results show that the proposed model achieves state-of-the-art performance: 58.5% accuracy and a 68.7% F1-score on SemEval-2018, outperforming baseline models. On multi-modal datasets, it achieves 78.9% accuracy (RAVDESS) and 82.3% accuracy (CMU-MOSEI), demonstrating robust cross-modal generalization. This work advances emotion recognition by providing a unified framework for both text-only and multi-modal scenarios, with applications in human-computer interaction and mental health monitoring.
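For illustration only, a compact sketch of the hybrid idea described above: a single-head graph attention layer over a token-level semantic graph followed by a Bi-LSTM and a classification head. The layer design, dimensions, and graph construction are simplifying assumptions, not the authors' E-GAT implementation.

```python
# Illustrative-only sketch: graph attention over a semantic graph, then a BiLSTM
# over the attended sequence, then a sequence-level emotion classifier.
import torch
import torch.nn as nn

class SimpleGraphAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)
        self.attn = nn.Linear(2 * dim, 1)

    def forward(self, x, adj):               # x: (B, N, D), adj: (B, N, N) with {0, 1} entries
        h = self.proj(x)
        B, N, D = h.shape
        hi = h.unsqueeze(2).expand(B, N, N, D)
        hj = h.unsqueeze(1).expand(B, N, N, D)
        e = torch.tanh(self.attn(torch.cat([hi, hj], dim=-1))).squeeze(-1)
        e = e.masked_fill(adj == 0, float("-inf"))
        a = torch.nan_to_num(torch.softmax(e, dim=-1))   # handle nodes without neighbors
        return torch.relu(torch.bmm(a, h))               # aggregated node features

class GATBiLSTMClassifier(nn.Module):
    def __init__(self, dim=128, hidden=64, n_classes=7):
        super().__init__()
        self.gat = SimpleGraphAttention(dim)
        self.lstm = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x, adj):
        h = self.gat(x, adj)                  # structural dependencies
        out, _ = self.lstm(h)                 # temporal dynamics
        return self.head(out.mean(dim=1))     # sequence-level emotion logits

logits = GATBiLSTMClassifier()(torch.randn(2, 10, 128), torch.ones(2, 10, 10))
print(logits.shape)                           # torch.Size([2, 7])
```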
Accurate alignment of real-world object poses with their virtual counterparts using sensors, e.g., cameras, is essential for consistent interaction in mixed-reality systems. However, objects can undergo abrupt, untracked movements during periods when a tracking system is inactive, e.g., overnight, causing stored pose records to become inconsistent with the real scene and breaking user interaction in the virtual environment. Off-the-shelf 3D reconstruction networks such as MASt3R (Matching And Stereo 3D Reconstruction) provide metrically scaled 3D point maps and pixel correspondences, but they are trained on static scenes and therefore fail to produce reliable object correspondences when the object has moved. We propose a robust pipeline that combines MASt3R's metrically scaled 3D outputs with a background-based alignment strategy to recover and apply the true pose change of moved objects. Our method first segments foreground and background and extracts 3D background point sets for a reference day and a current day. An affine transformation between these background point sets is estimated via a standard registration technique and used to express the current-day object 3D coordinates in the reference coordinate frame. Within that unified frame, we compute the object pose change and apply the resulting transform to the virtual object, restoring real-virtual consistency. Experiments on real scenes demonstrate that the proposed approach reliably corrects pose misalignments introduced during inactive periods and substantially improves over applying MASt3R alone, thereby enabling restored and consistent user interaction in the virtual environment.
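A hedged sketch of the background-based alignment idea under simplifying assumptions: it estimates a rigid (rather than general affine) transform between corresponding reference-day and current-day background points via the Kabsch algorithm and uses it to express current-day object coordinates in the reference frame. Correspondences are assumed known here, whereas the pipeline above obtains point maps and correspondences from MASt3R.

```python
# Simplified background-based alignment: estimate a rigid transform between
# background point sets, then re-express current-day object points in the
# reference coordinate frame (synthetic data; rigid instead of affine).
import numpy as np

def rigid_transform(src, dst):
    """Kabsch: find R, t with dst ≈ R @ src + t (src, dst: (N, 3), corresponding points)."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

rng = np.random.default_rng(1)
bg_ref = rng.normal(size=(200, 3))                        # reference-day background points
R_true = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
t_true = np.array([0.5, -0.2, 1.0])
bg_cur = bg_ref @ R_true.T + t_true                       # same background, current-day frame

R, t = rigid_transform(bg_cur, bg_ref)                    # maps current frame -> reference frame
print("max background residual:", float(np.abs(bg_cur @ R.T + t - bg_ref).max()))

obj_cur = rng.normal(size=(50, 3)) + np.array([2., 0., 0.])   # moved object, current frame
obj_in_ref = obj_cur @ R.T + t                                # object points in reference frame
```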
Conversational agents (CAs) are increasingly used in mental health care to enhance access and engagement. However, their safe, ethical, and user-sensitive design remains a challenge. Despite growing attention to trauma-informed approaches in human-computer interaction, there is limited work on how the trauma-informed care (TIC) framework could be applied in the design of mental health CAs and no comprehensive synthesis to date. Guided by the Substance Abuse and Mental Health Services Administration's TIC framework, this scoping review explored how TIC principles (safety; trustworthiness and transparency; collaboration and mutuality; empowerment, voice, and choice; peer support; and cultural, historical, and gender issues) are currently represented in the design and evaluation of mental health conversational agents (MHCAs) and identified gaps and opportunities to promote more trauma-informed design practices. Online databases, as well as a secondary survey of citation lists from an initial search, were used to identify English-language journal articles and conference proceedings from 2000 to 2024 that empirically evaluated an independent, web- or app-based, unassisted CA used for mental health and included concepts from TIC. Our analysis included 38 publications (n=28, 73.7%, published in 2020 or later) covering 28 distinct MHCAs. Most studies used experimental methods (n=23, 60.6%) or user studies (n=11, 28.9%), with samples skewed toward women (men: mean 34.92%, SD 18.64%), younger participants (mean age 32.52, SD 14.6 years), and predominantly nonclinical populations (n=29, 76.3%). MHCAs were largely rule-based prototypes. No studies explicitly referenced the TIC framework as a guiding lens for MHCA design or evaluation. A total of 26 studies referenced terminology from TIC core principles but rarely defined them, while all 38 included language that could be linked to one or more principles. Overall, TIC-related concepts appeared most often within intervention design descriptions, qualitative assessments, or as items embedded in questionnaires evaluating broader constructs. Trustworthiness and transparency, safety, empowerment, voice and choice, and collaboration and mutuality were comparatively well addressed, while peer support and cultural, historical, and gender issues were largely absent. Design recommendations, where present, were relatively broad and emphasized secure, customizable, reliable, human-like, and context-sensitive MHCAs that offered multimodal interaction, goal setting and tracking, and transparency. Studies did not self-identify as using the Substance Abuse and Mental Health Services Administration's framework for TIC, making its elements more difficult to identify. The fragmented terms, disciplines, and metrics used make it difficult to draw more systematic conclusions about the current research landscape related to TIC, but our analysis indicates that TIC can serve as a descriptive and potentially unifying framework and provides a starting point for explicitly trauma-informed MHCA research and design.
Alzheimer's disease (AD) is the leading cause of dementia and imposes a high economic and social burden on healthcare systems. In Brazil, the consistent increase in costs associated with AD hospitalizations, coupled with the absence of curative therapies and population aging, reinforces the need for low-cost, broadly applicable preventive strategies. This study investigated the role of irisin, a myokine induced by physical activity, in the prevention of AD, integrating epidemiological and bioinformatic analyses. Public data on the nutritional status of the Brazilian population in the early 2000s and on AD hospitalizations approximately 20 years later were analyzed, assessing the temporal association using a lagged Spearman correlation. Additionally, genes associated with AD were analyzed through protein-protein interaction networks and functional enrichment. Structural models of irisin and the integrin αV/β5 receptor were employed in molecular docking and molecular dynamics analyses. Historical data indicated a high prevalence of excess weight in the early 2000s (46.7% ± 4.2% of the adult population) and a strong positive correlation with AD hospitalizations two decades later (ρ = 0.88; p = 0.033). Functional analyses revealed enrichment of pathways related to neurodegeneration, neurotrophins, and neuronal plasticity, involving proteins such as BDNF, AKT, ERK1/2, and CREB. Docking and molecular dynamics indicated a stable interaction of irisin with the αV/β5 receptor, suggesting activation of neuroprotective pathways. The findings reinforce physical exercise as a strategic public health tool for the prevention of AD, providing an epidemiological and molecular basis to reduce the future burden of the disease, thereby shifting the focus of public health policy from treatment to prevention.
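As a toy illustration of the lagged Spearman correlation used above (all numbers are invented, not the study's data), the analysis pairs early-2000s exposure values with AD hospitalizations roughly two decades later and tests their monotonic association.

```python
# Lagged Spearman correlation between excess-weight prevalence in the early 2000s
# and AD hospitalizations ~20 years later (assumed example values).
from scipy.stats import spearmanr

excess_weight_2000s = [42.1, 44.8, 46.7, 48.0, 51.2]        # % of adults (assumed)
ad_hospitalizations_2020s = [1180, 1240, 1330, 1415, 1502]  # lagged outcome (assumed)

rho, p = spearmanr(excess_weight_2000s, ad_hospitalizations_2020s)
print(f"lagged Spearman rho = {rho:.2f}, p = {p:.3f}")
```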
Spore-forming bacteria such as Bacillus cereus pose a significant public health challenge due to their ability to survive harsh environmental conditions, resist conventional decontamination strategies, and cause recurrent infections in clinical and food-associated environments. This persistence is primarily associated with sporulation, a tightly regulated developmental process in which long-term survival depends on maintaining genome integrity while cellular metabolism remains largely inactive. DNA-binding proteins are therefore central to sporulation, as effective DNA condensation and protection are essential for sporulation progression and cyst wall maturation. Among these, α/β-class small acid-soluble proteins (SASPs), particularly SASP2, bind to the DNA minor groove and stabilize the genome during dormancy. In this study, a structural model of the B. cereus SASP2-DNA complex was constructed and analyzed through an integrated computational approach to identify compounds targeting this interaction. Phytochemicals derived from Garcinia mangostana, Garcinia cowa, Ficus exasperata, and Entada abyssinica, previously reported for antimicrobial activity, were evaluated for their potential to interact with the SASP2-DNA interface, a mechanism not previously explored. Several compounds showed strong binding affinity at the SASP2-DNA minor-groove interface and were predicted to influence key interactions under simulated stress conditions, leading to DNA compaction stability and stress tolerance, which may subsequently affect cyst wall formation and spore viability. Notably, the identification of plant-derived compounds capable of targeting the SASP2-DNA interface represents a novel observation. Overall, these findings provide a promising computational basis for exploring strategies to limit the persistence and transmission of B. cereus infections.
This study investigates the relationship between classroom peer network structures and English learning motivation among secondary-level EFL learners in Saudi Arabia. Using social network analysis (SNA), the research examines (a) gender-based differences in network structural properties, (b) associations between learners' L2 motivation and their network positions, and (c) the extent of motivational homophily within peer networks. Data were collected through a questionnaire survey from 100 students enrolled in summer enrichment programs. Directed peer nomination data were used to construct classroom interaction networks, and network metrics, including density, reciprocity, and centrality, were computed. Results indicate modest structural differences between male and female networks, with males forming slightly denser and more reciprocal networks while females tend to occupy structurally more central positions within the network. However, no statistically significant associations were found between motivation levels and centrality measures, and no evidence of motivation-based homophily was observed. These findings suggest that learners' motivational dispositions were not systematically associated with their structural positioning within peer networks in this sample, indicating that network position alone may not correspond to differential motivational outcomes in classroom settings. Implications for language educators emphasize the potential value of intentionally structured peer interaction practices, rather than assuming that motivationally prominent students will naturally occupy structurally influential positions. The study contributes to understanding the structural and motivational dimensions of EFL learning and offers theoretical insight into the limited alignment between network structure and individual motivational variation, informing future research employing social network approaches in educational settings.
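A minimal sketch of the network metrics named above, computed with networkx from a handful of hypothetical directed peer nominations (student labels and edges are invented for illustration).

```python
# Density, reciprocity, and centrality from directed peer-nomination data.
import networkx as nx

nominations = [("S1", "S2"), ("S2", "S1"), ("S1", "S3"),
               ("S3", "S4"), ("S4", "S2"), ("S5", "S1")]   # "who do you study with?"
G = nx.DiGraph(nominations)

print("density:", round(nx.density(G), 3))
print("reciprocity:", round(nx.reciprocity(G), 3))
print("in-degree centrality:", nx.in_degree_centrality(G))
print("betweenness centrality:", nx.betweenness_centrality(G))
```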
Asthma is a prevalent chronic respiratory condition among children worldwide. Inhalation therapy is the primary treatment method, but children often make errors in its use and exhibit poor adherence, which impacts treatment effectiveness. Therefore, interventions to improve inhalation techniques and enhance adherence are urgently needed. This study aimed to develop and evaluate BreatheBuddy (created by Haoyu Zhang), a training system with gamified feedback designed to enhance inhalation skills and treatment adherence in children with asthma. This study used a single-factor repeated-measures design and recruited 20 children aged 6 to 8 years (10 boys and 10 girls), all of whom had prior experience with inhalers. The experimental group used the BreatheBuddy system, which combines a physical inhaler with interactive game-based software. The system provides real-time animated feedback based on data from inhalation, breath-holding, and exhalation to guide the rhythm and depth of inhalation. The control group used a conventional inhaler method without a gamified system. Inhalation accuracy, adherence, and satisfaction were assessed using a respiration sensor, the Player Experience of Need Satisfaction scale, the Game User Experience Satisfaction Scale (GUESS), and the System Usability Scale (SUS). Statistical comparisons between the groups were conducted using paired t tests and Mann-Whitney U tests. The experimental group demonstrated significant improvements in inhalation accuracy, with longer breath-holding times and more stable breathing patterns compared to the control group (P<.001). The experimental group also exhibited significantly higher engagement and motivation, with Player Experience of Need Satisfaction (standardized score=93.83) and GUESS (median 87.92, IQR 86.54-88.46) scores markedly higher than those of the control group. Usability scores for the experimental group were also superior, with an SUS score of 88.96 (P<.001). Additionally, children in the experimental group showed reduced anxiety and improved focus during training. BreatheBuddy effectively optimized children's inhalation skills, boosted treatment adherence, and relieved inhalation-related anxiety. Unlike conventional nongamified training or simple game-based distraction, this study integrated breathing behaviors into the core game interaction. With dynamic respiratory rhythm feedback, the system unifies skill training, motivation promotion, and emotional regulation. By combining standard inhaler operation with immersive gamified interaction, it presents a novel behavior-oriented design paradigm. This work provides empirical evidence for gamified intervention in pediatric respiratory treatment and offers a practical auxiliary tool for clinical daily training to strengthen children's self-management. Further research will focus on personalized adjustment and wider clinical application of the system.
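A small illustration of the statistical comparisons mentioned above, using invented scores rather than trial data: a Mann-Whitney U test for between-group SUS scores and a paired t test for within-child breath-holding times.

```python
# Between-group Mann-Whitney U test and within-child paired t test (toy numbers).
from scipy.stats import mannwhitneyu, ttest_rel

sus_gamified = [90, 87.5, 92.5, 88, 91, 85, 89, 93]       # assumed SUS scores
sus_control = [70, 68, 72.5, 65, 71, 69, 66, 73]
u_stat, p_u = mannwhitneyu(sus_gamified, sus_control, alternative="two-sided")

hold_before = [3.1, 2.8, 3.5, 2.9, 3.2]                   # breath-holding time (s), assumed
hold_after = [5.0, 4.6, 5.4, 4.8, 5.1]
t_stat, p_t = ttest_rel(hold_after, hold_before)

print(f"Mann-Whitney p={p_u:.4f}, paired t p={p_t:.4f}")
```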
Despite their benefits, digital health tools often face adoption barriers because of the digital divide. Identifying the fundamental user skills required to effectively navigate these tools and the usability barriers is essential to addressing disparities in use. This study aimed to identify the skill and usability barriers to using digital health tools. This study included English-, Spanish-, or Cantonese-speaking patients, aged ≥50 years, who received care at an urban safety net health system in the United States. Participants completed a survey examining sociodemographic characteristics and digital health tool use and were observed and video recorded as they navigated four digital health care tasks: (1) launch a video visit, (2) visit a health website through a URL, (3) log in to the patient portal, and (4) sign up for a patient portal account. Participants who could not independently perform the tasks received additional support. Tasks were conducted in English, while instructions and additional assistance were provided in each participant's preferred language. Video recordings were thematically coded to identify the fundamental skills needed for effective digital tool use and usability barriers in the design of digital tools. We examined whether task independence was associated with participant demographics and thematic categories using Kruskal-Wallis, χ2, and Fisher exact tests. In total, 74% (34/46), 52% (31/60), 71% (44/62), and 70% (43/61) of participants (N=64) independently completed digital tasks 1, 2, 3, and 4, respectively. Older age, minoritized races and ethnicities, non-English language preference, lower educational attainment, access to cellular data only or no internet access, and lack of a portal account were associated with a higher likelihood of requiring assistance or being unsuccessful at completing each task (P<.001, except for older age [P=.004]). The qualitative coding of video recordings identified 3, 4, and 6 categories of typing, navigation, and human-computer interaction (HCI) skills, respectively, as fundamental skills required to independently complete digital tasks. χ2 and Fisher exact tests indicated significant associations between most typing, navigation, and HCI categories and independent task completion. We coded usability barriers as one of 6 learnability challenges or 3 operability challenges. This study identified that independent use of digital health tools requires fundamental typing, navigation, or HCI skills as well as high usability of digital tools. The inclusion of 4 different digital tasks added specificity to the type of skills and usability considerations necessary to ensure accessibility of digital health tools to diverse older adults. This study underscores the need for vendors to cocreate digital health tools with historically excluded end users in mind. As health care systems expand digital tool adoption, they must distinguish fundamental skill gaps from usability barriers, as each may require different intervention strategies.
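For illustration, a sketch of the association tests named above with fabricated example counts (not study data): chi-square and Fisher exact tests on a 2x2 task-independence table, and a Kruskal-Wallis test for a continuous covariate such as age.

```python
# Chi-square, Fisher exact, and Kruskal-Wallis tests for task independence (toy data).
import numpy as np
from scipy.stats import chi2_contingency, fisher_exact, kruskal

# Rows: completed task independently (yes/no); columns: skill category present (yes/no).
table = np.array([[30, 10],
                  [5, 19]])
chi2, p_chi2, _, _ = chi2_contingency(table)
_, p_fisher = fisher_exact(table)

ages_independent = [55, 58, 60, 62, 66]       # assumed ages by outcome group
ages_assisted = [64, 67, 70, 72, 75]
_, p_kw = kruskal(ages_independent, ages_assisted)

print(f"chi2 p={p_chi2:.3f}, Fisher p={p_fisher:.3f}, Kruskal-Wallis p={p_kw:.3f}")
```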
Background/Objectives: Steady-state visual evoked potential-based brain-computer interfaces (SSVEP-BCIs) have broad application potential in mobile human-computer interaction due to their high information transfer rate and stable signal characteristics. The introduction of deep learning technology has significantly advanced SSVEP decoding performance, offering novel approaches for processing short-duration signals and tackling complex classification tasks. The establishment of the Tsinghua Benchmark dataset provides a standardized benchmark for evaluating algorithm performance, accelerating the development of deep learning-based SSVEP decoding. However, a summary of SSVEP deep learning decoding technologies for real-time mobile applications is lacking. Methods: We conducted a comprehensive literature review of SSVEP deep learning decoding studies published since 2023 that use the Tsinghua Benchmark dataset. This review focuses on technical developments targeting real-time performance, low computational complexity, and high robustness. Results: We summarize the key technologies developed for real-time mobile SSVEP decoding. Our analysis thoroughly examines how these techniques address core challenges in the engineering implementation of mobile brain-computer interfaces, including real-time processing requirements, resource constraints, and environmental robustness. Conclusions: This review provides a comprehensive overview of SSVEP deep learning decoding technologies for mobile applications, establishing a technical foundation to advance mobile brain-computer interfaces from laboratory settings to practical deployment.
"Empathy" is widely discussed in health and care settings and is increasingly claimed as an attribute of artificial intelligence (AI) systems (eg, socially assistive robots and chatbots), but the term is used inconsistently across the literature. In research on AI in these settings, it is often unclear what authors mean by "empathic AI," what systems do that is intended to be empathic, and how empathy is assessed. This matters because perceived empathy can shape users' experience of AI-mediated support and their willingness to engage with these systems. This study aims to map how empathy is defined, operationalized, and evaluated in peer-reviewed AI research in health and care settings and to describe interactional design features commonly reported in systems perceived as more empathic. This protocol outlines a scoping review following Joanna Briggs Institute guidance and is reported in accordance with PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews). We use "AI" as an umbrella term and will extract and classify each system's type (eg, rule-based or large language model-based). We will search PubMed (MEDLINE), Embase, PsycInfo, CINAHL, Scopus, IEEE Xplore, and the ACM Digital Library databases. Two reviewers will screen titles and abstracts using ASReview and full texts by using Rayyan. We will extract study characteristics, empathy definitions and framing, empathy-related system behaviors and design features, and evaluation methods, and synthesize findings thematically. This scoping review forms a part of the first author's doctoral research, funded by an Engineering and Physical Sciences Research Council studentship from October 2025. Pilot searches were conducted on January 20, 2026; full searches and synthesis are planned for 2026, with publication anticipated in 2027. The review will produce (1) a summary of how empathy is defined in AI research in health and care settings, (2) a grouped list of the main empathic interactional behaviors and design features described, and (3) an overview of how empathy is measured across studies. Where studies report empathy ratings, we will summarize which features are most commonly present in higher-rated systems within comparable contexts. The review will provide a clearer picture of what researchers mean by "AI empathy" in health and care settings and what system features are most commonly used when trying to build it. These findings may help guide the development of more empathic AI systems. PRR1-10.2196/93078.
The prediction of Epidermal Growth Factor Receptor (EGFR) mutation status in advanced lung adenocarcinoma is crucial for targeted therapy. Since EGFR mutations manifest as both macroscopic imaging features on CT and microscopic morphological changes in tissue, integrating these multiscale signals is essential for a comprehensive diagnostic assessment. However, current related research faces two key limitations: on one hand, unimodal deep learning models suffer from limited representational power; on the other hand, existing multimodal methods fail to address the inherent structural discrepancies between continuous CT data and discrete WSI data, often losing critical fine-grained details due to forced data compression or shared semantic bottlenecks. To address these limitations and improve the reliability of EGFR mutation status prediction, this study proposes a novel Multimodal Fusion framework based on Cross-Attention (MFCA) that effectively captures cross-modal semantic interactions and aligns imaging features across different scales. The framework is implemented in three steps: 1. First, a region-of-interest-guided approach is used to coarsely segment whole-slide histopathology images (WSIs) into three constituent regions, namely cancerous, stromal, and other regions; 2. Then, a dual-branch encoder separately extracts features from the two imaging modalities: global features from Computed Tomography (CT) scans and region-specific features from the segmented WSIs; 3. Critically, a bidirectional cross-attention module is introduced to facilitate deep semantic interaction and alignment between the macroscopic context of CT imaging and the microscopic context of histopathology, thereby achieving efficient and discriminative feature fusion. On the external validation set, the MFCA framework achieved robust performance, with Area Under the Curve (AUC) values of 0.758 (95% CI: 0.683-0.832) for cancerous regions, 0.805 (95% CI: 0.716-0.900) for stromal regions, and 0.760 (95% CI: 0.686-0.833) for other regions. The model's performance, particularly in the stromal component, was statistically superior to all baseline and competing models. The proposed MFCA framework predicts EGFR mutation status by innovatively integrating macroscopic CT imaging with region-specific microscopic WSI features, and serves as a valuable computational tool to support precision oncology for patients with advanced lung adenocarcinoma.
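A conceptual sketch of a bidirectional cross-attention block in the spirit of step 3, with assumed dimensions and mean-pooling for fusion; it is not the authors' MFCA implementation.

```python
# Bidirectional cross-attention: CT tokens attend over WSI region tokens and vice
# versa, then the two enriched representations are pooled and fused (toy sketch).
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.ct_to_wsi = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.wsi_to_ct = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fuse = nn.Linear(2 * dim, dim)

    def forward(self, ct_tokens, wsi_tokens):
        # CT queries attend to WSI tokens (macro conditioned on micro), and vice versa.
        ct_enriched, _ = self.ct_to_wsi(ct_tokens, wsi_tokens, wsi_tokens)
        wsi_enriched, _ = self.wsi_to_ct(wsi_tokens, ct_tokens, ct_tokens)
        fused = torch.cat([ct_enriched.mean(1), wsi_enriched.mean(1)], dim=-1)
        return self.fuse(fused)               # joint embedding for an EGFR prediction head

ct = torch.randn(2, 32, 256)                  # e.g., CT global/patch tokens
wsi = torch.randn(2, 3, 256)                  # e.g., cancerous / stromal / other region tokens
print(BidirectionalCrossAttention()(ct, wsi).shape)   # torch.Size([2, 256])
```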
Speech-visual emotion recognition plays a vital role in human-computer interaction applications. However, it typically confronts several challenges: (1) conventional speech-visual key frame (SVKF) extraction methods are susceptible to redundancy and emotional information loss; and (2) widely adopted attention-based speech-visual feature fusion approaches often compute weights with limited interpretability. To address these challenges, this paper proposes an effective two-stage key frame extraction method for speech-visual emotion recognition. Specifically, in the first stage, visual key frames (VKFs) are extracted by employing information entropy (IE) to model the continuous process of emotion generation, thereby reducing visual frame redundancy; corresponding speech key frames (SKFs) are obtained simultaneously by eliminating silent segments to reduce redundancy in the speech modality. Subsequently, by leveraging the complementary characteristics of the speech and visual modalities, the first-stage SKFs and VKFs are aligned to produce the final second-stage SVKFs, preserving important emotional information; a simple and interpretable weighted fusion is also proposed to focus on this emotional information. Experimental results on the RML, eNTERFACE05, MEAD, and BAUM-1s datasets demonstrate that the proposed two-stage key frame extraction method achieves better inference and generalization performance.
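A toy sketch of entropy-based visual key frame selection in the spirit of the first stage above: each synthetic frame is scored by grayscale-histogram entropy, and the frames with the largest entropy change are kept as candidate VKFs. The selection rule and the data are assumptions for illustration, not the paper's exact algorithm.

```python
# Entropy-scored visual key frame selection (synthetic grayscale frames).
import numpy as np

def frame_entropy(frame, bins=32):
    hist, _ = np.histogram(frame, bins=bins, range=(0, 255), density=True)
    p = hist[hist > 0] / hist[hist > 0].sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
frames = rng.integers(0, 256, size=(120, 64, 64))         # 120 synthetic grayscale frames

entropies = np.array([frame_entropy(f) for f in frames])
change = np.abs(np.diff(entropies))                        # proxy for emotional change
key_idx = np.argsort(change)[-10:]                         # 10 largest frame-to-frame shifts
print("candidate visual key frames:", np.sort(key_idx + 1))
```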