Large language models (LLMs) are increasingly utilized in clinical reasoning and risk assessment. However, their interpretive reliability in critical and indeterminate domains such as psychiatry remains unclear. Prior work has identified algorithmic biases and prompt sensitivity in these systems, raising concerns about how contextual information may influence model outputs, but there remains no systematic way to assess these, especially in the psychiatric domain. We propose an approach for reliability auditing downstream LLM tasks by structuring evaluation around the impact of prompt design and the inclusion of medically insignificant inputs on predicted hospitalization risk scores, which is often the first downstream AI clinical-decision-making task. In our audit, a cohort of synthetic patient profiles (n = 50) is generated, each consisting of 15 clinically relevant features and up to 50 clinically insignificant features, across four prompt reframings (neutral, logical, human impact, clinical judgment). We audit four LLMs (Gemini 2.5 Flash, LLaMa 3.3 70b, Claude Sonnet 4.6, GPT-4o mini), and our results show that including medically insignificant variables resulted in a statistica
With the growing interest in using AI and machine learning (ML) in medicine, there is an increasing number of literature covering the application and ethics of using AI and ML in areas of medicine such as clinical psychiatry. The problem is that there is little literature covering the economic aspects associated with using ML in clinical psychiatry. This study addresses this gap by specifically studying the economic implications of using ML in clinical psychiatry. In this paper, we evaluate the economic implications of using ML in clinical psychiatry through using three problem-oriented case studies, literature on economics, socioeconomic and medical AI, and two types of health economic evaluations. In addition, we provide details on fairness, legal, ethics and other considerations for ML in clinical psychiatry.
Large language models (LLMs) offer significant potential in enhancing psychiatric practice, from improving diagnostic accuracy to streamlining clinical documentation and therapeutic support. However, existing evaluation resources heavily rely on small clinical interview corpora, social media posts, or synthetic dialogues, which limits their clinical validity and fails to capture the full complexity of diagnostic reasoning. In this work, we introduce PsychiatryBench, a rigorously curated benchmark grounded exclusively in authoritative, expert-validated psychiatric textbooks and casebooks. PsychiatryBench comprises eleven distinct question-answering tasks ranging from diagnostic reasoning and treatment planning to longitudinal follow-up, management planning, clinical approach, sequential case analysis, and multiple-choice/extended matching formats totaling 5,188 expert-annotated items. {\color{red}We evaluate a diverse set of frontier LLMs (including Google Gemini, DeepSeek, Sonnet 4.5, and GPT 5) alongside leading open-source medical models such as MedGemma using both conventional metrics and an "LLM-as-judge" similarity scoring framework. Our results reveal substantial gaps in clin
Computational psychiatry faces a fundamental trade-off: traditional reinforcement learning (RL) models offer interpretability but lack behavioral realism, while large language model (LLM) agents generate realistic behaviors but lack structural interpretability. We introduce BioLLMAgent, a novel hybrid framework that combines validated cognitive models with the generative capabilities of LLMs. The framework comprises three core components: (i) an Internal RL Engine for experience-driven value learning; (ii) an External LLM Shell for high-level cognitive strategies and therapeutic interventions; and (iii) a Decision Fusion Mechanism for integrating components via weighted utility. Comprehensive experiments on the Iowa Gambling Task (IGT) across six clinical and healthy datasets demonstrate that BioLLMAgent accurately reproduces human behavioral patterns while maintaining excellent parameter identifiability (correlations $>0.67$). Furthermore, the framework successfully simulates cognitive behavioral therapy (CBT) principles and reveals, through multi-agent dynamics, that community-wide educational interventions may outperform individual treatments. Validated across reward-punishme
Background: Symptom rating scales in psychiatry are limited by reliance on self-report, and lack of predictive power. Actigraphy, a passive wearable-based method for measuring sleep and physical activity, offers objective, high-resolution behavioral data that may better reflect symptom fluctuations, but most studies have focused on narrow diagnostic groups or fixed time windows, limiting clinical translation. Objective: To examine whether actigraphy-derived sleep and activity features correlate with psychiatric symptom severity in a transdiagnostic psychiatric sample, and to identify which features are most clinically relevant across multiple temporal resolutions. Methods: We present a feasibility case series study with preliminary data from eight outpatients enrolled in the DeeP-DD study, a transdiagnostic study of digital phenotyping. Participants wore GENEActiv actigraphy devices and symptom severity was measured using a variety of validated scales. We performed intra-individual Spearman correlations and inter-individual repeated measures correlations across daily, weekly, monthly, and full-duration averages. Results: Intra-individual analyses revealed that later rise times were
Large language models (LLMs) are increasingly proposed as scalable solutions to the global mental health crisis. But their deployment in psychiatric contexts raises a distinctive ethical concern: the problem of atypicality. Because LLMs generate outputs based on population-level statistical regularities, their responses -- while typically appropriate for general users -- may be dangerously inappropriate when interpreted by psychiatric patients, who often exhibit atypical cognitive or interpretive patterns. We argue that standard mitigation strategies, such as prompt engineering or fine-tuning, are insufficient to resolve this structural risk. Instead, we propose dynamic contextual certification (DCC): a staged, reversible and context-sensitive framework for deploying LLMs in psychiatry, inspired by clinical translation and dynamic safety models from artificial intelligence governance. DCC reframes chatbot deployment as an ongoing epistemic and ethical process that prioritises interpretive safety over static performance benchmarks. Atypicality, we argue, cannot be eliminated -- but it can, and must, be proactively managed.
The application of AI in psychiatric diagnosis faces significant challenges, including the subjective nature of mental health assessments, symptom overlap across disorders, and privacy constraints limiting data availability. To address these issues, we present MoodAngels, the first specialized multi-agent framework for mood disorder diagnosis. Our approach combines granular-scale analysis of clinical assessments with a structured verification process, enabling more accurate interpretation of complex psychiatric data. Complementing this framework, we introduce MoodSyn, an open-source dataset of 1,173 synthetic psychiatric cases that preserves clinical validity while ensuring patient privacy. Experimental results demonstrate that MoodAngels outperforms conventional methods, with our baseline agent achieving 12.3% higher accuracy than GPT-4o on real-world cases, and our full multi-agent system delivering further improvements. Evaluation in the MoodSyn dataset demonstrates exceptional fidelity, accurately reproducing both the core statistical patterns and complex relationships present in the original data while maintaining strong utility for machine learning applications. Together, the
The classification of clinical notes into specific diagnostic categories is critical in healthcare, especially for mental health conditions like Anxiety and Adjustment Disorder. In this study, we compare the performance of various Artificial Intelligence models, including both traditional Machine Learning approaches (Random Forest, Support Vector Machine, K-nearest neighbors, Decision Tree, and eXtreme Gradient Boost) and Deep Learning models (DistilBERT and SciBERT), to classify clinical notes into these two diagnoses. Additionally, we implemented three oversampling strategies: No Oversampling, Random Oversampling, and Synthetic Minority Oversampling Technique (SMOTE), to assess their impact on model performance. Hyperparameter tuning was also applied to optimize model accuracy. Our results indicate that oversampling techniques had minimal impact on model performance overall. The only exception was SMOTE, which showed a positive effect specifically with BERT-based models. However, hyperparameter optimization significantly improved accuracy across the models, enhancing their ability to generalize and perform on the dataset. The Decision Tree and eXtreme Gradient Boost models achiev
Forensic mental health care involves the treatment of individuals with severe mental disorders who have committed violent offences. These settings are often characterized by high levels of bureaucracy, risk avoidance, and restricted autonomy. Patients frequently experience a profound loss of control over their lives, leading to heightened psychological stress-sometimes resulting in isolation as a safety measure. In this study, we explore how co-design can be used to collaboratively develop a companion robot that helps monitor and regulate stress while maintaining tracking of the patients' interaction behaviours for long-term intervention. We conducted four co-design workshops in a forensic psychiatric clinic with patients, caregivers, and therapists. Our process began with the presentation of an initial speculative prototype to therapists, enabling reflection on shared concerns, ethical risks, and desirable features. This was followed by a creative ideation session with patients, a third workshop focused on defining desired functions and emotional responses, and we are planning a final prototype demo to gather direct patient feedback. Our findings emphasize the importance of empowe
Mental disorders have become a significant global public health issue, while the shortage of psychiatrists and inefficient training systems severely hinder the accessibility of mental health services. This paper designs and implements an artificial intelligence-based training system for psychiatrists. By integrating technologies such as large language models, knowledge graphs, and expert systems, the system constructs an intelligent and standardized training platform. It includes six functional modules: case generation, consultation dialogue, examination prescription, diagnostic decision-making, integrated traditional Chinese and Western medicine prescription, and expert evaluation, providing comprehensive support from clinical skill training to professional level assessment.The system adopts a B/S architecture, developed using the Vue.js and Node.js technology stack, and innovatively applies deep learning algorithms for case generation and doctor-patient dialogue. In a clinical trial involving 60 psychiatrists at different levels, the system demonstrated excellent performance and training outcomes: system stability reached 99.95%, AI dialogue accuracy achieved 96.5%, diagnostic ac
Studying psychiatric illness has often been limited by difficulties in connecting symptoms and behavior to neurobiology. Computational psychiatry approaches promise to bridge this gap by providing formal accounts of the latent information processing changes that underlie the development and maintenance of psychiatric phenomena. Models based on these theories generate individual-level parameter estimates which can then be tested for relationships to neurobiology. In this review, we explore computational modelling approaches to one key aspect of health and illness: affect. We discuss strengths and limitations of key approaches to modelling affect, with a focus on reinforcement learning, active inference, the hierarchical gaussian filter, and drift-diffusion models. We find that, in this literature, affect is an important source of modulation in decision making, and has a bidirectional influence on how individuals infer both internal and external states. Highlighting the potential role of affect in information processing changes underlying symptom development, we extend an existing model of psychosis, where affective changes are influenced by increasing cortical noise and consequent i
Mobile technology (e.g., mobile phones and wearable devices) provides scalable methods for collecting physiological and behavioral biomarkers in patients' naturalistic settings, as well as opportunities for therapeutic advancements and scientific discoveries regarding the etiology of psychiatric illness. Continuous data collection through mobile devices generates highly complex data: entangled multivariate time series of outcomes, exposures, and covariates. Missing data is a pervasive problem in biomedical and social science research, and Ecological Momentary Assessment (EMA) data in psychiatric research is no exception. However, the complex data structure of multivariate time series and their non-stationary nature make missing data a major challenge for proper inference. Additional historical information included in time series analyses exacerbates the issue of missing data and also introduces problems for confounding adjustment. The majority of existing imputation methods are either designed for stationary time series or for longitudinal data with limited follow-up periods. The limited work on non-stationary time series either focuses on missing exogenous information or ignores t
Large language models (LLMs) hold transformative potential for medical decision support yet their application in psychiatry remains constrained by hallucinations and superficial reasoning. This limitation is particularly acute in light-parameter LLMs which are essential for privacy-preserving and efficient clinical deployment. Existing training paradigms prioritize linguistic fluency over structured clinical logic and result in a fundamental misalignment with professional diagnostic cognition. Here we introduce ClinMPO, a reinforcement learning framework designed to align the internal reasoning of LLMs with professional psychiatric practice. The framework employs a specialized reward model trained independently on a dataset derived from 4,474 psychiatry journal articles and structured according to evidence-based medicine principles. We evaluated ClinMPO on a unseen subset of the benchmark designed to isolate reasoning capabilities from rote memorization. This test set comprises items where leading large-parameter LLMs consistently fail. We compared the ClinMPO-aligned light LLM performance against a cohort of 300 medical students. The ClinMPO-tuned Qwen3-8B model achieved a diagnos
Artificial Intelligence (AI) has revolutionized various fields, including medicine and mental health support. One promising application is ChatGPT, an advanced conversational AI model that uses deep learning techniques to provide human-like responses. This review paper explores the potential impact of ChatGPT in psychiatry and its various applications, highlighting its role in therapy and counseling techniques, self-help and coping strategies, mindfulness and relaxation techniques, screening and monitoring, education and information dissemination, specialized support, group and family support, learning and training, expressive and artistic therapies, telepsychiatry and online support, and crisis management and prevention. While ChatGPT offers personalized, accessible, and scalable support, it is essential to emphasize that it should not replace the expertise and guidance of qualified mental health professionals. Ethical considerations, such as user privacy, data security, and human oversight, are also discussed. By examining the potential and challenges, this paper sheds light on the responsible integration of ChatGPT in psychiatric research and practice, fostering improved mental
As the emerging field of predictive analytics in psychiatry generated and continues to generate massive interest overtime with its major promises to positively change and revolutionize clinical psychiatry, health care and medical professionals are greatly looking forward to its integration and application into psychiatry. However, by directly applying predictive analytics to the practice of psychiatry, this could cause detrimental damage to those that use predictive analytics through creating or worsening existing medical issues. In both cases, medical ethics issues arise, and need to be addressed. This paper will use literature to provide descriptions of selected stages in the treatment of mental disorders and phases in a predictive analytics project, approach mental disorder diagnoses using predictive models that rely on neural networks, analyze the complexities in clinical psychiatry, neural networks and predictive analytics, and conclude with emphasizing and elaborating on limitations and medical ethics issues of applying neural networks and predictive analytics to clinical psychiatry.
Precision psychiatry is an ermerging field that aims to provide individualized approaches to mental health care. Multivariate analysis and machine learning are used to create outcome prediction models based on clinical data such as demographics, symptom assessments, genetic information, and brain imaging. While much emphasis has been placed on technical innovation, the complex and varied nature of mental health presents significant challenges to the successful implementation of these models. From this perspective, I review ten challenges in the field of precision psychiatry, including the need for studies on real-world populations and realistic clinical outcome definitions, consideration of treatment-related factors such as placebo effects and non-adherence to prescriptions. Fairness, prospective validation in comparison to current practice and implementation studies of prediction models are other key issues that are currently understudied. A shift is proposed from retrospective studies based on linear and static concepts of disease towards prospective research that considers the importance of contextual factors and the dynamic and complex nature of mental health.
Recent reports indicate that sustained interaction with conversational artificial intelligence (AI) systems can, in a small subset of users, contribute to the emergence or stabilisation of delusional experience. Existing accounts typically attribute such cases either to individual vulnerability or to failures of safety engineering. These explanations are incomplete. Drawing on phenomenology, psychiatry, and cognitive neuroscience, this paper argues that the risk arises from the relational and ontological structure of the interaction itself. Conversational AI generates ontological dissonance: a conflict between the appearance of relational presence and the absence of any subject capable of sustaining it. Maintained through a communicative double bind and amplified by attentional asymmetries, this dissonance tends, under conditions of affective vulnerability, to stabilise into a technologically mediated analogue of folie a deux. This account explains why explicit disclaimers often fail to disrupt delusional involvement and clarifies the ethical and clinical implications for the design and use of conversational AI.
The great behavioral heterogeneity observed between individuals with the same psychiatric disorder and even within one individual over time complicates both clinical practice and biomedical research. However, modern technologies are an exciting opportunity to improve behavioral characterization. Existing psychiatry methods that are qualitative or unscalable, such as patient surveys or clinical interviews, can now be collected at a greater capacity and analyzed to produce new quantitative measures. Furthermore, recent capabilities for continuous collection of passive sensor streams, such as phone GPS or smartwatch accelerometer, open avenues of novel questioning that were previously entirely unrealistic. Their temporally dense nature enables a cohesive study of real-time neural and behavioral signals. To develop comprehensive neurobiological models of psychiatric disease, it will be critical to first develop strong methods for behavioral quantification. There is huge potential in what can theoretically be captured by current technologies, but this in itself presents a large computational challenge -- one that will necessitate new data processing tools, new machine learning technique
Computational psychiatry is a field aimed at developing formal models of information processing in the human brain, and how alterations in this processing can lead to clinical phenomena. Despite significant progress in the development of tasks and how to model them, computational psychiatry methodologies have yet to be incorporated into large-scale research projects or into clinical practice. In this viewpoint, we explore some of the barriers to incorporation of computational psychiatry tasks and models into wider mainstream research directions. These barriers include the time required for participants to complete tasks, test-retest reliability, limited ecological validity, as well as practical concerns, such as lack of computational expertise and the expense and large sample sizes traditionally required to validate tasks and models. We then discuss solutions, such as the redesigning of tasks with a view toward feasibility, and the integration of tasks into more ecologically valid and standardized game platforms that can be more easily disseminated. Finally, we provide an example of how one task, the conditioned hallucinations task, might be translated into such a game. It is our h
In light of the NIMH's Research Domain Criteria (RDoC), the advent of functional neuroimaging, novel technologies and methods provide new opportunities to develop precise and personalized prognosis and diagnosis of mental disorders. Machine learning (ML) and artificial intelligence (AI) technologies are playing an increasingly critical role in the new era of precision psychiatry. Combining ML/AI with neuromodulation technologies can potentially provide explainable solutions in clinical practice and effective therapeutic treatment. Advanced wearable and mobile technologies also call for the new role of ML/AI for digital phenotyping in mobile mental health. In this review, we provide a comprehensive review of the ML methodologies and applications by combining neuroimaging, neuromodulation, and advanced mobile technologies in psychiatry practice. Additionally, we review the role of ML in molecular phenotyping and cross-species biomarker identification in precision psychiatry. We further discuss explainable AI (XAI) and causality testing in a closed-human-in-the-loop manner, and highlight the ML potential in multimedia information extraction and multimodal data fusion. Finally, we disc