The regulatory environment for digital medical devices has rapidly changed in recent years as policymakers have worked to keep up with the evolving landscape of biomedical technologies. Despite the need for new regulatory efforts, existing data assets in the US are not capable of systematically tracking whether FDA-authorized medical devices have digital components, limiting regulators' ability to incorporate software-specific considerations into post-market surveillance activities and limiting other stakeholders' ability to understand the extent of digitization of different medical specialty areas and product categories. We pioneer a new application of text analysis, using records from tens of thousands of regulatory documents for newly authorized medical devices, to describe the digital transformation of the US medical device industry over the past two decades. We show that the number of medical devices with digital components has grown substantially over time, with meaningful heterogeneity across clinical specialties.
Digital health interventions (DHIs) are increasingly used in dermatology to improve patients' quality of life (QoL). This systematic review (PROSPERO: CRD420250628076) synthesized 14 studies, categorized into four WHO intervention types (telemedicine, targeted client communication, personal health tracking, and client-to-client communication), and evaluated their effects on QoL and related psychosocial outcomes. Among telemedicine studies (n = 8), six reported non-inferior or superior QoL versus in-person care. Targeted client communication (n = 4) showed mixed QoL effects but improved mental health and self-efficacy. Personal health tracking (n = 1) did not significantly improve QoL, while client-to-client communication (n = 1) provided psychosocial support. Overall, DHIs may improve QoL in dermatology, but evidence is limited by methodological heterogeneity, risk of bias, and short follow-up. Future evaluations should prioritize standardized core outcome sets and integrate AI with just-in-time adaptive interventions to deliver personalized psychological support, ensuring algorithmic equity from the outset.
Mobile apps are commonly used to deliver digital interventions, typically through prompts that encourage engagement with intervention contents. However, engagement with these interventions remains suboptimal. A recent organizing framework conceptualizes digital interventions as a sequence of stimuli/tasks, underscoring the importance of measuring engagement with each stimulus/task separately. We applied this framework to a tobacco cessation intervention study that used an app to deliver prompts for practicing self-regulatory strategies. We found that engagement with certain stimuli/tasks (e.g., messages describing a low-effort strategy) declined over time, whereas engagement with others (e.g., messages recommending a more effortful strategy) remained stable. Initiating a brief in-app survey prior to prompting increased the likelihood of engaging with the intervention prompt, but not with subsequent messages or recommended strategies. These findings support conceptualizing engagement as a dynamic, multistage process and offer practical insights for optimizing engagement with digital interventions.
Digital phenotyping leverages continuous data from smartphones and wearable devices for real-time mental health monitoring, offering opportunities for early detection and personalized care in mood disorders by enabling clinicians to proactively respond to significant changes before symptoms worsen. However, the heterogeneity of such data presents modeling challenges. This study evaluates the potential of large language models (LLMs) to detect changes in depression severity from digital phenotyping data among individuals experiencing major depressive episodes. We compare in-context learning and fine-tuning strategies and find that both few-shot prompted and fine-tuned LLMs outperform traditional baselines. Furthermore, embedding-only and QLoRA fine-tuning yield comparable results, with the former excelling on individual features and the latter performing better on combined inputs. These findings demonstrate the promise of LLMs in integrating heterogeneous behavioral data for mental health analysis, while underscoring the importance of clinical validation and ethical safeguards in real-world applications.
Digital health interventions (DHIs) are increasingly used to strengthen patient engagement. However, despite their rapid growth, DHIs remain unevenly evaluated and poorly standardized. Six databases were searched to generate an evidence gap map of interventions and research gaps for DHIs targeting patient engagement. A total of 160 systematic reviews (including 42 meta-analyses) comprising 3974 primary studies were mapped, with most (92%) conducted in high-income countries. Evidence was concentrated around mHealth, eHealth, telehealth, and messaging technologies. Commonly reported outcomes included medication adherence, quality of life, implementation outcomes, and self-management. Overall, 61% of reviews reported positive conclusions, although most were rated as low or critically low methodological quality. Priority areas include strengthening evidence from LMICs, evaluating long-term and emerging DHIs (e.g., wearables and gamified platforms), and improving methodological rigor for systematic reviews. The findings highlight disparities in the global DHI evidence base and identify priority gaps for future research and implementation.
Large language models (LLMs) may support patients by addressing their queries; however, their real-world clinical use remains uncertain. Patients' queries were prospectively collected and answered by nuclear medicine physicians, administrative staff, and ChatGPT v4.1. Responses were evaluated and Likert-scored by medical and administrative experts and two independent non-experts using 15 of the 17 dimensions of the proposed QUEST framework. LLM-generated responses were classified as "better," "equivalent," or "worse" relative to human-generated responses on a question-by-question basis; binomial tests assessed whether LLM performance exceeded 50%. Inter-rater agreement was assessed using Prevalence-Adjusted Bias-Adjusted Kappa (PABAK); statistical significance was set at p < 0.05. A total of 339 drug interaction, 42 medical, and 76 administrative queries were analysed. For medical queries, in 8 of 10 dimensions, 76-98% of LLM-generated responses were rated equivalent or better than human-generated responses by the medical expert (p < 0.001). For administrative queries, non-expert raters judged LLM-generated responses more informative (97%) and preferred (86%). For medical queries, LLM-generated responses were rated more informative (67%), whereas human-generated responses were judged easier to understand (62%), with 60% disagreement on overall preference. PABAK showed higher agreement for LLM- than human-generated responses across medical (0.14 to 0.90 vs -0.90 to 0.52) and administrative (0.92 to 1.00 vs -0.63 to -0.13) queries. LLM-generated responses were consistently rated favourably, particularly for administrative queries, though further validation is required before clinical use.
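The two statistics named in the abstract above can be illustrated with a minimal Python sketch. It uses the standard PABAK definition for k rating categories, (k·p_o − 1)/(k − 1), where p_o is the observed proportion of agreement, and an exact one-sided binomial test against a 50% rate. The function names, the toy ratings, and the choice of a one-sided test are assumptions made for illustration; the abstract does not state how the authors implemented either statistic.

```python
from math import comb


def pabak(ratings_a, ratings_b, n_categories):
    """Prevalence-adjusted bias-adjusted kappa for two raters.

    Uses the standard form (k * p_o - 1) / (k - 1), where p_o is the
    observed proportion of items on which both raters agree.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("rating lists must be non-empty and equal length")
    agreements = sum(a == b for a, b in zip(ratings_a, ratings_b))
    p_o = agreements / len(ratings_a)
    return (n_categories * p_o - 1) / (n_categories - 1)


def binomial_p_greater(successes, n, p0=0.5):
    """Exact one-sided p-value: P(X >= successes) when X ~ Binomial(n, p0)."""
    return sum(
        comb(n, k) * p0**k * (1 - p0) ** (n - k)
        for k in range(successes, n + 1)
    )


# Hypothetical toy data: two raters, binary ratings on four responses.
rater_1 = [1, 1, 0, 1]
rater_2 = [1, 1, 0, 0]
print(pabak(rater_1, rater_2, n_categories=2))

# Hypothetical: 10 of 10 LLM responses rated "equivalent or better".
print(binomial_p_greater(10, 10))
```

Because PABAK depends only on observed agreement, it ranges from −1/(k − 1) to 1 and can be negative when raters agree less often than an even split across categories would imply, which is consistent with the negative values reported for human-generated responses.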
Standardization of electronic health records (EHRs) has enabled the use of clinical codes in AI. We introduce ClinVec, an embedding store that provides embeddings for 153,166 clinical codes and concepts across eight vocabularies. ClinVec embeds ClinGraph, a knowledge graph with over 2 million edges tailored to clinical vocabularies used in EHRs. We validate the embeddings using an inter-institutional clinician panel and N = 3767 clinical term pairs spanning 11 disease areas, and we find that embedding similarity reflects clinical relatedness. We use ClinVec for knowledge injection in large language model medical question answering and for unsupervised patient stratification and risk prediction. By providing a shared representation of clinical concepts, ClinVec supports knowledge-grounded AI systems for modeling patients and populations.
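As an illustration of how embedding similarity might be compared with clinical relatedness, the following Python sketch ranks concepts by cosine similarity. Cosine similarity is an assumption here, since the abstract does not name the similarity measure used; the three-dimensional vectors and the labels `code_A`, `code_B`, and `code_C` are hypothetical placeholders, not ClinVec data.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)


def rank_neighbors(query, embeddings):
    """Return the other concepts sorted from most to least similar to `query`."""
    scores = [
        (name, cosine_similarity(embeddings[query], vec))
        for name, vec in embeddings.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical toy embeddings (placeholders, not real clinical codes).
toy_embeddings = {
    "code_A": [1.0, 0.0, 0.0],
    "code_B": [0.9, 0.1, 0.0],
    "code_C": [0.0, 0.0, 1.0],
}
for name, score in rank_neighbors("code_A", toy_embeddings):
    print(name, round(score, 3))
```

In a validation like the one described, scores of this kind would be compared against clinician judgments of relatedness for each term pair.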
Malaria remains a substantial global health burden, and current diagnostics have notable limitations. Microscopy is labor-intensive and operator-dependent; rapid diagnostic tests lack sensitivity and provide qualitative rather than quantitative results. Recent AI advances, particularly deep learning, demonstrate significant potential for malaria diagnostics through automatic parasite detection in blood smears. Numerous systems achieve outstanding accuracy, comparable to that of human experts, while increasing throughput and reducing costs. In endemic regions, AI-based diagnostics can expand testing access; in non-endemic settings, they assist clinicians rarely encountering malaria, potentially reducing misdiagnosis rates. Successful AI applications emphasize digital medicine's broader potential to address global health disparities through automated, expert-quality diagnostics. Yet challenges to widespread adoption of AI-based diagnostics remain, including inconsistent dataset annotation standards and limited representation of diverse endemic regions. This review examines current diagnostic methods and evaluates the translational potential of AI-driven innovations in malaria diagnostics, discussing practical implications for researchers and stakeholders seeking to integrate these advances into clinical practice.
We performed deep learning analysis of histopathological whole-slide (full-face) images (WSI) to predict ATM pathogenic or likely pathogenic variant (PV/LPV) status of women with breast cancer and identify specific histological patterns of their tumor. In the discovery set composed of tumors from PV/LPV carriers (58 WSI) and noncarriers (129 WSI), our deep learning model predicted ATM status of patients with an area under the curve of 0.90 [95%CI: 0.85-0.95] and a balanced accuracy of 0.80 [95%CI: 0.72-0.88]. In the replication set (29 WSI from carriers and 22 WSI from noncarriers), corresponding results were 0.85 [95%CI: 0.70-1.00] and 0.67 [95%CI: 0.51-0.83]. We found that tumors developed by ATM PV/LPV carriers often displayed discohesive neoplastic cells as observed in invasive lobular carcinomas, and dense lymphocytic infiltrate reflecting an immune-enriched microenvironment. Recognizing these tumors at the time of diagnosis is a critical first step toward precision medicine in affected women and precision prevention in family members.
Biology-based mathematical models calibrated to patient-specific imaging data can accurately predict and optimize therapeutic response for individual cancer patients. We present the framework, motivation, and examples of using imaging data to calibrate mathematical models and predict patient outcomes. We then discuss patient-specific digital twins for optimizing interventions and outcomes. We aim to demonstrate the importance of imaging-based mathematical models in continuing the paradigm shift from population-based to individual-based cancer care.
Exposure to high-energy charged particles like ¹⁶O ions poses risks to brain function during deep space travel. This study examined long-term neuroinflammatory and synaptic changes in rats 6 months after low-dose total-body ¹⁶O irradiation (1 cGy and 10 cGy), focusing on the frontal cortex, striatum, hippocampus, and thalamus. Using Western blotting, qPCR, and digital PCR, we assessed expression levels of neuroinflammatory (GFAP and IBA1) and synaptic (GAP43, PSD95, SYP, SPINO, and GPHN) markers. We also performed correlation analyses to examine the relationship between the assessed molecular changes and behavioral performance in social odor recognition and psychomotor vigilance tasks. In the hippocampus, 1 cGy exposure resulted in increased IBA1, GAP43, and PSD95, whereas 10 cGy exposure led to decreased GFAP and increased PSD95, SPINO, and GPHN. In the striatum, synaptic markers were elevated after both 1 cGy and 10 cGy exposure, though region-specific differences in gene expression levels were observed (qPCR/dPCR). In the frontal cortex, no changes across any of the targeted markers were detected, except for increased SYP in 1 cGy rats. In the thalamus, GFAP was reduced in 10 cGy rats, while GAP43 was increased in 1 cGy and PSD95 was increased in both 1 cGy and 10 cGy exposed animals. Correlation analyses revealed significant associations between molecular changes and behavior, including a negative correlation between hippocampal SYP protein levels and social memory performance in the 1 cGy group. Taken together, these results suggest dose- and region-specific brain responses to low-dose exposure to ¹⁶O ions, a vital component of space radiation, culminating in enhanced synaptic remodeling and possible neurological alterations. The data also highlight potential molecular mechanisms underlying region-based (i.e., striatum) cognitive vulnerability following low-dose particle exposure.
Natural language processing (NLP)-enabled artificial intelligence (AI) conversational agents (CAs) are increasingly adopted in digital mental health interventions, yet the efficacy of such CAs grounded in cognitive behavioral therapy (CBT) remains unclear. This study aims to examine the intervention effectiveness of CBT-based NLP-enabled AI CAs in various mental health problems. A total of 15 randomized controlled trials with 1737 participants were included in the analysis. The results indicated that CBT-based NLP-enabled AI CAs showed a small to moderate effect on depressive symptoms and a small effect on negative affect, whereas the effects on generalized anxiety, stress, and positive affect were not significant after adjusting for publication bias. Subgroup analyses provided preliminary evidence that multi-modal CAs may be more effective than single-modality CAs in reducing depressive symptoms, and that the absence of psychoeducational content was associated with larger post-test effect sizes. Notably, meta-regression revealed that higher-quality studies reported larger effect sizes, suggesting that the true efficacy of these interventions may be underestimated in the current literature. In addition, younger age was associated with a greater reduction in depressive symptoms. These findings underscore the potential of CBT-based NLP-enabled AI CAs in addressing certain mental health issues and in certain populations.
The slow progression of Alzheimer's disease (AD) poses a challenge for the quantification of early disease-driven cognitive decline. Here, we show that frequently administered remote and unsupervised digital cognitive assessments can detect differences in cognitive decline within 30 weeks in early AD. The sample comprised 202 individuals (52-85 years old) recruited from longitudinal observational studies, who were cognitively unimpaired (CU, n = 152) or had a diagnosis of mild cognitive impairment (MCI, n = 50). Participants self-administered remote tasks testing memory precision for objects and scenes, associative memory, and familiarity-dependent memory. The MCI group showed greater decline than the CU group in the familiarity-dependent task, while stratifying the MCI group by beta-amyloid (Aβ) status (n = 21 Aβ-; n = 24 Aβ+) revealed greater change in memory precision for objects and familiarity-dependent memory in the MCI Aβ+ group. A 30-week change in the remote familiarity-dependent task was correlated with a multi-year change in annual in-person neuropsychological assessments. In conclusion, frequent remote cognitive testing is a promising tool to feasibly capture and monitor subtle and short-term cognitive decline.
Lumbar intervertebral disc degeneration, a key indicator of aging in the human movement system, is linked to increasing global cases of low back pain. Current diagnostic methods rely on imaging and physician experience, lacking predictive tools and personalized treatment strategies. This study used a multicenter lumbar MRI dataset to map disc degeneration in the Chinese population, revealing three accelerated degeneration phases during the lifecycle. Age heatmaps highlighted the degeneration rate of L1-L3 segments, highly synchronized with true age, serving as a baseline for physiological aging estimation. A contrastive learning-based slice ensemble network achieved a mean absolute error of 2.59 years in age estimation, and multi-center validation confirmed its reliability. Two digital imaging biomarkers, Age Delta and Age Selta, were proposed and preliminarily validated in longitudinal cases as a proof-of-concept demonstration. This study primarily demonstrates the feasibility and potential clinical value of data-driven lumbar aging biomarkers.
Oropharyngeal dysphagia affects over half of neurological and oncological populations, yet rehabilitation is constrained by a global therapist shortage that human-AI collaboration has not demonstrably addressed. Here we report a systematic review of 31 studies (1012 participants; PROSPERO: CRD420251115997) evaluating AI-augmented swallowing rehabilitation in adults with oropharyngeal dysphagia, or in healthy volunteers testing systems designed for clinical application. We synthesised findings by aetiology and collaboration mode, assessing risk of bias and certainty of evidence (Grading of Recommendations, Assessment, Development and Evaluation, GRADE). AI-augmented interventions produce short-term gains in functional oral intake and physiological measures (GRADE moderate/low certainty), but these effects attenuate within weeks of cessation, and adherence declines sharply once clinician supervision is withdrawn. NASSS framework analysis reveals a central paradox: the adopter domain (digital literacy, cognitive impairment, interface usability) is the dominant implementation barrier (61.3% rated high), meaning the populations with the greatest need face the steepest barriers to adoption. AI algorithm performance is rated at very low certainty, with validation largely confined to healthy volunteers. These findings support advancement to pragmatic trials for supervised post-stroke rehabilitation but underscore that evidence for other aetiologies, unsupervised settings, and sustained outcomes remains insufficient.
Urothelial carcinoma, predominantly appearing as non-muscle-invasive papillary urothelial carcinoma (NMIPUC), exhibits wide clinical variability. Accurate pathological staging and grading are essential for effective risk stratification and treatment decisions. Advancements in artificial intelligence (AI) open new opportunities to improve predictive models; however, their generalizability across diverse datasets remains to be addressed. This study developed a federated learning (FL)-based AI framework to enhance model robustness across institutions and predictive accuracy for non-muscle-invasive bladder cancer staging, grading, and a novel histological risk factor derived by clustering histological features for relapse prediction. Retrospective data, including 1437 NMIPUC cases from two institutions in Lithuania and Taiwan, were used for development and analysis. The FL models demonstrated improved robustness across participating institutions and higher accuracy compared to single-site models, achieving 86.2% accuracy for tumor stage and 79.2% for tumor grade, with minor performance variability across the datasets. Moreover, the novel histological risk factor outperformed conventional indicators of relapse-free survival (RFS) in NMIPUC patients treated with BCG immunotherapy, achieving hazard ratios of 2.7 (p = 0.0018) and 2.8 (p = 0.0208) in the Lithuania and Taiwan datasets, respectively. These findings highlight the potential of FL and histological feature-based AI models in providing robust, generalizable solutions for NMIPUC risk stratification and offer insights for personalized clinical interventions.
Artificial intelligence (AI) has shown promise in dermatology, offering accurate and non-invasive diagnosis of skin cancer. While extensive research has addressed skin-tone bias, gender bias in dermatologic AI remains underexplored, potentially perpetuating diagnostic disparities. In this study, we developed LesionAttn, an algorithm designed to mitigate gender bias by directing model attention toward lesions, thereby mirroring clinicians' diagnostic focus. Combined with Pareto Frontier optimization for dual-objective model selection, LesionAttn balances gender fairness and diagnostic performance. Validated on two large-scale dermatologic datasets for binary malignancy classification, LesionAttn significantly mitigated gender bias while maintaining high diagnostic performance, outperforming existing bias-mitigation algorithms. Our study demonstrates that explicitly guiding model attention to medically essential features provides a practical approach to advance both performance and fairness in dermatologic AI. By leveraging clinical priors to bridge the gap between human expertise and algorithmic optimization, this study demonstrates a feasible pathway for developing equitable and reliable diagnostic tools.
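The dual-objective model selection mentioned above can be sketched as a Pareto-front filter over candidate models. This is a generic illustration, not the LesionAttn procedure: the candidate names, the use of AUC as the performance metric, and a "fairness gap" (lower is better) as the fairness metric are all assumptions made for the example.

```python
def dominates(a, b):
    """True if model `a` is at least as good as `b` on both objectives
    (higher AUC, lower fairness gap) and strictly better on at least one."""
    no_worse = a["auc"] >= b["auc"] and a["gap"] <= b["gap"]
    strictly_better = a["auc"] > b["auc"] or a["gap"] < b["gap"]
    return no_worse and strictly_better


def pareto_front(candidates):
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Hypothetical candidate checkpoints with made-up metrics.
candidates = [
    {"name": "m1", "auc": 0.90, "gap": 0.08},
    {"name": "m2", "auc": 0.88, "gap": 0.03},
    {"name": "m3", "auc": 0.85, "gap": 0.05},  # dominated by m2
    {"name": "m4", "auc": 0.91, "gap": 0.10},
]
print([c["name"] for c in pareto_front(candidates)])
```

A final model would then be chosen from the non-dominated set according to how the trade-off between diagnostic performance and fairness is weighted in a given application.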
Isolated rapid eye movement sleep behavior disorder (iRBD) is a major prodromal marker of α-synucleinopathies, often preceding the clinical onset of Parkinson's disease, dementia with Lewy bodies, or multiple system atrophy. While wrist-worn actimeters hold significant potential for detecting RBD in large-scale screening efforts by capturing abnormal nocturnal movements, they require a reliable and efficient analysis pipeline. This study presents ActiTect, a fully automated, open-source machine learning tool to identify RBD from actigraphy recordings. To ensure generalizability across heterogeneous acquisition settings, our pipeline includes robust preprocessing and automated sleep-wake detection to harmonize multi-device data and extract physiologically interpretable motion features. Model development was conducted on a cohort of 78 individuals, yielding strong discrimination under nested cross-validation (AUROC = 0.95). Generalization was confirmed on a blinded local test set (n = 31, AUROC = 0.86) and two independent external cohorts (n = 113, AUROC = 0.84; n = 57, AUROC = 0.94). To assess robustness, leave-one-dataset-out cross-validation across cohorts demonstrated consistent performance (AUROC range = 0.84-0.89). Complementary stability analysis showed that predictive features remained reproducible across datasets, supporting the pooled multi-center pre-trained model for broader deployment. As an open-source, easy-to-use tool, ActiTect promotes adoption, independent validation, and collaborative improvements, thereby advancing generalizable wearable-based RBD detection.
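Leave-one-dataset-out cross-validation, as named above, holds out each cohort in turn and trains on the remaining ones. The sketch below shows the splitting logic together with a rank-based AUROC computation. It is a generic illustration under stated assumptions, not ActiTect code: the function names, cohort labels, and toy scores are hypothetical.

```python
def leave_one_dataset_out(cohort_labels):
    """Yield (held_out_cohort, train_indices, test_indices) per cohort."""
    for held_out in sorted(set(cohort_labels)):
        train = [i for i, c in enumerate(cohort_labels) if c != held_out]
        test = [i for i, c in enumerate(cohort_labels) if c == held_out]
        yield held_out, train, test


def auroc(scores, labels):
    """AUROC as the probability that a positive case outscores a negative
    one, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical cohort membership for six recordings.
cohorts = ["site_a", "site_a", "site_b", "site_b", "site_c", "site_c"]
for held_out, train_idx, test_idx in leave_one_dataset_out(cohorts):
    print(held_out, "train:", train_idx, "test:", test_idx)

# Hypothetical scores and labels for one held-out cohort.
print(auroc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0]))
```

In a full pipeline, a model would be refit on the training indices of each fold and scored on the held-out cohort, giving one AUROC per cohort.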
AI chatbots increasingly provide patients with prescribing-level advice, often without physician involvement. Under US law, we argue that the learned intermediary doctrine, which has long governed drug-manufacturer liability for failure to warn, operates differently across two emerging pathways: AI behind the clinician, and AI advising patients directly. The more urgent implication lies elsewhere, under ordinary medical malpractice: clinicians may be required to screen patients for AI-sourced prescribing advice.
Large language models (LLMs) are increasingly applied in clinical communication, yet their reliability depends on high-quality conversational corpora. Real-world doctor-patient recordings are frequently degraded by noise, transcription errors, speaker overlap, and fragmented dialogue structure, limiting their usability for downstream model training. Here, we present an agent-based transcription framework that autonomously converts raw unstructured conversation transcriptions (RUCT) into structured conversation transcriptions (SCT) suitable for LLM fine-tuning. The system integrates three coordinated modules (Planner, Memory, and Executor) to orchestrate noise removal, content correction, speaker identification, and dialogue segmentation within a self-correcting workflow. Applied to 7197 minutes of Chinese clinical recordings across eight departments, with an additional 240 minutes of English-language dialogues used as a limited portability check, the agent achieved high reconstruction accuracy (94.7% denoising, 96.9% content correction, 88.6% speaker identification, 92.7% segmentation) and operated 3.6× faster than manual processing. In controlled comparisons against a cascaded deep-learning pipeline, a sequential non-agent execution, and an end-to-end large-context model, the agent achieved consistently higher performance across all four processing tasks. Architectural ablation further revealed marked degradation when Planner or Memory modules were removed (e.g., up to 47.6% reduction in speaker identification), supporting the contribution of coordinated task decomposition and cross-step state retention. To assess downstream impact, we fine-tuned an independent open-weight model (Qwen3-32B) on agent-generated SCT versus RUCT derived from an identical training set. Agent-generated SCT fine-tuning significantly improved overall quality scores (3.1 to 3.7; P < 0.001; Fleiss' κ = 0.82) in blinded expert evaluation across six clinically grounded dimensions, and also yielded higher scores on an external medical dialogue benchmark (HealthBench) than both RUCT fine-tuning and the non-fine-tuned baseline. These findings indicate that agent-structured clinical corpora enhance LLM fine-tuning performance and provide a scalable framework for reliable medical conversational AI development.