Nearly 25 years ago, the Institute of Medicine (now the National Academy of Medicine) envisioned the concept of a learning health system (LHS) as a path to reduce the discordance between scientifically demonstrable effectiveness of medicine and actual health care delivery by conscientiously leveraging the experiences, perspectives, and priorities of patients and frontline clinicians. Whereas many of the National Academy of Medicine's aspirational deliverables have not been attained within the time frame anticipated, the value of an LHS has nonetheless been increasingly recognized and continues to gain momentum nationally and internationally. In November 2024, Mayo Clinic Health System organized an LHS symposium with research leaders from Mayo Clinic and the University of Minnesota. The goal was to strategize future collaborative statewide LHS efforts, building on past experiences and lessons learned. Drawing on a foundation of implementation science and de-implementation principles, attendees contemplated opportunities to collaborate along strategic priorities of Mayo Clinic Health System research, including health equity, rural and population health, artificial intelligence validation and stewardship, and cancer care. The University of Minnesota provided examples of the LHS infrastructure built to support tomorrow's researchers and health care leadership while contemplating feasible opportunities to build, fund, and sustain a statewide LHS infrastructure. Anticipated future obstacles for LHS were also explored.
Minimally invasive and robotic cardiac surgery have been developed to reduce surgical trauma, shorten recovery, and improve cosmetic and functional outcomes. However, these approaches often require longer cardiopulmonary bypass (CPB) and aortic cross-clamp times than conventional full sternotomy, and CPB duration remains an independent predictor of postoperative morbidity and mortality, particularly in frail patients with reduced physiological reserve. The resulting less invasive access/prolonged extracorporeal support duration paradox poses a major physiological and clinical challenge. Contemporary evidence from randomized and observational studies reports that while minimally invasive and robotic procedures achieve comparable or improved survival and functional recovery, extended CPB and aortic clamp times can amplify the risk of renal dysfunction, neurological events, and systemic inflammation. Advances in digital health are now transforming intraoperative perfusion management: high-frequency data acquisition, automated oxygen delivery and consumption analytics, and real-time artificial intelligence-driven predictive models enable early detection of perfusion imbalance and metabolic distress. Integration of these data streams within interoperable platforms and patient-specific digital twins may allow dynamic modeling of perfusion adequacy and adaptive control of pump flow, temperature, and hemodynamics. By converting CPB duration from a static procedural metric into a digitally monitored, optimizable variable, precision perfusion could reconcile minimal invasiveness with physiological safety. Future research should validate these digital frameworks in multicenter studies and establish standards for transparency, interoperability, and ethical implementation in real-world cardiac surgery.
Cardiovascular and chronic disease prevention remains limited by episodic, clinic-based assessments that fail to capture physiological changes arising in daily life. As mobility constitutes one of the most stable and repetitive environments people inhabit, vehicles offer a unique setting for subliminal, continuous health monitoring. This narrative presents the rationale and foundational framework for Automotive Health 2.0, a clinically oriented paradigm that transforms connected vehicles into validated platforms for physiological sensing, data integration, and proactive care delivery. Building on existing in-cabin cameras, radar, and microphones, multimodal algorithms enable unobtrusive estimation of cardiovascular, respiratory, and behavioral parameters during routine driving. Technological innovation lies in combining these signals with artificial intelligence-driven analytics to detect early disease signatures, support dynamic risk assessment, and enable adaptive telemonitoring directly linked to electronic health records. Clinically, this approach distinguishes regulatory-grade monitoring from consumer wellness tools by prioritizing accuracy, reproducibility, and integration with established workflows. Patients gain earlier detection and more equitable access to preventive care; clinicians receive continuous actionable data, and health systems benefit from scalable population-level monitoring. Automotive Health 2.0 positions the vehicle as a novel extension of the health care ecosystem, embedding validated prevention seamlessly into everyday life.
Despite its demonstrated effectiveness at improving outcomes, pulmonary rehabilitation (PR) for chronic obstructive pulmonary disease (COPD) is underutilized. Sensor-generated data from wearable devices have the potential to mitigate this challenge by generating digital endpoints that provide insights into patient behaviors at home; however, there is no consensus on how to measure home-based PR (HBPR) outcomes with these tools. This review aims to describe (1) the most frequent digital endpoints used in HBPR studies and (2) the devices used to capture these endpoints, summarizing gaps in their applications to HBPR for COPD patients. We completed a scoping review using the PRISMA checklist across databases (Web of Science, Scopus, and OVID) from January 1, 2005, to June 1, 2025. We included peer-reviewed articles on HBPR for COPD, excluding reviews, commentaries/editorials, poster abstracts, and conference proceedings. Eligible articles included cohort studies and clinical trials of adult patients (age ≥ 18 years) with COPD participating in HBPR that include one or more digital endpoints. Among eligible articles (n = 218), 13 (6.0%) met inclusion criteria, the majority of which were published after 2020 (61.5%). Most studies enrolled fewer than 100 COPD patients (76.9%) for an average monitoring period of 12.5 weeks. Activity trackers were the most commonly used device (46.2%) to capture data. The most frequently used digital endpoints were step count (84.6%), time spent active (38.5%), and time spent sedentary (30.8%). Two study designs were used: randomized controlled trial (76.9%) and observational cohort. Study designs were heterogeneous, with more than one-third (38.5%) reporting no statistically significant results. Although we identified analogous digital endpoints in some studies, dissimilar methods and study designs remain barriers to synthesizing results generated from HBPR programs for COPD.
Wearable devices have the potential to build novel PR models, but more work is needed to translate real-world data into clinically meaningful measures. Future research should elucidate which participants would benefit most from and complete HBPR to build an evidence base for the validation of HBPR-relevant digital endpoints, particularly those derived from less common sources like cardiovascular and sleep measures.
To compare the performance of 4 large language model chatbots in response time and quality of clinical answers, evaluated by specialists using predefined validity criteria. Between June 1 and September 20, 2025, four clinical vignettes (orthopedics, pediatrics, gynecology, and psychiatry) were developed by independent experts and answered by 4 conversational agents: Arkangel AI, OpenEvidence, ChatGPT, and Medisearch. Each vignette included 4 questions (diagnosis, clinical management, research, and general knowledge). Responses were independently evaluated by external clinicians using an 8-criterion Likert scale assessing correctness, consensus agreement, absence of bias, adherence to standards of care, timeliness, patient safety, authenticity of cited references, and contextual appropriateness. Response times were summarized using medians and interquartile ranges. A total of 128 question-answer pairs (1024 evaluations) were analyzed. Overall satisfaction ranged from 71.1% (727) to 93% (952) across agents, with statistically significant differences (Kruskal-Wallis P<.001). Dissatisfaction was most frequent for reference authenticity in some ChatGPT modes (75% [n 91/128]-97% [n 119/128]), whereas Arkangel AI-Deep, ChatGPT-Deep, and OpenEvidence showed 100% satisfaction for this criterion. Satisfaction was high for correctness and consensus, with greater variability for bias and patient safety. Satisfaction differed by specialty, with higher scores in gynecology and lower scores in pediatrics (P<.001). Median response times ranged from 18 seconds to 12.8 minutes, with significant differences across agents and modes (Wilcoxon P<.05). Large language model chatbots showed substantial variability across validity dimensions when assessed using expert clinical judgment, supporting the need for standardized, multidimensional evaluation frameworks.
To evaluate the impact of mobile health interventions (MHIs) on opioid use, nonopioid medication use, adherence, quality-adjusted life years (QALYs), and health care utilization among adults with chronic pain. A systematic review (PROSPERO CRD4202457819) was conducted using PubMed, Embase, CINAHL, Web of Science, and the Cochrane Library from April 23, 2024, to May 2, 2025. Randomized controlled trials (RCTs) evaluating mobile-accessible interventions for adults aged 18-65 years with chronic pain, low back pain, or chronic low back pain were included. Two reviewers independently screened studies and assessed risk of bias. Owing to heterogeneity, outcomes were synthesized narratively, and standardized mean differences were calculated when appropriate. Fourteen trials (n=3766) met inclusion criteria. Six studies (n=1098) evaluated opioid use, with 4 reporting significant reductions. All 5 studies assessing nonopioid medication use (n=810) demonstrated decreases. Of 4 studies examining adherence (n=851), 3 showed improvement. QALY gains were observed in 4 of 5 studies (n=2252), although effect sizes were small. No significant reductions in health care utilization were identified across included trials. Mobile health interventions reduce opioid and nonopioid medication use and improve adherence in adults with chronic pain. Evidence for QALY improvement is modest, and effects on health care utilization remain uncertain. Moreover, MHIs serve as valuable adjuncts to chronic pain care, particularly for medication stewardship. Further high-quality, long-term randomized trials using standardized outcome measures are needed.
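The abstract above does not state which standardized mean difference was used. As a point of reference only, one common definition (Cohen's d with the Hedges small-sample correction) is sketched below; the symbols are assumptions introduced here, with subscripts T and C denoting the intervention and control arms, and the formulas are not taken from the review itself.

```latex
% Pooled standard deviation of the two arms
s_p = \sqrt{\frac{(n_T - 1)\,s_T^{2} + (n_C - 1)\,s_C^{2}}{n_T + n_C - 2}}

% Standardized mean difference (Cohen's d)
d = \frac{\bar{x}_T - \bar{x}_C}{s_p}

% Hedges' g: d with the usual small-sample correction factor
g = d \left( 1 - \frac{3}{4\,(n_T + n_C) - 9} \right)
```

Here \bar{x}, s, and n are the mean, standard deviation, and sample size of each arm for a given outcome.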
To evaluate the current state of clinically tested augmented reality and mixed reality, hereafter referred to as AR, navigation systems, with a focus on accuracy, usability, and factors influencing clinical implementation. A systematic search was conducted in Embase, PubMed, Scopus, and Web of Science Core Collection including studies published between January 1, 2018, and July 14, 2025. Only clinical studies involving patients were included. Data extraction covered study characteristics, surgical specialty, AR system type, reported accuracy, usability assessments, and implementation-related factors. Of 1956 screened records, 61 studies met the inclusion criteria. The applications spanned oral and maxillofacial surgery (k=17), neurosurgery (k=12), spinal surgery (k=11), and orthopedics (k=10), among others. Reported accuracy metrics varied substantially across studies. Moreover, AR navigation often reduced radiation exposure during surgery and sometimes shortened operative time, although timing effects varied by specialty. Usability was rarely measured with standardized tools and was mostly described qualitatively. Common limitations were limited accuracy, ergonomic issues, and workflow-integration challenges. Grading of Recommendations Assessment, Development, and Evaluation ratings were generally low or very low owing to small samples, heterogeneous methods, and risk of bias. Although AR navigation demonstrates encouraging technical performance and potential reductions in radiation exposure, the underlying evidence is predominantly of very low certainty, and any impression of pooled robustness should be avoided. Furthermore, clinical integration is hindered by technical, ergonomic, and workflow-related barriers. Future work should incorporate robust methodological designs aimed at improving registration accuracy, overcoming hardware shortcomings, and systematically evaluating usability through validated, standardized tools.
To define phenotypic subgroups of thrombotic microangiopathy (TMA) by integrating clinicopathologic data to identify patterns suggestive of pathogenic features and etiologies. We retrospectively analyzed 283 patients with biopsy-confirmed renal TMA between January 1, 2010, and December 31, 2020. Sixty-four clinicopathologic variables were applied in consensus clustering. Three distinct clusters were identified. Cluster 1 (n=78), predominantly women (n=50, 64%), was enriched for drug-induced (n=23, 29%), lupus-associated (n=9, 12%), and post-bone marrow transplant TMA (n=8, 10%). It showed wide immune-complex staining (IgM, C3, κ, and λ) and subendothelial deposits (n=16, 21%), suggestive of immune-associated injury. Cluster 2 (n=92) included the oldest patients (54±18 years) and was dominated by drug-induced TMA (n=21, 23%; one-third owing to vascular endothelial growth factor inhibitor bevacizumab) and monoclonal gammopathy-associated TMA (n=10, 11%). It showed marked endothelial damage (n=65, 71%) and mesangiolysis (n=66, 72%) with little immune staining, suggesting a toxic-related injury. Cluster 3 (n=113) exhibited the worst outcomes, with 80% (24/30) progressing to renal failure, compared with 31% (12/39) in cluster 1 and 34% (16/47) in cluster 2. This cluster was characterized by the youngest age (45±14 years), mostly males (n=67, 59%), severe hypertension (n=75, 66%), and low estimated glomerular filtration rate (10 mL/min/1.73 m2; interquartile range, 7-16). Biopsies showed acute vascular lesions-wrinkling of capillary tufts (n=74, 65%), fibrin thrombi (n=67, 59%), mucoid intimal edema (n=85, 75%), and onion-skin-type hyperplasia (n=55, 49%)-with minimal deposits (n=1, 0.9%), suggesting a severe vascular lesion process. Integrative clustering of clinical and histologic data in TMA identified 3 clinicopathologic phenotypes that may suggest underlying immune-, toxic-, or severe vascular lesion-related drivers. 
These findings support data-driven classification that may help prioritize diagnostic and therapeutic considerations.
To summarize the key intervention characteristics and evaluate the effectiveness and safety of digital therapeutics (DTx) in patients receiving oral anticoagulation, with effectiveness evaluated using time in therapeutic range (TTR), thromboembolic events, and mortality, and safety evaluated based on bleeding events. We searched PubMed, Embase, Web of Science, and the Cochrane Library from inception to June 20, 2025, and identified 10 randomized controlled trials involving 7237 patients. The inclusion criteria required studies to assess software-based DTx supporting anticoagulation management and report effectiveness or safety outcomes. Study quality was evaluated using the Grading of Recommendations, Assessment, Development, and Evaluation framework, and random-effects models were applied. Digital therapeutics interventions were associated with a lower incidence of major bleeding than usual care, with no clear differences in TTR, thromboembolic events, or mortality. Evidence quality ranged from very low to high. Secondary analyses showed more international normalized ratio testing with DTx; rehospitalization rates did not differ significantly between the groups. In sensitivity analysis, excluding a study with an enhanced control changed the TTR effect, but other outcomes remained unchanged. Digital therapeutics interventions for anticoagulation management improve safety outcomes, particularly by reducing major bleeding, and are accompanied by greater monitoring intensity. Larger, long-term trials are needed to confirm the clinical benefits and evaluate cost-effectiveness. PROSPERO Identifier: CRD420251107441.
To develop and pilot test 2 context-specific digital tools-Standardized Community Overview for Planning and Evaluation (SCOPE) and Facility Review for Assessing Medical Environments (FRAME)-designed to support virtual care service delivery and health care planning in rural and remote communities. This study used a participatory case study design informed by user-centered development principles to design and pilot the SCOPE and FRAME tools. Development involved collaborative workshops with clinicians, health care administrators, community partners, and a software development team. Publicly available data sources were combined with community-verified information to populate community-specific profiles. The tools were piloted in 3 communities in Saskatchewan, Canada-Stanley Mission, La Loche, and Whitecap Dakota Nation-from October 1, 2024, to March 31, 2025. During pilot implementation sessions, participants interacted with the prototype and qualitative feedback regarding functionality, usability, and practical utility was collected. Participants reported that the SCOPE and FRAME tools supported health care planning and service coordination by providing centralized access to community-specific demographic, infrastructure, and logistical information. Users indicated that dynamic site management features enabled real-time updates to community profiles, allowing the tools to reflect evolving health care service availability. Data visualization dashboards, including interactive graphs and maps, were reported to support interpretation of health care trends and identification of service gaps. Clinicians also noted that the tools provided useful contextual information to support onboarding of virtual care clinicians and improve understanding of local care pathways. 
Findings from the pilot implementation suggest that context-specific digital tools such as SCOPE and FRAME may support improved coordination, planning, and contextual awareness in virtual care delivery for rural and remote communities.
To support cancer screening and identify precancerous conditions, such as atypical hyperplasia, to improve cure rates and reduce mortality, by analyzing the performance of a previously confirmed procedure. We established 204 short-term blood-derived cell lines from patients with cancer between December 1, 2013 and December 31, 2022. A dataset of phenotypic patterns, cytopathological variables, and proliferation profiles was used to train a neural network model. Comparative analysis of standard optical, functional, and machine learning supported diagnosis was performed to verify reproducibility and clinical translatability. Tumor heterogeneity was classified into 7 phenotypic patterns (Pn1-Pn7); the Pn6-7 group showed an overall survival of 8 months (95% CI, 6-9). Among variables Vc1-Vc8, Vc3 (area under the curve=1, P<.001) and Vc5-6 (area under the curve=0.838, P<.001) were identified as discriminant for cell atypia (rho=0.5 correlation with biopsy). Proliferative ranges were 0%-30% healthy, 30%-35% hyperplasia, and >35% cancer. Artificial intelligence-supported cytology showed positive and negative predictive values of 0.99±0.015 and 1±0 compared with histopathological specimens (0.94±0.1 and 0.85±0.04). The algorithm achieved a 1±0 sensitivity and a 0.98±0.04 specificity, with respect to traditional diagnosis (specificity [0.875-0.923] and sensitivity [0.926-0.984]). The model demonstrated fast adaptive performance in predicting cancer risk and primary source assessment. The results suggest that this screening model is sufficient to detect atypical hyperplasia compared with models based on oligoanalysis for single or double mutations.
Articles on the development of medical image artificial intelligence (AI) algorithms are numerous in the literature, but deployment to clinical practice is infrequently discussed. The Enterprise Radiology Framework for AI Software Technology Team at Mayo Clinic has been focused on bridging the gap in clinical translation of medical image AI algorithms since its inception in 2019. During this time, we have released 17 algorithms into our radiology clinical practice. Recently, we have placed an increased focus on monitoring these algorithms, as there are few reports with practical experience documented in the literature. Our increased monitoring efforts include daily, weekly, and yearly monitoring of utilization, failure modes, data drift, and end-user feedback through automated alerts, dedicated dashboards, and pointed investigations to enable optimal algorithmic processing. End-user feedback is elicited yearly during annual reviews to ensure clinical needs are still being met. Automated monitoring has enabled earlier identification of problems, such as images no longer routing through the orchestration engine to the appropriate algorithm, minimizing potential disruption to the clinical practice and ensuring continued algorithmic utilization. Monitoring has also reinforced the importance of key aspects of interdisciplinary research and translation, such as early discussions on clinical needs coupled with technological ability and proper training. By providing our experience in and continuing to improve monitoring methods as a community, we can all minimize risk and maximize the benefits of medical pixel-based AI.
To evaluate whether generative pretrained transformer (GPT)-4 can detect and revise biased language in emergency department (ED) notes, against human-adjudicated gold-standard labels, and to identify modifiable factors associated with biased documentation. We randomly sampled 50,000 ED medical and nursing notes from the Mount Sinai Health System (January 1, 2023, to December 31, 2023). We also randomly sampled 500 discharge notes from the Medical Information Mart for Intensive Care IV database. GPT-4 flagged 4 types of bias: discrediting, stigmatizing/labeling, judgmental, and stereotyping. Two human reviewers verified model detections. We used multivariable logistic regression to examine associations between bias and health care utilization, presenting problems (eg, substance use), shift timing, and provider type. We then asked physicians to rate GPT-4's proposed language revisions on a 10-point scale. GPT-4 showed 97.6% sensitivity and 85.7% specificity compared with the human review. Biased language appeared in 6.5% (3229 of 50,000) of Mount Sinai notes and 7.4% (37 of 500) of Medical Information Mart for Intensive Care IV notes. In adjusted models, frequent health care utilization (adjusted odds ratio [aOR], 2.85; 95% CI, 1.95-4.17), substance use presentations (aOR, 3.09; 95% CI, 2.51-3.80), and overnight shifts (aOR, 1.37; 95% CI, 1.23-1.52) showed elevated odds of biased documentation. Physicians were more likely to include bias than nurses (aOR, 2.26; 95% CI, 2.07-2.46); GPT-4's recommended revisions received mean physician ratings above 9 of 10. The study showed that GPT-4 accurately detects biased language in clinical notes, identifies modifiable contributors to that bias, and delivers physician-endorsed revisions. This approach may help mitigate documentation bias and reduce disparities in care.
To evaluate and compare the diagnostic performance of 2 clinical decision support system tools-ORADIII and ORAD DDx-against histopathological diagnosis in identifying intrabony jaw lesions using orthopantomograms. A diagnostic accuracy, cross-sectional study was conducted in the Department of Oral Medicine and Radiology, Kathmandu University School of Medical Sciences, Dhulikhel Hospital, Kavre, Nepal, from January 1, 2025, to April 30, 2025, after institutional review committee approval. The study was conducted on a sample comprising both lesion and nonlesion cases based on radiographic evaluation. Diagnostic outputs from ORADIII and ORAD DDx were compared with histopathology. Key performance indicators-including sensitivity, specificity, accuracy, F1 score, positive predictive value, negative predictive value, and likelihood ratios (positive and negative)-were calculated for both systems. Among the 350 samples evaluated, including 175 lesion-positive and 175 nonlesion cases, ORAD DDx demonstrated superior diagnostic performance compared with ORADIII. The sensitivity, specificity, accuracy, and F1 score for ORADIII were 64.57%, 60.00%, 62.28%, and 0.6314, respectively. In contrast, ORAD DDx achieved sensitivity, specificity, accuracy, and F1 score of 70.29%, 65.71%, 68.00%, and 0.687, respectively. ORAD DDx showed better diagnostic performance than ORADIII across most metrics, indicating its potential as a more reliable clinical decision support system for intrabony jaw lesions. This difference could be due in part to its categorization of lesions and variations. Further validation with larger, stratified, and multicenter data sets is recommended.
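The performance indicators listed above all derive from a 2x2 confusion matrix. The following is a minimal sketch of that arithmetic, not the authors' code. The abstract does not report cell counts; the counts used for ORAD DDx (123 true positives, 52 false negatives, 115 true negatives, 60 false positives) are an assumption, back-calculated here from the reported 70.29% sensitivity and 65.71% specificity with 175 cases per group. The names `ConfusionMatrix` and `diagnostic_metrics` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Cell counts of a 2x2 table against the reference standard."""
    tp: int  # lesion present, tool positive
    fn: int  # lesion present, tool negative
    tn: int  # lesion absent, tool negative
    fp: int  # lesion absent, tool positive


def diagnostic_metrics(cm: ConfusionMatrix) -> dict:
    """Return the standard diagnostic accuracy indicators as fractions."""
    total = cm.tp + cm.fn + cm.tn + cm.fp
    sensitivity = cm.tp / (cm.tp + cm.fn)
    specificity = cm.tn / (cm.tn + cm.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (cm.tp + cm.tn) / total,
        "ppv": cm.tp / (cm.tp + cm.fp),
        "npv": cm.tn / (cm.tn + cm.fn),
        # F1 is the harmonic mean of PPV (precision) and sensitivity (recall)
        "f1": 2 * cm.tp / (2 * cm.tp + cm.fp + cm.fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# ASSUMPTION: counts inferred from the reported percentages, not published data.
orad_ddx = ConfusionMatrix(tp=123, fn=52, tn=115, fp=60)
metrics = diagnostic_metrics(orad_ddx)

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Under these inferred counts, sensitivity, specificity, accuracy, and F1 round to the values reported for ORAD DDx; the predictive values and likelihood ratios the sketch prints are not stated in the abstract and should be read only as illustrations of the formulas.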
To evaluate movement patterns in patients with ankle fractures before and after injury. This descriptive study analyzed movement data from patients treated for ankle fractures at Sahlgrenska University Hospital, Sweden (January 1 to December 31, 2022). Patients were identified using ICD-10 codes and medical records. Inclusion criteria: surgically and nonsurgically treated patients with >6 months of preinjury iPhone use. Step count, length, and speed were collected through a mobile application integrated with Apple Health. Double support and gait asymmetry were excluded due to limited external validity. Data spanned 6-12 months preinjury to 1 year post-injury. The primary aim was to evaluate whether patients return to their preinjury movement patterns. Of 1131 patients, 90 were analyzed. Preinjury means: 5435.0 steps (SD 4215.3), step length 0.70 m (SD 0.07), and step speed 1.28 m/s (SD 0.2). At 1 year: 5420.3 steps (SD 3887.0), step length 0.68 m (SD 0.08), and step speed 1.22 m/s (SD 0.19). A post-injury plateau was reached in step parameters at 84.8 days, with no further recovery thereafter. Step count largely recovered, but deficits in step length and speed persisted at 12 months. Smartphone-derived movement data provide a cost-effective alternative to laboratory gait analysis, enabling long-term monitoring. Preinjury data allow individualized baseline comparisons and may support earlier identification of patients needing adjusted rehabilitation.
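The conclusion that step count recovered while step length and speed did not can be made concrete by expressing the reported 1-year means as a percentage change from the preinjury means. The sketch below does only that arithmetic on the group means quoted in the abstract; it is not the authors' analysis, it ignores the standard deviations and any paired or statistical testing, and the names `percent_change`, `preinjury`, and `one_year` are hypothetical.

```python
def percent_change(baseline: float, follow_up: float) -> float:
    """Relative change from baseline to follow-up, in percent."""
    return (follow_up - baseline) / baseline * 100.0


# Group means as reported in the abstract (preinjury vs 1 year post-injury).
preinjury = {"step_count": 5435.0, "step_length_m": 0.70, "step_speed_m_per_s": 1.28}
one_year = {"step_count": 5420.3, "step_length_m": 0.68, "step_speed_m_per_s": 1.22}

changes = {
    name: percent_change(preinjury[name], one_year[name]) for name in preinjury
}

for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

On these means, step count differs from baseline by well under 1%, whereas step length and step speed remain roughly 3% and 5% lower, which matches the pattern described in the text.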
To compare acceptability of 2 artificial intelligence (AI) use cases in the English National Health Service Breast Screening Program. From February 7 to March 14, 2024, we conducted an online survey, randomizing participants to information about using AI either as the second mammogram reader or to triage mammograms. In the triage scenario, only higher-risk images would be reviewed by a human reader. The survey was completed by 3419 women aged 45 to 70 years, recruited from an online panel. The primary outcome was acceptability of the presented AI use case. We assessed a range of psychological and demographic factors. Regression modeling examined predictors of acceptability. Using AI as a second reader was rated as more acceptable (P<.001), less concerning (P<.001), and less likely to put people off screening (P=.001) than using it as a triage tool. In both groups, most women said AI would not affect their breast screening attendance (1251/1710 [73%] and 1195/1709 [70%] in the second reader and triage groups, respectively). Nevertheless, 15% (498/3419) of participants stated that the use of AI would make them less likely to attend. After adjusting for AI use case, acceptability was higher in respondents of older age, White ethnicity, higher education, greater AI knowledge, and with more positive attitudes toward both AI and breast screening. Artificial intelligence in breast screening was rated as more acceptable if used alongside, rather than instead of, a human reader. Ongoing careful evaluation is needed to ensure its roll-out does not widen existing social inequalities and that the risk-benefit profile of screening is maintained.
To evaluate the clinical utility of combining artificial intelligence (AI) with handheld focused cardiac ultrasound (FoCUS) performed by noncardiologist physicians in clinical care settings. In this prospective, single-arm study conducted from July 1, 2022, through December 31, 2023 (ClinicalTrials.gov NCT05455541), 660 adult patients presenting to the emergency department or internal medicine wards were assessed with handheld ultrasound devices enhanced by AI algorithms. These algorithms provided automated analysis of ventricular function, valvular disease, pericardial effusion, and inferior vena cava size. Participating physicians received focused training and performed examinations either in response to clinical suspicion or as part of routine evaluation. The primary outcome was whether AI-guided FoCUS contributed to new diagnoses, treatment modifications, or additional procedures. Artificial intelligence-enhanced FoCUS identified clinically relevant cardiac findings in 193 patients (29%), including newly recognized valvular abnormalities and reduced left ventricular function. In 49 patients (7%), medical therapy was adjusted based on findings, and 9 patients (1.4%) underwent interventional procedures. Diagnostic performance analyses showed high sensitivity for detecting reduced left ventricular function and valvular disease, with lower sensitivity for right-sided abnormalities. This study demonstrates that integrating AI-enhanced FoCUS into routine workflows can provide clinically relevant information that may influence diagnostic assessment and management by noncardiology practitioners in acute care settings.