This study aims to compare the effectiveness of 2 ambient AI scribe technologies in reducing physician burnout, improving workflow satisfaction, and enhancing documentation efficiency through a randomized crossover trial. An open-label randomized crossover trial involving 160 outpatient clinicians was conducted at a tertiary academic medical center. Volunteers were randomized to 2 groups of 80 with 2 crossover periods. We assessed workflow satisfaction (1-7 scale), burnout (Copenhagen Burnout Index), and efficiency metrics (eg, electronic health record time outside scheduled hours and documentation time). Data were analyzed using Wilcoxon signed-rank tests and generalized linear mixed models. Surveys from 136 respondents were analyzed. Clinicians reported greater improvements in satisfaction with product B (2.51 points on a 7-point scale) compared with product A (1.91 points; mean difference: 0.60, 95% CI: 0.32-0.90). Both tools reduced personal and work burnout scores, but differences between tools were not meaningful. Product B demonstrated greater reductions in average minutes-in-notes per day compared with product A (B - A = -3.19 minutes; 95% CI -4.87 to -1.50). No meaningful differences were observed in pajama time or patient-related burnout. Both tools improved workflow satisfaction and reduced burnout, with product B showing superior performance in satisfaction and documentation time. However, efficiency metrics like pajama time were largely unaffected, potentially due to participant selection bias and the study period's timing. Product B yielded greater satisfaction and time savings compared with product A, though both tools effectively reduced physician burnout and improved workflow satisfaction.
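As a hedged illustration of the paired analysis this abstract names, the sketch below runs a Wilcoxon signed-rank test on synthetic per-clinician satisfaction improvements. The study's data are not reproduced here; the sample sizes and means are taken from the abstract, but the distributions are placeholders.

```python
# Illustrative sketch only: Wilcoxon signed-rank test on paired crossover
# outcomes. delta_a/delta_b are synthetic stand-ins for each clinician's
# satisfaction improvement under product A and product B.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
n = 136  # respondents analyzed in the study

# Hypothetical improvements on the 1-7 scale, centered on the reported means
delta_a = rng.normal(1.91, 1.0, n)
delta_b = rng.normal(2.51, 1.0, n)

stat, p = wilcoxon(delta_b, delta_a)  # paired test of B vs A improvements
diff = float(np.mean(delta_b - delta_a))
print(f"mean difference B-A = {diff:.2f}, Wilcoxon p = {p:.2e}")
```

A generalized linear mixed model, as the study also used, would additionally adjust for period and sequence effects inherent to crossover designs.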
[This corrects the article DOI: 10.1093/jamiaopen/ooaf134.]
Sharing behavioral health and wearable data poses privacy challenges, as traditional de-identification remains vulnerable to re-identification. Differential privacy (DP) provides mathematical guarantees through a tunable privacy budget, ϵ. This study evaluates the feasibility of generating and releasing DP synthetic behavioral health data with high analytical utility, identifying practical ϵ values for public data sharing. We analyzed physiological data from wearable devices and self-reported data from Phase 1 of the Lived Experiences Measured Using Rings Study (LEMURS), which tracked sleep, stress, and well-being among first-year college students. Three DP synthetic data generators (AIM, MST, and PATECTGAN) were evaluated across privacy budgets ranging from ϵ = 1 to 100. Utility was assessed using L1/L2 errors, correlation, regression, and UMAP; vulnerability was assessed via privacy attacks. AIM outperformed MST and PATECTGAN in preserving both statistical and analytical properties of the original data. For the Survey dataset, the lowest marginal errors occurred at ϵ = 5 and 10. Correlation, regression, and UMAP analyses confirmed that AIM-generated data closely replicated original relationships at moderate ϵ values. The choice of privacy budget remains an open question; it is task-agnostic but dataset-specific. Moderate privacy budgets (5 ≤ ϵ ≤ 10) maintained key associations between physiological and psychological measures while ensuring privacy. AIM's workload-aware design effectively allocated noise toward relevant features, enhancing performance. A privacy budget of ϵ = 5 offers a practical balance between data utility and participant privacy for LEMURS behavioral health data sharing.
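The generators named above compose noisy marginals internally; as a minimal, hedged sketch of that building block (not the AIM/MST/PATECTGAN pipelines themselves), the example below adds Laplace noise to a one-way marginal and shows how the normalized L1 error shrinks as ϵ grows. The bin counts are invented placeholders.

```python
# Illustrative sketch: the Laplace mechanism on a one-way marginal, the
# basic DP primitive that marginal-based synthesizers compose. Counts are
# hypothetical, not LEMURS data.
import numpy as np

def noisy_marginal(counts, epsilon, sensitivity=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    noise = rng.laplace(0.0, sensitivity / epsilon, size=len(counts))
    return np.clip(counts + noise, 0, None)  # counts cannot be negative

true = np.array([120.0, 95.0, 60.0, 25.0])  # hypothetical sleep-quality bins
errs = {}
for eps in (1, 5, 10):
    rng = np.random.default_rng(42)          # same draws, rescaled per eps
    noisy = noisy_marginal(true, eps, rng=rng)
    errs[eps] = float(np.abs(noisy - true).sum() / true.sum())
    print(f"eps={eps:>2}: normalized L1 error = {errs[eps]:.4f}")
```

Larger ϵ means less noise and lower error but weaker privacy, which is the utility-privacy trade-off the study tunes.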
This scoping review aimed to (1) map current applications of transformers and large language models (LLMs) for extracting social drivers of health (SDOH) from clinical text, (2) benchmark model performance across SDOH domains, and (3) evaluate methodological rigor to identify research gaps and inform clinical deployment. We searched PubMed, Web of Science, Embase, Scopus, and IEEE Xplore for studies applying transformers or LLMs to detect SDOH in clinical narratives. We developed a novel methodological framework integrating (1) hierarchical classification of SDOH domains and transformer/LLM architectures, (2) systematic synthesis of performance metrics, and (3) a 7-domain instrument assessing internal validity, external validity, and reporting transparency. Forty-two studies met inclusion criteria. Performance varied substantially across SDOH domains. Behavioral Factors achieved the highest median F1-score (0.87), while Health Care Access and Quality showed the lowest performance and greatest variability (median F1 = 0.59). Research concentrated in the United States (85.7%), relied predominantly on private institutional datasets (69%), and focused primarily on critical care populations (45.2%). Methodological assessment revealed critical gaps; only 29% of studies provided annotation guidelines, 24% assessed fairness across demographic groups, and 21% performed external validation. Smaller open-source transformer models show promise for democratizing SDOH detection by achieving competitive performance at lower costs while enabling secure local deployment in resource-limited settings. Advancing clinical readiness requires standardized reporting practices, diverse benchmark datasets across care settings, and systematic equity evaluation to prevent perpetuating health disparities. 
Transformer and LLM performance for SDOH detection varied substantially across domains, with encoder-based models excelling at structured tasks and decoder-only models at linguistically complex tasks. Critical gaps in fairness assessment, external validation, and dataset diversity restrict generalizability and readiness for widespread clinical deployment.
Echocardiography and cardiac catheterization reports capture important clinical assessment information about cardiac function and disease severity. This study explores using open-source transformer-based language models (LMs) that are run locally within an institutional environment as a privacy-preserving alternative to external API-based large LMs to systematically extract clinical data from unstructured echocardiography and cardiac catheterization reports, aiming to improve data accessibility for research and patient care. Two transformer-based LMs, BioclinicalBERT and BART-Large-CNN, were fine-tuned in a secure local environment using a question-answering approach. The dataset included 3286 echocardiography and 1884 cardiac catheterization reports from Kaiser Permanente Southern California's electronic health records, annotated for 25 and 47 predefined categories, respectively. Three hundred reports of each type were randomly selected and used for validation, with the remainder for training. Model performance was assessed using accuracy, precision, recall, and F1-score at 2 probability thresholds. The effect of training set size on model performance was also evaluated. Both models achieved consistent and high accuracy, precision, and recall (all >90%) across the 5 seed runs for both report types. For echocardiography, BioclinicalBERT reached mean accuracy of 95.7%, precision of 97.6%, recall of 97.4%, and F1-score of 0.98 at the probability threshold of 0.1; BART-Large-CNN had similar results. For cardiac catheterization, BART-Large-CNN slightly outperformed BioclinicalBERT, with mean accuracy 94.9% vs 94.3%, precision 96.7% vs 96.3%, recall 96.1% vs 95.7%, and F1-score 0.96 vs 0.96 at the probability threshold of 0.1. Most individual categories showed strong performance, though a few (eg, prosthetic mitral valve, right atrial pressure) had lower scores. Performance improved with more training data but plateaued at around 1000 reports.
Fine-tuned transformer-based LMs can effectively extract structured data from unstructured cardiac reports, supporting automated information extraction to enhance research and clinical applications.
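A hedged sketch of the question-answering framing described above: each predefined category becomes a question over the report text, and the training target is the answer span's character offset (SQuAD-style). The report text, category name, and helper function here are illustrative, not the study's actual schema.

```python
# Illustrative sketch: building a QA-style training example for extractive
# span extraction from a cardiac report. All names/text are hypothetical.
def make_qa_example(report_text, category, answer):
    start = report_text.find(answer)
    if start == -1:
        return None  # category not documented in this report
    return {
        "question": f"What is the {category}?",
        "context": report_text,
        "answers": {"text": [answer], "answer_start": [start]},
    }

report = "LV ejection fraction is 55%. Mild mitral regurgitation noted."
ex = make_qa_example(report, "left ventricular ejection fraction", "55%")
print(ex["answers"])  # {'text': ['55%'], 'answer_start': [24]}
```

Examples in this shape can then be fed to a fine-tuning loop for an extractive QA head over BioclinicalBERT or a generative model such as BART.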
This study compares multiple LLMs, including ChatGPT, DeepSeek, and Llama, to generate meaningful, audience-adapted labels for the existing latent classes among patients with chronic low back pain (cLBP). Phenotypes were derived from baseline data from two cohorts within the NIH HEAL BACPAC consortium: BACKHOME, a large nationwide e-cohort (train set: N = 3025), and COMEBACK, a deep phenotyping cohort (test set: N = 450). The analysis included pain characteristics, psychosocial factors, lifestyle habits, and social determinants of health. ChatGPT-4o (OpenAI), DeepSeek-R1, and Llama 3.3 (Meta) were applied to generate class labels for each combination of audience (clinician, patient, and caregiver), tone (formal, empathetic, and informal), and technicality (high, medium, and low). A latent class model (LCM) identified four distinct behavioral phenotypes in patients with cLBP: High Distress and Maladaptive Behaviors, Resilient and Adaptive Coping, Intermediate Maladaptive Patterns, and Emotionally Regulated with High Pain Burden. Previously validated by domain experts, these profiles served as the basis for automated labeling using three LLMs (ChatGPT-4o, DeepSeek-R1, and Llama 3.3). Using different tones and complexity levels, each model produced class labels specific to clinicians, patients, and caregivers. The generated class names for all LLMs closely matched expert-defined traits like emotional regulation, resilience, and high distress, indicating strong conceptual alignment and the capacity of LLMs to generate precise, audience-specific labels for intricate behavioral and psychological profiles. These results highlight the possibility of integrating LLM-driven labeling into research and clinical practice, helping to achieve more transparent knowledge translation, improved decision-making, and personalized care.
Rare-earth high-entropy oxides (RE-HEOs) represent a distinct class of entropy-stabilized ceramics in which multiple lanthanide cations occupy a common crystallographic sublattice, generating strong chemical disorder, lattice distortion, and complex defect landscapes. Unlike transition-metal-based high-entropy oxides, RE-HEOs are governed by localized 4f electronic states, weak crystal-field coupling, and variable redox chemistry, leading to emergent structural, electronic, magnetic, and optical phenomena that challenge conventional solid-state descriptions. This review provides a physics-oriented analysis of RE-HEOs, focusing on the thermodynamic foundations of configurational entropy stabilization, the interplay between enthalpy, entropy, and kinetic trapping, and the consequences of severe chemical disorder for crystal structure and phase stability. We review how lattice distortion, oxygen vacancy disorder, and cation randomness modify phonon spectra, ionic transport pathways, and electronic structures, with particular emphasis on the role of localized 4f states, defect-induced in-gap levels, and disorder-broadened excitation spectra. Spectroscopic manifestations of disorder, including crystal-field relaxation, line broadening, lifetime modification, and energy transfer processes, are discussed within a unified framework linking local symmetry breaking to macroscopic response. We further discuss the optoelectronic properties of RE-HEOs, including photoluminescence from intra-4f transitions, upconversion mechanisms, and disorder-induced modifications of radiative lifetimes and quantum efficiency. The application landscape spans both energy conversion (electrocatalysis, solid oxide fuel cells, thermal barrier coatings) and optoelectronic technologies (phosphors, scintillators, optical thermometry, and anti-counterfeiting).
Likewise, we assess theoretical and computational approaches, including density functional theory with strong correlation corrections, statistical thermodynamics, and emerging machine-learning models, highlighting their ability and current limitations in capturing disorder-driven physics in multi-component oxides. Finally, we identify open questions central to condensed-matter physics, including the nature of entropy-stabilized metastability, the limits of band theoretical descriptions in highly disordered 4f systems, and the role of configurational entropy in tuning electron-phonon and defect interactions. By consolidating experimental and theoretical insights, this review establishes RE-HEOs as a platform for exploring disorder-dominated solid-state physics beyond conventional crystalline oxides.
Endothelial nitric oxide synthase (eNOS) produces nitric oxide (NO), a key molecule for maintaining vascular health. While phosphorylation is a well-established regulatory mechanism of eNOS activity, the functional contribution of conserved lysine residues to electron transfer and catalytic coupling remains less clearly defined. In this study, we examined two conserved lysines in eNOS, Lys609 located within the autoinhibitory (AI) region and Lys733 positioned within the FMN-FNR hinge, by substituting them with arginine to preserve positive charge while altering side-chain geometry. Biochemical and spectroscopic analyses revealed that both substitutions significantly impaired enzyme function. Cytochrome c reductase activity was reduced 3- to 6-fold, and NO synthesis decreased by approximately 37% for K609R and 25% for K733R relative to wild-type (WT) eNOS. Elevated NADPH/NO ratios indicated impaired catalytic coupling and increased diversion of electrons away from productive NO synthesis. Flavin fluorescence and auto-oxidation measurements showed that both mutations favored a closed, FMN-shielded conformation and reduced the Ca2+/calmodulin-induced transition to the open, catalytically competent state. Structural analyses and molecular dynamics simulations show that substitutions at Lys609 and Lys733 alter FMN-domain dynamics through distinct mechanisms: K609R induces increased flexibility and global expansion of the reductase domain, whereas K733R restricts hinge motion, maintaining overall compactness. Despite these defects, ferricyanide reductase activity was unchanged, showing that FAD-mediated hydride transfer remains unaffected. Electron flux through the heme correlated strongly with NO production, identifying heme-directed electron transfer as the principal step affected. Together, these findings identify Lys609 and Lys733 as regulators of eNOS conformational dynamics, interdomain electron transfer, and catalytic efficiency.
Medical coding structures health-care data for research, quality monitoring, and policy. This study assesses the potential of large language models (LLMs) to assign International Classification of Primary Care, 2nd edition (ICPC-2) codes using the output of a domain-specific search engine. A dataset of 437 Brazilian Portuguese clinical expressions, each annotated with ICPC-2 codes, was used. A semantic search engine (OpenAI's text-embedding-3-large) retrieved candidates from 73 563 labeled concepts. Thirty-three LLMs were prompted with each query and retrieved results to select the best-matching ICPC-2 code. Performance was evaluated using F1-score, along with token usage, cost, response time, and format adherence. Twenty-eight models achieved an F1-score > 0.8; 10 exceeded 0.85. Top performers included gpt-4.5-preview, o3, and gemini-2.5-pro. Retriever optimization can improve performance by up to 4 points. Most models returned valid codes in the expected format, with reduced hallucinations. Smaller models (<3B parameters) struggled with formatting and input length. Large language models show strong potential for automating ICPC-2 coding, even without fine-tuning. This work offers a benchmark and highlights challenges, but findings are limited by dataset scope and setup. Broader, multilingual, end-to-end evaluations are needed for clinical validation.
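A minimal, hedged sketch of the retrieve-then-select pipeline described above: cosine-similarity retrieval over concept embeddings, followed by a final choice among candidates (where the study prompts an LLM). The codes and 2-dimensional vectors are toy placeholders standing in for text-embedding-3-large embeddings.

```python
# Illustrative sketch: semantic retrieval of candidate ICPC-2 concepts by
# cosine similarity. Codes, labels, and vectors are invented for this demo.
import numpy as np

def top_k(query_vec, concept_vecs, k=3):
    q = query_vec / np.linalg.norm(query_vec)
    m = concept_vecs / np.linalg.norm(concept_vecs, axis=1, keepdims=True)
    sims = m @ q                      # cosine similarity to each concept
    return np.argsort(-sims)[:k]

codes = ["K86 hypertension", "A03 fever", "R05 cough"]
concept_vecs = np.array([[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]])
query = np.array([0.85, 0.2])         # embedding of a clinical expression
candidates = [codes[i] for i in top_k(query, concept_vecs, k=2)]
best = candidates[0]                  # in the study, an LLM picks among these
print(candidates, "->", best)
```

In the full pipeline, the candidate list (not just the top hit) is placed in the LLM prompt, which is why retriever recall at k matters as much as top-1 precision.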
Stigmatizing language in clinical documentation can contribute to healthcare disparities and affect patient-provider relationships. Given their strong capacity for contextual language understanding, large language models (LLMs) offer potential for detecting and reducing such language. This study evaluates the accuracy of LLMs in detecting stigmatizing language, focusing on model size, temperature settings, and the inclusion of examples. We evaluated multiple configurations of 2 local Llama-based large language models, Llama 3.2 (3B) and Llama 3.1 (8B), with varying temperatures (0.25, 0.5, 0.75) and with or without example prompts. The models were evaluated on 3643 de-identified clinical notes obtained from a tertiary care teaching hospital. Performance was assessed using accuracy, True Positive Rate (TPR), and True Negative Rate (TNR), with human annotator performance used as a benchmark. The 8B model with a temperature of 0.25 and examples achieved the highest overall accuracy (70.2%), with the best TPR (94.1%), but the lowest TNR (47.4%). The 3B model without examples achieved the highest TNR (99.7%) but a very low TPR (2%). The inclusion of examples improved model accuracy across all configurations, while temperature settings had a variable impact, with smaller models benefiting from higher temperatures and larger models performing better at lower temperatures. Accuracy was highest for ED provider notes (69.4%) and lowest for plan-of-care notes (55.8%). Model size, temperature, and the inclusion of examples play a critical role in optimizing open-source LLM performance. Tailoring these parameters to note types enhances effectiveness. Further research should refine these models for broader clinical application and assess their potential to reduce bias in healthcare documentation.
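For readers unfamiliar with the reported metrics, a small sketch computing accuracy, TPR, and TNR from binary predictions. The labels below are synthetic stand-ins for annotated notes, not study data.

```python
# Illustrative sketch: the three metrics the study reports, from a binary
# confusion matrix. 1 = note contains stigmatizing language.
def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "tpr": tp / (tp + fn) if tp + fn else float("nan"),  # sensitivity
        "tnr": tn / (tn + fp) if tn + fp else float("nan"),  # specificity
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
m = binary_metrics(y_true, y_pred)
print(m)
```

The 8B-with-examples configuration's profile (high TPR, low TNR) corresponds to a model that over-flags: it rarely misses stigmatizing language but often labels neutral text as stigmatizing.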
We conducted the Clinical Registry Extraction and Data Submission (CREDS) project to evaluate the readiness of HL7 Fast Healthcare Interoperability Resources (FHIR) for provisioning data from health information systems for the American College of Cardiology Cardiac Catheterization Percutaneous Coronary Intervention (CathPCI) Registry. The CREDS project had 3 workstreams: (1) evaluation of the readiness of clinical documentation for data transforms, (2) modeling of a FHIR-based clinical workflow for registry data submission, and (3) development and demonstration of a CREDS FHIR implementation for registry data submission. Of the 344 data concepts comprising the CathPCI Registry, only 111 (32%) were sufficiently discrete to be listed in the CathPCI Data Dictionary with a terminology mapping. Cardiologist informaticians identified an additional 42 concepts suitable for provisioning via a FHIR payload. The resulting notional workflow combined FHIR-based data assembly with manual chart abstraction of compound, summative, and complex clinical concepts. A CathPCI FHIR StructureDefinition artifact was authored, incorporated into a CREDS FHIR Implementation Guide, and balloted to Standard for Trial Use status. CREDS demonstrated both potential and limitations for using FHIR for registry data submission. The largest technical impediment was the volume of code (>11 000 lines) for the FHIR StructureDefinition. Lack of regularized clinical vocabularies, reliance of registries on complex clinical concepts, and absence of FHIR infrastructure must be overcome before CREDS can be used at scale. CREDS demonstrated proof-of-concept FHIR-based provisioning of clinical data for registry submission. All artifacts are open source to inform others with similar interests.
To assess patient awareness, trust, perceived benefits, and risks of artificial intelligence (AI) in clinical care within an urban safety-net health system. We surveyed 313 patients from November 2024 to January 2025 regarding AI awareness, trust in AI-assisted decision-making, and preferences for transparency and oversight. Quantitative analyses assessed associations between AI awareness and perceived benefit; qualitative analysis identified themes influencing trust. While 84% were familiar with commercial AI, fewer than half recognized the use of AI in medical decision support. Greater AI awareness was associated with higher perceived benefit (all P < .001). Participants emphasized transparency (92%), clinician oversight (82%), and validation as critical to trust. This study provides one of the first assessments of patient perspectives on AI within a safety-net healthcare setting. Patients view clinical AI favorably but demand transparency and clinician involvement. Patient education and engagement are essential for equitable, trustworthy AI deployment.
To develop and validate machine learning (ML) models that predict probable cause of death (CoD) using structured electronic health record (EHR) data, unstructured clinical notes, and publicly available sources. This multi-institutional retrospective study was conducted across Vanderbilt University Medical Center (VUMC) and Massachusetts General Brigham (MGB), including deceased patients with encounters between October 1, 2015, and January 1, 2021, and confirmed death records. The cohort included 13 708 patients from VUMC and 34 839 from MGB. The primary outcome was underlying CoD categorized into the top 15 National Center for Health Statistics rankable causes, with others grouped as "Other." Performance was assessed using weighted area under the receiver operating characteristic curve (AUC) and F-measure. The XGBoost model using structured EHR data alone achieved weighted AUCs of 0.86 (95% CI, 0.84-0.88) at VUMC and 0.80 (95% CI, 0.79-0.80) at MGB. Adding unstructured notes improved performance, with weighted AUCs of 0.90 (95% CI, 0.88-0.93) at VUMC and 0.92 (95% CI, 0.91-0.92) at MGB. Adding publicly available data did not further improve performance. Cross-institutional validation revealed significant performance degradation. Models integrating structured and unstructured EHR data show strong within-institution performance but limited generalizability across healthcare systems, highlighting challenges related to institutional data heterogeneity. Machine learning models combining structured and unstructured EHR data accurately predict CoD within institutions but perform poorly across sites. Health-care institutions may benefit from adopting robust processes for locally tailored models, and future research should focus on enhancing model generalizability while addressing unique institutional data environments.
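A hedged sketch of the weighted AUC used to score this multi-class CoD task: compute a one-vs-rest AUC per cause and average weighted by class prevalence. The scores and labels below are synthetic, and the implementation is a plain Mann-Whitney formulation rather than the study's tooling.

```python
# Illustrative sketch: prevalence-weighted one-vs-rest ROC AUC for a
# multi-class classifier. Data are toy values, not study results.
import numpy as np

def auc_binary(y, s):
    # Mann-Whitney formulation of the ROC AUC for binary labels y, scores s
    pos, neg = s[y == 1], s[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))

def weighted_ovr_auc(y_true, scores):
    # scores columns are assumed ordered by sorted class label
    classes, counts = np.unique(y_true, return_counts=True)
    aucs = [auc_binary((y_true == c).astype(int), scores[:, i])
            for i, c in enumerate(classes)]
    return float(np.average(aucs, weights=counts))

y = np.array([0, 0, 1, 1, 2, 2])
scores = np.array([[0.80, 0.10, 0.10],
                   [0.25, 0.40, 0.35],
                   [0.20, 0.70, 0.10],
                   [0.30, 0.50, 0.20],
                   [0.10, 0.20, 0.70],
                   [0.20, 0.20, 0.60]])
print(f"weighted OvR AUC = {weighted_ovr_auc(y, scores):.3f}")
```

Weighting by prevalence keeps rare causes from dominating the summary metric, which matters when the 15 rankable causes are highly imbalanced.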
Patients with malignant tumors are admitted to the ICU for diverse reasons. However, the clinical utility of serum albumin as a prognostic biomarker remains unclear. Patients with malignant tumors were screened from the Medical Information Mart for Intensive Care IV (MIMIC-IV, v3.1). This study employed Kaplan-Meier curves, Cox proportional-hazards models, restricted cubic splines (RCS), receiver operating characteristic (ROC) curves, and subgroup analyses to evaluate the association of serum albumin with all-cause mortality. For mortality-risk prediction, we applied machine-learning algorithms and used SHapley Additive exPlanations (SHAP) to identify the most influential predictors among critically ill cancer patients. A total of 1,739 patients with malignancy were included. The Kaplan-Meier curves showed significantly higher all-cause mortality in the hypoalbuminemia group (serum albumin < 30 g/L) than in the control group at each time point. Multivariable Cox regression models confirmed that hypoalbuminemia was independently associated with 28-day mortality (HR 1.74; 95% CI 1.34-2.27). Serum albumin exhibited a superior predictive capacity for long-term mortality (90-day and 1-year), with AUCs of 0.676 and 0.664, respectively, notably higher than those of the SOFA score (0.617 and 0.579). External validation using data from Tianjin Cancer Hospital yielded consistent results. Machine learning models identified BUN, serum albumin, respiratory rate, heart rate, and SOFA as the top predictors for 14- and 28-day mortality. Hypoalbuminemia was independently associated with increased all-cause mortality. Serum albumin measured at ICU admission serves as a prognostic biomarker for identifying high-risk cancer patient groups.
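A small sketch of the Kaplan-Meier product-limit estimator underlying the survival comparison above, written in plain Python under the assumption of right-censored follow-up. The times and event indicators are synthetic, not MIMIC-IV data.

```python
# Illustrative sketch: Kaplan-Meier product-limit estimator.
# times: follow-up in days; events: 1 = death observed, 0 = censored.
def kaplan_meier(times, events):
    pairs = sorted(zip(times, events))
    surv, curve, i = 1.0, [], 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = sum(e for tt, e in pairs if tt == t)
        n_at_risk = sum(1 for tt, _ in pairs if tt >= t)
        if deaths:
            surv *= 1 - deaths / n_at_risk   # product-limit update
            curve.append((t, surv))
        i += sum(1 for tt, _ in pairs if tt == t)  # skip ties at time t
    return curve

times = [5, 8, 8, 12, 20, 28]
events = [1, 1, 0, 1, 0, 1]
print(kaplan_meier(times, events))
```

Comparing such curves between the hypoalbuminemia and control groups (eg, with a log-rank test) is the standard first step before the adjusted Cox models the study reports.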
The COVID-19 pandemic caused a major worldwide health crisis that significantly impacted mental well-being. In this study, our objective is to assess the resilience of pre-pandemic depression level prediction models when applied to COVID-19 era data. We leverage advanced Machine Learning (ML) and Explainable Artificial Intelligence (XAI) techniques to identify the key factors impacting the shifts in depression levels during the pandemic, and we aim to align this identification with interventions and preparedness for future pandemics. In this study, we use a data-driven methodology based on National Health Interview Survey (NHIS) household survey data covering the years 2019-2022. The NHIS data are used to build both the pre-pandemic (2019) and COVID-19 (2020-2022) models discussed in our comparative evaluation. The ML techniques are supported (1) upstream, by feature selection methods that reduce both irrelevant features and the high dimensionality of the social survey data, and (2) downstream, by an XAI-based approach to gain insight into the pandemic-associated phenomena that most impacted the mental health of individuals. In our empirical experiments, we use over 100 000 entries across the 4 yearly datasets, applying an 80%-20% training/testing split for model building and evaluation. The outcomes of our empirical study show that classifiers trained solely on pre-COVID-19 data performed poorly when applied to COVID-19 era data. Conversely, models retrained on pandemic-specific data demonstrated high performance. In particular, the Random Forest (RF) classifier achieved the best performance, recording an average accuracy of 98.10% across the COVID-19 era datasets.
With respect to the identification of key depression factors, XAI techniques provided actionable insights, revealing that features such as Delayed Medical Care, Family Poverty, Participation in Social Activities, and Marital Status were the most influential factors contributing to depression challenges during the pandemic. The significant decline in the performance of pre-pandemic models on COVID-19 data reveals the profound impact of the pandemic on mental health, highlighting the need for new predictive models tailored to crisis circumstances. The RF model, retrained on pandemic-era data, performed accurately during the COVID-19 era, with an accuracy of 98.1%. XAI techniques confirmed that factors such as delayed medical care, family poverty, job loss, and reduced social involvement were critical drivers of the decline in mental health during the pandemic.
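A hedged sketch of the core experimental step described above: an 80%-20% train/test split and a Random Forest classifier. The features are synthetic stand-ins for NHIS survey variables, and the separable target is contrived so the demo runs quickly.

```python
# Illustrative sketch: 80/20 split + Random Forest, as in the study's
# pipeline. Data are synthetic, not NHIS records.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))                 # 8 stand-in survey features
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # contrived separable target

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
print(f"test accuracy = {acc:.3f}")
```

In the study, an XAI layer (eg, feature-attribution methods) is then applied to the fitted forest to rank the survey variables driving the depression predictions.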
To develop and validate a prespecified logistic model for detecting mild cognitive impairment (MCI) using MyCog Mobile, a self-administered smartphone-based screening application, and to evaluate a structured simplification that reduces patient burden while maintaining diagnostic accuracy. We analyzed data from 277 older adults (100 electronic health record-confirmed MCI; 177 normal cognitive aging). Guided by the Harrell/Regression Modeling Strategies framework, a prespecified 10-predictor model was compared against reduced models using Wald χ2 partitioning. Internal validation used 2000 bootstrap (BS) resamples to calculate optimism-corrected C-statistics (area under the receiver operating characteristic curve), calibration, and clinical utility via decision curve analysis. Sensitivity analyses compared the primary model to bootstrapped and cross-validated regularized regression approaches (LASSO, Ridge, Elastic Net) to confirm model stability. The final parsimonious model included 6 predictors and achieved an optimism-corrected C-statistic of 0.812 with excellent calibration (slope = 0.92). Detection accuracy was 75% (BS 95% CI, 69%-80%), consistent with penalized regression models in sensitivity analyses (accuracy 72%-73%), with overlapping CIs confirming predictive stability. Decision curve analysis showed the model provides net benefit over both "refer-all" and "refer-none" strategies across all examined thresholds, capturing ∼55% of the net benefit achievable by a theoretically perfect screener. The final model prioritized parsimony to reduce patient burden while maintaining clinical accuracy to detect MCI. Stability across traditional regression and regularized regression approaches from the statistical learning literature indicated a robust predictive signal. Findings support MyCog Mobile as an accurate and accessible cognitive screener able to detect the earliest signs of cognitive impairment in primary care.
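A hedged sketch of bootstrap optimism correction in the Harrell framework the study cites: refit the model on each resample, measure how much the apparent C-statistic drops when that refit is scored on the original data, and subtract the average drop. The data, model, and resample count below are toy (the study used 2000 resamples and its own 6-predictor model).

```python
# Illustrative sketch: optimism-corrected C-statistic via bootstrap.
# All data are synthetic; 200 resamples for speed (study used 2000).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(277, 6))                          # 277 adults, 6 predictors
y = (X @ rng.normal(size=6) + rng.normal(scale=1.5, size=277) > 0).astype(int)

apparent = roc_auc_score(
    y, LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1])

optimism = []
for _ in range(200):
    idx = rng.integers(0, len(y), len(y))              # bootstrap resample
    if len(set(y[idx])) < 2:
        continue                                       # need both classes
    m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
    boot_app = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
    boot_test = roc_auc_score(y, m.predict_proba(X)[:, 1])
    optimism.append(boot_app - boot_test)              # per-resample overfit

corrected = apparent - float(np.mean(optimism))
print(f"apparent C = {apparent:.3f}, optimism-corrected C = {corrected:.3f}")
```

The corrected value estimates out-of-sample discrimination without holding out data, which is why it is favored for small clinical samples like this one.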
This study evaluates the usefulness of explicit syntactic knowledge, integrated via a neural mechanism, in improving the accuracy of named entity recognition in the domain of biomedical text processing. The syntactic structure of a text can help determine whether a certain part of the text is an entity or not. Parsing is an essential technique in natural language processing (NLP) that can be utilized to determine the syntactic structure of sentences in human languages. We propose to infuse syntactic knowledge through the attention mechanism using dependency parsing and sequence labelling parsing, as well as the multi-task learning paradigm. Experiments were conducted on 5 datasets: MTSamples, VAERS, NCBI-disease, BC2GM, and JNLPBA. We demonstrate improvements in the F1 score over the current state of the art on 3 out of 5 datasets (MTSamples, VAERS, and NCBI-disease). We reduce the number of mismatches with gold labels, particularly in n-dash and parenthesis tokens and in compound and adjective-modifier dependencies. Syntactic features improve NER accuracy in attention-based neural systems, and parsing as sequence labelling brings additional benefits.
Research in suicide risk prediction often suffers from the lack of comprehensive data on patient suicide death, which differs from suicide attempt or suicidal behaviors. This study aimed to develop a population-wide multi-source harmonized data warehouse suitable for suicide death risk prediction. The Maryland Suicide Data Warehouse (MSDW) was conceived as a statewide database that addresses limitations in prior suicide research. To develop MSDW, multiple patient-level statewide data sources were linked using the statewide health information exchange infrastructure. Manner of death, the standard outcome in suicide death research, was determined by the state Office of the Chief Medical Examiner (OCME). Health services data were linked from multiple data sources such as electronic health records, hospital discharge data, and administrative insurance claims. Data were structured in a common format that preserves observations at their lowest level of analysis. Data features were included based on known or hypothesized psychiatric or suicide risk factors. The warehouse contains a mix of records across data sources for patient diagnoses, clinical encounters, procedures, area of residence, pharmacy fills, and laboratory findings. MSDW represents 104,517 decedents reported by the OCME between 2012 and 2020, of whom 5,059 were classified as suicides. The MSDW is a statewide data warehouse that allows users to conduct population health research, predictive modeling, and observational studies for multiple outcomes. It has multiple overlapping clinical records that improve the completeness and timeliness of data. It is a high-quality statewide data warehouse for conducting suicide prediction research and assessing risk for surveillance and intervention.
To verify that federated genomic study sites applied identical preprocessing pipelines without disclosing raw genotypes. Each institution perturbs a 100-SNP slice using local differential privacy (LDP), trains a RandomForest classifier, and transmits one LIME explanation vector to a coordinating server. The server simulates 15 preprocessing combinations and trains a RandomForest classifier to predict each site's configuration. In centralized simulation, the verifier achieved 80% accuracy across 15 preprocessing configurations on the GMMAT (n = 400) and synthetic genome (n = 2504) datasets while maintaining membership-inference attack power below 0.05 at ε = 3. In distributed Flower FL experiments with data partitioned across three sites, binary compatibility detection reached 70% accuracy at 500 SNPs. A single differentially private explanation vector provides an auditable preprocessing fingerprint. The gap between centralized and distributed accuracy reflects expected FL data partitioning effects. This framework demonstrates the feasibility of automated preprocessing verification in federated genomic consortia without compromising participant privacy.
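A hedged sketch of the local perturbation step this abstract describes: k-ary randomized response on genotype values (0/1/2 minor-allele counts), which satisfies ε-local differential privacy. Parameter names are illustrative, and the study's full pipeline additionally trains a classifier and ships a LIME explanation vector, which is not reproduced here.

```python
# Illustrative sketch: k-ary randomized response for ε-LDP on a 100-SNP
# slice. All data are synthetic.
import math
import random

def randomized_response(value, epsilon, domain=(0, 1, 2), rng=random):
    k = len(domain)
    # report the true value with probability e^eps / (e^eps + k - 1)
    p_true = math.exp(epsilon) / (math.exp(epsilon) + k - 1)
    if rng.random() < p_true:
        return value
    return rng.choice([v for v in domain if v != value])

rng = random.Random(7)
genotypes = [rng.choice((0, 1, 2)) for _ in range(100)]   # one 100-SNP slice
perturbed = [randomized_response(g, epsilon=3, rng=rng) for g in genotypes]
agree = sum(a == b for a, b in zip(genotypes, perturbed)) / len(genotypes)
print(f"agreement after eps=3 LDP: {agree:.2f}")
```

At ε = 3 the true genotype is reported about 91% of the time, which is consistent with the study's finding that useful downstream signal survives while membership-inference power stays low.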
We implemented a tool to identify high-risk patients with atrial fibrillation (AF) with a CHA2DS2-VASc score of ≥2 (males) or ≥3 (females) who are not treated with oral anticoagulation. We aimed to evaluate the acceptability and usability of the "AF or Flutter not on Anticoagulant" electronic health record-based Care Gap (AF Care Gap) alert and associated best practice advisory (BPA) for clinicians managing patients with AF. An electronic survey was sent to 490 primary care and cardiology providers at Essentia Health (Duluth, MN, USA) to evaluate usability and acceptability and to obtain feedback post-implementation. We excluded providers who did not complete the consent (n = 9), give consent (n = 15), complete the survey (n = 340), or see AF patients (n = 5). Survey response rate was 25% (N = 121); 51% reported prior use of the AF Care Gap (N = 62), with the majority (73%) in family medicine. Most users and nonusers reported they were "likely/extremely likely" to start a conversation about anticoagulation and use the AF Care Gap or BPA in their future practice (84%). Of those who used it, 75% of providers were "likely/extremely likely" to prescribe anticoagulation. Most users would recommend it to others (67%). On the System Usability Scale, the AF Care Gap scored 72.5/100. Acceptability scored 27/35 on a modified Theoretical Framework of Acceptability questionnaire. Survey respondents report above-average usability and acceptability. Future evaluation of AF Care Gap tool utilization and persistent gaps in anticoagulation management is still needed to improve management for high-risk AF patients.
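As a hedged sketch of the scoring logic such an alert keys on, the example below computes the CHA2DS2-VASc score (congestive heart failure, hypertension, age, diabetes, stroke/TIA, vascular disease, sex category) and applies the sex-specific thresholds stated above. The function and field names are illustrative, not the EHR implementation.

```python
# Illustrative sketch: CHA2DS2-VASc scoring and the study's alert
# thresholds (>=2 for males, >=3 for females). Field names are invented.
def cha2ds2_vasc(age, female, chf, htn, diabetes, stroke_tia, vascular):
    score = 0
    score += 1 if chf else 0                         # C: heart failure
    score += 1 if htn else 0                         # H: hypertension
    score += 2 if age >= 75 else (1 if age >= 65 else 0)  # A2 / A
    score += 1 if diabetes else 0                    # D: diabetes
    score += 2 if stroke_tia else 0                  # S2: prior stroke/TIA
    score += 1 if vascular else 0                    # V: vascular disease
    score += 1 if female else 0                      # Sc: sex category
    return score

def high_risk(age, female, **comorbidities):
    s = cha2ds2_vasc(age, female, **comorbidities)
    return s >= (3 if female else 2)                 # alert threshold

print(cha2ds2_vasc(70, True, chf=False, htn=True, diabetes=True,
                   stroke_tia=False, vascular=False))  # -> 4
```

In a care-gap workflow, patients returning True from such a check, and lacking an active anticoagulant order, would surface in the alert.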