Through a partnership and collaborative design process involving technologists and healthcare professionals, a web platform featuring a digital rehabilitation plan was developed. This process tackles recognised challenges in information and communication technology for rehabilitation, such as fragmented data flows and low-quality user interfaces. The digital rehabilitation plan facilitates shared decision-making by allowing patients to customise their plan while providing healthcare professionals with the necessary access to support the patient's journey. This study assessed the experiences of end-user healthcare professionals and patients regarding the plan's usability, its potential for shared decision-making, and identified potential areas for improvement. This qualitative exploratory study employed semi-structured focus group discussions involving three distinct participant groups: (i) patients, (ii) healthcare professionals, and (iii) a combined group of patients and healthcare professionals. The discussions focused on various stages of the rehabilitation process, including the pre-admission phase, inpatient care, rehabilitation plans, evaluations during inpatient stays, and outpatient follow-ups. The recorded discussions were transcribed and analysed using reflexive thematic analysis to uncover patterns, contradictions, and dilemmas in the participants' experiences and perspectives. Patients and healthcare professionals appreciated the digital rehabilitation plan and acknowledged its contribution to shared decision-making. However, patients requested a wider range of digital support and communication tools for inpatient rehabilitation and follow-up, e.g., a "my page" with access to relevant information, chat support, educational tools, and videos. Professionals were reluctant to expand the use of the web platform due to past negative experiences with information and communication technology, and had less motivation to change work processes further. 
When patients and professionals were brought together, the discussion led to a shift in the professionals' opinions, revealing a potential for improved collaboration, communication, and shared decision-making. This study underscores the necessity of involving all stakeholders in the design of a digital tool to ensure that all end users' needs and wants are accounted for. Involving patients in both design and implementation can uncover biases, identify barriers, and enhance usability, thereby promoting digital communication and shared decision-making in rehabilitation. Bringing stakeholders together revealed that while healthcare professionals encounter barriers due to established procedures, patients' familiarity with everyday technology and enthusiasm for digital tools shifted professionals' perspectives. Lessons learned from this research project were fed back to the developer of the web platform, and taken into account to strengthen the web platform's potential for communication and shared decision-making in rehabilitation planning. This study was approved by the Norwegian Agency for Shared Services in Education and Research, Data Protection Services for Research, reference number 116434.
Routine medical data are highly valuable for secondary use, and data sharing is a prerequisite for pioneering research. Furthermore, since the advent of artificial intelligence and its application in various medical fields, such as decision-making and pharmacovigilance, the demand for real-world training data has steadily increased. However, the associated privacy risk, especially concerning reidentification, is extremely sensitive, and we are currently unaware of any standardised method to quantify it comprehensively. Assessing the reidentification risk of a data collection under examination requires the consideration and analysis of a complex system. To develop a holistic framework for stratifying this risk, an integrative approach is followed where the risk of deanonymisation is not considered mono-causally but includes various aspects. On the basis of a systematic literature review, factors and corresponding risks that are decisive in reidentification attacks are identified. These factors are grouped into overarching perspectives, and evaluation criteria are developed, facilitating the systematic grading of each risk factor by a data controller. Interactions between factors are visualised in entity‒relationship models (ERMs), and their direction and supposed magnitude are quantified in an influence matrix. Finally, on the basis of this matrix, a risk score and different indices are generated to evaluate the reidentification risk and facilitate possible countermeasures. The reidentification risk comprises four general perspectives regarding data, knowledge, potential attackers, and technical/organisational aspects. The ERMs represent a complex system of clear interconnections between the factors of the respective perspectives. The final calculation is performed in the influence matrix based on the assessment of the data controller. 
The derivable indices and visualisations provide indications of particularly risk-driving components of a dataset and thus for targeted safety measures, such as generalisation, suppression and randomisation approaches. Experiments to determine the functionality of the method via published and verified reidentification attacks confirm the plausibility and selectivity of risk stratification. A quantitative assessment of the reidentification risk of a medical dataset, including the identification of risk drivers, is necessary and feasible. The proposed prototype must be further evaluated and will serve as the basis for the development of a software application. The online version contains supplementary material available at 10.1186/s12911-026-03475-4.
Machine learning (ML) has emerged as a transformative approach for developing high-performance clinical prediction models (CPMs). By leveraging multidimensional patient data, ML enables more accurate disease risk stratification, prognostic assessment, and clinical decision-making. In recent years, research on CPMs has expanded rapidly, with nearly 250,000 publications indexed as of 2024. Despite this remarkable growth, a comprehensive bibliometric analysis of the field is currently lacking. This study aimed to analyze the global research status, evolutionary trends, and thematic hotspots of machine learning-based clinical prediction models (ML-CPMs) through bibliometric and visualization techniques. Publications related to ML-CPMs were retrieved from the Web of Science Core Collection and the Scopus database (up to May 9, 2025). Bibliometric analyses were performed using various tools, including R, VOSviewer, and CiteSpace, to generate annual publication trends, collaboration networks, and journal distributions, as well as co-citation, clustering, and keyword analyses. A total of 8,619 publications (8,000 original articles and 619 reviews) from 118 countries were identified. Since 2015, annual publications have grown exponentially (R² = 0.9919). While China led in total publication volume, the United States maintained the highest academic influence (H-index = 105; Total Citations = 66,788). Harvard University and BMC Medical Informatics and Decision Making emerged as the most productive institution and journal, respectively. Tian J from the Chinese Academy of Sciences led in publication count, while Wynants L from KU Leuven in Belgium recorded the highest citation frequency. Key research hotspots include algorithm optimization, multimodal data integration, and model interpretability, with clinical applications primarily focused on oncology, cardiovascular diseases, and critical care medicine. 
Research on ML-CPMs has experienced rapid global growth over the past decade, forming extensive international collaboration networks. However, challenges such as limited interpretability, data heterogeneity, and privacy concerns persist. Future studies should prioritize external validation, clinical applicability, and the integration of human-AI collaborative decision-making to ensure robust implementation in real-world clinical settings.
Knee osteoarthritis (KOA) is a highly prevalent chronic condition that substantially impairs functional capacity and quality of life among middle-aged and older adults. Sensory loss, including hearing and vision loss, is another major health concern in aging populations. Dual sensory loss (DSL), the coexistence of visual and auditory impairment, leads to more severe clinical consequences than single sensory deficits, largely due to disrupted sensory integration and diminished neural compensatory mechanisms. Emerging evidence indicates that osteoarthritis is linked to progressive deterioration of auditory and visual function, highlighting the need for early identification of individuals at risk. Therefore, this study aimed to develop a machine learning-based time-to-event prediction model for DSL among middle-aged patients with symptomatic KOA, to externally validate its performance in an independent hospital-based cohort, and to identify risk factors through interpretable analysis, providing essential evidence to support early preventive interventions. Data from the China Health and Retirement Longitudinal Study (N = 605) were used in the model development phase. After data preprocessing, we trained and tested four time-to-event ML algorithms. Model performance was evaluated in 10-fold cross-validation using the concordance index (C-index), Brier scores, and calibration plots. A sensitivity analysis was conducted by redefining DSL using a broader cutoff and re-training all models under the same cross-validation framework to assess the robustness and stability of the findings. An independent hospital-based cohort (N = 195) was used for preliminary external validation. The optimal model was further evaluated by decision curve analysis (DCA) to assess its clinical utility and interpreted with SHapley Additive exPlanations (SHAP) to quantify feature contributions and directional effects. The 15 variables with the highest predictive capacity were retained.
The DeepSurv model demonstrated superior performance in both the construction and validation phases, achieving a C-index exceeding 0.8 and a Brier score below 0.1. The sensitivity analysis results were largely consistent with our primary findings, supporting the robustness of the associations. SHAP analysis revealed self-rated health and sleep duration as the most important predictors, both negatively influencing DSL risk. The DeepSurv model effectively predicts time-to-event risk of DSL in KOA patients, highlighting subjective health perception and sleep duration as critical modifiable factors. These findings support the development of targeted early preventive strategies in clinical practice to preserve sensory function and reduce the long-term disease burden associated with KOA.
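The concordance index reported above can be illustrated with a minimal sketch. This is a plain-Python version of Harrell's C-index for right-censored data, not the study's implementation; the function name, the toy follow-up times, and the convention of skipping pairs with tied times are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (illustrative sketch).

    times: observed follow-up times
    events: 1 if the event was observed, 0 if censored
    risk_scores: higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # A pair is comparable only if the shorter time ends in an observed
        # event; pairs with tied times are skipped (a simplification).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (hypothetical): four patients, the third is censored.
c_index = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risk_scores=[0.9, 0.4, 0.5, 0.1],
)
```

In this toy example five pairs are comparable and four are ranked correctly, so the index is 0.8. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.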
Early identification of patients at risk for heart failure (HF) hospitalization in the emergency department (ED) is challenging because definitive diagnostic tests are often unavailable at triage. Chief complaint narratives contain rich symptom information but are rarely leveraged for early risk stratification. We sought to develop and validate a machine learning model that predicts HF hospitalization using only data available at ED intake, including free-text chief complaints and structured triage variables. We conducted a retrospective cohort study of 270,596 adult ED-to-inpatient encounters across a large integrated health system (2016-2021). The primary outcome was HF hospitalization, defined by a primary discharge diagnosis of HF. Predictors were limited to triage-available data: demographics, vital signs, comorbidity burden, and free-text chief complaints. Chief complaint text was transformed using term frequency-inverse document frequency and latent semantic analysis, supplemented by clinically defined symptom phenotypes. Logistic regression and light gradient boosting machine (LGBM) models were trained and evaluated on a held-out test set. Model performance was assessed using discrimination, calibration, and precision-oriented thresholds. HF hospitalization occurred in 7.5% of encounters. Models incorporating both structured and natural language processing-derived features achieved the highest performance. The combined LGBM model demonstrated strong discrimination (AUC = 0.896), recall (0.816), and precision (0.630 at the default threshold), outperforming structured-only and NLP-only models. Symptom clusters related to dyspnea and edema were among the strongest predictors. HF hospitalization can be accurately predicted at ED presentation using only triage-available data. 
Integrating free-text chief complaints with structured variables substantially improves early risk stratification and may support earlier diagnostic evaluation and resource planning in acute care settings.
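The term frequency-inverse document frequency step described above can be sketched in plain Python. This is only an illustration: the complaint strings are invented, the weighting uses the unsmoothed variant tf × ln(N/df) (the abstract does not state which variant was used), and the subsequent latent semantic analysis step is omitted.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document (illustrative sketch)."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical chief complaints, not taken from the study data.
complaints = [
    "shortness of breath and leg swelling",
    "chest pain",
    "shortness of breath",
    "ankle pain after fall",
]
weights = tfidf(complaints)
```

In the first complaint, "swelling" occurs in only one document and so receives a higher weight than "shortness", which occurs in two. This is the property that lets rarer symptom terms stand out in the feature matrix.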
With the introduction of data protection regulations, the need for innovative privacy-preserving approaches to process and analyse sensitive data has become apparent. One approach is the Personal Health Train (PHT) that brings analysis code to the data and conducts the data processing at the data premises. However, despite its demonstrated success in various studies, the execution of external code in sensitive environments, such as hospitals, introduces new research challenges because the interactions of the code with sensitive data are often incomprehensible and lack transparency. Such interactions introduce potential threats to data integrity and expand the attack surface, exposing the system to risks including code injection, supply chain software vulnerabilities, and unauthorised runtime network communication. To address this issue, this work discusses a Personal Health Train (PHT)-aligned security and audit pipeline inspired by DevSecOps principles, called Pipeline for Automated Security and Technical Audits for the Personal Health Train (PASTA-4-PHT). The automated pipeline incorporates multiple phases that detect vulnerabilities, such as unintentionally or intentionally introduced weaknesses in the code of the PHT, before its deployment. To thoroughly study its versatility, we evaluate PASTA-4-PHT in two ways. First, we deliberately introduce vulnerabilities into a PHT. Second, we apply our pipeline to five real-world PHTs, which have been utilised in real-world studies, to audit them for potential vulnerabilities. The controlled evaluation confirmed detection of all injected vulnerability types showing that the audit pipeline is effective. In the real-world audit of five Trains, the image analysis phase identified up to 35 critical vulnerabilities per Train, indicating that container images pose the most significant threat vector according to our evaluation. 
Our evaluation demonstrates that our designed pipeline successfully identifies potential vulnerabilities and can be applied to real-world studies. In compliance with the requirements of the General Data Protection Regulation (GDPR) for data management, documentation, and protection, our automated approach supports researchers using the PHT in their data-intensive work and reduces manual overhead. PASTA-4-PHT can be used as a decision-making tool to assess and document potential vulnerabilities in code for data processing. The associated artefacts of this article, along with the pipeline configuration, are available online for adaptation and reuse. Ultimately, our work contributes to an increased security and overall transparency of data processing activities within the PHT framework.
To develop and validate a machine learning (ML) model to assess the risk of chronic critical illness (CCI) in intensive care unit (ICU) patients with acute pancreatitis (AP). We utilised two large, publicly available ICU datasets, MIMIC-IV (v3.1) and the eICU Collaborative Research Database (v2.0), as the development cohort for model construction. A single-centre dataset from China (SZICU) was used for external validation. Three feature selection methods (stepwise regression, Least Absolute Shrinkage and Selection Operator (LASSO), and the Boruta algorithm) were employed. Three ML methods (logistic regression (LR), random forest (RF), and extreme gradient boosting (XGBoost)) were used for model development. Model performance was evaluated using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, F1 score, sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and Brier score, in both internal and external validation. The incidences of CCI were 7.00%, 9.89%, and 20.09% in the training, internal validation, and external validation sets, respectively. Eight predictors of CCI were identified: calcium level, body temperature, vasopressor use, urine output, Glasgow Coma Scale score, albumin level, haemoglobin level, and a history of cerebrovascular disease. In the internal validation set, the RF model achieved an AUROC of 0.85 (0.77-0.91), an AUPRC of 0.53 (0.39-0.69), and a Brier score of 0.07 (0.05-0.09). In the external validation set, the RF model achieved an AUROC of 0.73 (0.64-0.81), an AUPRC of 0.42 (0.30-0.56), and a Brier score of 0.16 (0.12-0.20). Feature importance analysis revealed that calcium level, body temperature, vasopressor use, and urine output were the most influential predictors of CCI. We developed and validated an ML model using eight clinical variables to predict CCI risk in ICU patients with AP.
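Two of the metrics reported above, AUROC and the Brier score, can be written out in a few lines. The following is a minimal sketch in plain Python with invented labels and probabilities; the function names are hypothetical and this is not the study's evaluation code. The last helper gives the Brier score of a constant prediction equal to the event rate, which is a common reference point when event rates differ between cohorts.

```python
def auroc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    # Ties between a positive and a negative count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def constant_prediction_brier(event_rate):
    """Brier score obtained by always predicting the event rate itself."""
    return event_rate * (1.0 - event_rate)


# Toy labels and predicted probabilities (hypothetical).
y_true = [0, 0, 1, 0, 1]
y_prob = [0.1, 0.4, 0.35, 0.2, 0.8]

toy_auroc = auroc(y_true, y_prob)
toy_brier = brier_score(y_true, y_prob)
```

For the toy data, five of the six positive-negative pairs are ordered correctly, so the AUROC is 5/6, and the Brier score is 0.1345. Lower Brier scores indicate better-calibrated and sharper probabilities.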
Healthcare stakeholders are increasingly seeking comparative provider performance data to enhance data-driven decision-making and quality improvement. Traditional visualisations, like caterpillar plots, are often difficult for end users to understand and interpret. This study aimed to (1) obtain general feedback from end users on a newly proposed design solution for visualising a risk-adjusted hospital comparison and to develop an understanding of the key criteria they rely on in the evaluation process; (2) test the hypothesis that end users will better understand key messages and rate perceived usability higher with the new design solution than with a caterpillar plot. An end user-centred mixed methods study, involving end users of risk-adjusted hospital comparisons across all levels of the Swiss healthcare system, was conducted to evaluate the new design solution. In the qualitative phase, 14 end users from health authorities, insurers, hospital associations, and hospitals were surveyed in 10 semi-structured individual and group interviews, which were analysed using thematic analysis. In the quantitative phase, a non-clinical randomised controlled online trial (A/B testing) was conducted. In total, 200 of the targeted end users, comprising cantonal quality managers, hospital directors, and those responsible for quality and/or the 'National Prevalence Measurement' in hospitals, completed the questionnaire. The data were analysed using comparative descriptive and bivariate statistics. Thematic analysis revealed three key criteria that end users relied on when evaluating a risk-adjusted hospital comparison: (1) 'clarity by design', highlighting strategies for effectively conveying key messages of hospital comparisons; (2) 'usability by design', focusing on end user-centred functionalities and presentation elements; (3) 'suitability for quality development', addressing the conditions for creating a trustworthy and useful comparison to drive quality improvement. 
Quantitative analysis confirmed the hypothesis that end users understand key messages better and perceived usability is higher with the new design than with the caterpillar plot. The new design solution improves hospital comparison outputs for end users by combining clear displays with additional interactive features. The identified criteria underlying the evaluation should inform further design projects and research dealing with the visualisation of hospital comparisons.
Linking electronic health record (EHR) use to care quality may offer insights into potential interventions that improve guideline adherence and close care gaps. We examine how EHR metadata can measure cognitive load in primary care providers during statin prescribing and identify cognitive load points in EHR workflows associated with guideline-concordant statin initiation. We retrospectively extracted 2024 data from EHR primary care encounters from a large academic health system. We identified adult patients who met the criteria for statin initiation and calculated their atherosclerotic cardiovascular disease (ASCVD) risk scores. Cognitive load metrics were derived from EHR metadata. Logistic regressions evaluated associations between cognitive load and statin initiation, adjusting for patient covariates and provider fixed effects. Gradient-boosted forests and SHapley Additive exPlanations (SHAP) values were used to identify key EHR events and cognitive load patterns associated with statin initiation. Longer encounter duration was associated with increased likelihood of statin initiation, whereas more time spent per EHR event was associated with a decreased likelihood. Nonlinear associations were observed for loop count and distinct event count: predicted initiation probability decreased with increasing loop count up to 93.9 loops, then increased beyond this threshold. For distinct events, initiation probability increased up to approximately 18 events and declined at higher counts. In a gradient-boosted decision tree model, average event time was the strongest predictor (72.2% relative contribution). Additional positive predictors included time spent reviewing lab results and on suggested medication order sets. Order list modification and looping back to it were negatively associated with statin initiation. EHR metadata can associate cognitive load with appropriate clinical behavior, revealing nonlinear associations between cognitive load and statin initiation rates.
This work suggests opportunities to optimize EHR systems to reduce cognitive burden and support clinical decision-making. Connecting cognitive load to prescribing behavior generates hypotheses about how workflow adjustments and enhanced decision support might improve guideline adherence and patient care through prospective evaluation.
The internet has become an important source of information for cancer patients. Numerous websites provide nutritional advice that promises benefits for the outcome of cancer therapy. The aim of our study was to evaluate and compare the online information about cancer diets on German- and English-language websites. A patient's online search was simulated using the search engines Google and Bing. Websites were evaluated by means of content and formal criteria according to a standardized instrument. The analysis of 31 websites revealed heterogeneous quality regarding content and formality, distributed evenly among the German- and English-language websites. The quality of content and formality did not correlate with a website's position in the search results. The wide variation in quality of content and formality represents a risk for cancer patients who are searching for information online. Content of poor quality and formality increases the risk of misinformation and consequent poor decisions on diet. These can result in reduced therapy response, an increased probability of therapeutic toxicity, and a poorer prognosis in general. The visibility of high-quality websites needs to be improved.
Chronic conditions cause millions of deaths annually worldwide. Remote patient monitoring using wearable devices and sensors, combined with machine learning (ML), offers promising strategies for disease management. However, diverse methodological approaches and study designs impede comparability and the development of best practice guidelines. A systematic review was conducted following the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA) guidelines. Four scientific databases were searched for relevant prospective studies published between 2014 and 2024. Studies had to use ML to predict disease outcomes of chronic conditions in remotely monitored patients. The studies were tagged for characteristics such as health outcomes, dataset, monitored parameters, and algorithms. From 6668 initially identified studies, 76 met inclusion criteria. 73.7% of studies were considered to have a high risk of bias, mainly due to methodological shortcomings in the Analysis domain. Parkinson’s disease was most frequently monitored, followed by diabetes and chronic obstructive pulmonary disease (COPD). Wearable devices were the predominant remote sensors, with accelerometer data being the most common parameter. Tree-based algorithms were most frequent, and studies using leave-one-out cross-validation showed significantly higher accuracy. Feature engineering and publication year were also significantly associated with model performance. This review highlights both progress and challenges in applying ML to chronic disease monitoring. While conditions like Parkinson’s, COPD, and diabetes are well-represented, others such as liver and kidney diseases are underexplored. Future research should prioritize standardization of methodologies, model interpretability, and ethical considerations including data privacy and algorithmic fairness. 
When properly implemented, ML-driven remote monitoring has the potential to enhance patient care, reduce complications, and deepen our understanding of chronic conditions. However, addressing challenges in reproducibility, generalizability, and clinical integration is crucial for advancing the field.
Few studies have evaluated peripheral artery disease (PAD) and wound healing in patients with lower extremity wounds using a convolutional neural network (CNN)-based deep learning algorithm. We aimed to establish a CNN deep-learning model based on transcutaneous oxygen pressure (TcPO2)-annotated wound images for detecting PAD and wound healing in diabetic patients with lower extremity wounds. An extensive database of 1,407 original images from 77 patients with lower extremity wounds was collected to produce CNN deep-learning models (i.e., GoogleNet, ResNet101V2 and EfficientNet). A framework was constructed, including image pre-processing and TcPO2-based grouping, to establish an optimal training model and to validate each model's performance for detecting PAD or wound healing. Among the established CNN deep-learning models, the ResNet101V2 model with original wound images showed the best performance for detecting PAD (sensitivity 93.08%, accuracy 86.20%) or wound healing (sensitivity 96.76%, accuracy 88.14%), although the GoogleNet and EfficientNet models also demonstrated high sensitivity and accuracy. A CNN deep-learning algorithm based on objective TcPO2 values and image preprocessing is a promising approach for detecting PAD and wound healing in lower extremity wounds, providing an easily implemented, more objective, and more reliable computational tool for physicians to automatically identify PAD and monitor wound healing.
Heart rate variability (HRV) derived from electrocardiogram (ECG) signals offers a promising non-invasive window into glycemic status; however, existing studies frequently combine distinct glucose measurements and employ validation strategies susceptible to data leakage. Because HRV declines by approximately 3-5% per decade due to age-related autonomic degeneration, absolute HRV values conflate the effects of aging with diabetes-specific autonomic dysfunction. We hypothesised that normalising HRV features using an age-dependent scaling factor would isolate the diabetes-specific component and improve glycemic status estimation. We analysed ECG-derived features from 43 male type 2 diabetes patients with strictly separated glycated hemoglobin (HbA1c; n = 29; 3-month glycemic average) and fasting blood glucose (FBG; n = 38; acute status). Leave-one-subject-out (LOSO) cross-validation (CV) with within-fold feature selection and standardisation prevented information leakage. Twenty machine learning algorithms and six age-adjustment methods were compared, with normalisation sensitivity tested across 20 parameter combinations. Statistical validation employed permutation testing (n = 500) and bootstrap 95% confidence intervals. Extra trees regression achieved the best performance: R² = 0.222 (r = 0.476, p = 0.009) for HbA1c and R² = 0.086 (r = 0.344, p = 0.034) for FBG, corresponding to mean absolute errors of 1.18% points and 2.27 mmol/L respectively. Permutation testing confirmed that both associations exceeded the chance level (p = 0.002). Contrary to our hypothesis, none of the six age-adjustment methods nor any of the 20 sensitivity parameter combinations improved performance, indicating that age-related HRV decline did not confound glycemic estimation in this cohort. 
CV hygiene differentially affected model families: tree-based ensembles maintained positive performance, whereas linear models collapsed to negative R² values, revealing substantial bias from conventional practices. Neural networks with minimally configured hyperparameters failed for these sample sizes (R² ranging from -8.2 to -10,879). Strict within-fold preprocessing fundamentally alters conclusions in HRV-based glycemic status estimation, exposing performance that is inflated by conventional CV practices. Bootstrap confidence intervals excluding zero (HbA1c R²: [0.13, 0.82]; FBG R²: [0.10, 0.72]) provided statistical evidence for genuine HRV-glycemic associations, but performance remained insufficient for standalone clinical use. This study establishes methodological standards for separating glycemic targets, subject-independent validation with within-fold preprocessing, and comprehensive baselines to advance non-invasive glycemic monitoring research.
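The within-fold preprocessing described in this abstract can be illustrated with a minimal sketch of leave-one-subject-out splitting in which the scaler is fitted on the training subjects only. The subject identifiers, feature values, and function names below are hypothetical, and the model-fitting step is left out; in the same pattern, feature selection would also be fitted inside each fold.

```python
from statistics import mean, pstdev


def loso_splits(subject_ids):
    """Yield (train_idx, test_idx) with one subject held out per fold."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield train, test


def fit_scaler(rows):
    """Per-feature mean and standard deviation, computed from `rows` only."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against zero variance
    return means, stds


def apply_scaler(rows, means, stds):
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


# Hypothetical data: three subjects, two features per recording.
subject_ids = ["s1", "s1", "s2", "s3", "s3"]
features = [[60.0, 1.2], [62.0, 1.1], [75.0, 0.8], [58.0, 1.5], [59.0, 1.4]]

folds = []
for train_idx, test_idx in loso_splits(subject_ids):
    train_rows = [features[i] for i in train_idx]
    test_rows = [features[i] for i in test_idx]
    # Scaling statistics come from the training subjects only, so the
    # held-out subject never influences the preprocessing.
    means, stds = fit_scaler(train_rows)
    train_scaled = apply_scaler(train_rows, means, stds)
    test_scaled = apply_scaler(test_rows, means, stds)
    folds.append((train_idx, test_idx, train_scaled, test_scaled))
```

Fitting the scaler once on the full dataset before splitting would let the held-out subject's values shape the training features, which is the kind of leakage the strict protocol avoids.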
Respiratory epidemics often place substantial pressure on intensive care units (ICUs), which are continuously challenged to manage acute and life-threatening conditions under unpredictable workloads. During these periods, ICUs usually exhibit inefficient patient flows, treatment delays, and critical resource shortages. Proactive decision-making and precise interventions are therefore pivotal for patient survival and minimizing long-term sequelae. This paper proposes a robust approach combining Artificial Intelligence (AI), Bayesian Optimization, and Digital Twin (DT) to support ICU patient flow management. An eXtreme Gradient Boosting (XGBoost) algorithm is used to predict the patient transfer probability from the emergency department (ED) to the ICU within the next 24 h. Bayesian optimization is employed for efficient hyperparameter tuning of the XGBoost model. Then, the transfer predictions are inserted into a DT to verify ICU capacity for timely care and design interventions for process mismatches. A case study from a European healthcare group validates the proposed approach. The specificity of the XGBoost prediction model was 94.90% (95% CI 91.72%-97.11%), whereas the sensitivity was 81.55% (95% CI 72.70%-88.51%). Finally, the median ICU bed waiting time decreased to between 66.74 and 69.38 h after implementing a patient transfer policy with a partner hospital having available ICU beds. This study demonstrates the effectiveness of AI-DT in predicting the probability of ICU transfers, assessing the operational response of emergency wards and intensive care units, and crafting practical scenarios for enhancing patient flow management.
Postoperative nausea and vomiting (PONV) prolongs hospitalization and reduces patient satisfaction. Identifying high-risk elderly patients requires accurate absolute risk assessments, yet existing tools often lack probability calibration and transparency. We included 1216 elderly patients undergoing elective hip or knee surgery. To strictly prevent data leakage, the dataset was partitioned into training, validation, and independent test sets in a 7:1:2 ratio prior to any imputation or feature selection. Following the systematic hyperparameter optimization of 12 distinct machine learning algorithms, a StackNet meta-model was developed by fusing optimal base-learner probabilities with raw clinical features. Clinical utility was evaluated via Brier scores and Decision Curve Analysis (DCA), alongside SHapley Additive exPlanations (SHAP) interpretability. Overall PONV incidence was 33%. The StackNet model achieved an AUC of 0.9338, significantly outperforming the conventional Logistic Regression baseline (AUC = 0.7564, p < 0.001) with superior calibration (Brier score = 0.102). On the independent test set, the StackNet model achieved an accuracy of 0.7860, sensitivity of 0.9250, specificity of 0.7178, and AUC of 0.9338, while the Logistic Regression baseline achieved an accuracy of 0.6584, sensitivity of 0.6750, specificity of 0.6503, and AUC of 0.7564. SHAP analysis identified preoperative frailty status and baseline hemoglobin levels as primary risk drivers. The StackNet framework offers highly calibrated absolute risk estimates for PONV in elderly orthopedic patients. Combined with SHAP transparency, it provides a clinically actionable tool to facilitate personalized antiemetic prophylaxis while avoiding unnecessary medical interventions due to overestimated risks.
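The relationship between the test-set accuracy, sensitivity, and specificity reported above can be checked with a short worked example. The abstract does not report confusion-matrix counts; the counts below are hypothetical and were chosen only because they reproduce the reported rates to four decimals, assuming a test set of 243 patients (about 20% of 1216) of whom 80 had PONV.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts (not reported in the abstract): 80 PONV cases and
# 163 non-cases in an assumed test set of 243 patients.
stacknet = classification_metrics(tp=74, fn=6, tn=117, fp=46)
logistic = classification_metrics(tp=54, fn=26, tn=106, fp=57)

stacknet_rounded = {name: round(value, 4) for name, value in stacknet.items()}
logistic_rounded = {name: round(value, 4) for name, value in logistic.items()}
```

Under these assumed counts, accuracy is the prevalence-weighted combination of sensitivity and specificity, which is why a model with high sensitivity but moderate specificity lands at an accuracy between the two.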
Atrial fibrillation (AF) is a major risk factor for atherothrombotic complications but is often asymptomatic and undiagnosed. This study aimed to develop a machine learning model to distinguish between individuals with low and high risk of AF, using routinely collected diagnostic data from Swedish primary health care. The study included cases (n = 42,607, aged ≥ 45 years) with newly diagnosed AF and controls (n = 427,169) matched by age and sex. Machine learning models stratified by age (45–69 and ≥ 70 years) and sex were developed using stochastic gradient boosting, based on the number of primary health care visits during the year before the index AF diagnosis, age, and ICD-10 codes from electronic medical records 2014–2019. Performance was evaluated by AUC, sensitivity, and specificity, and key predictors were ranked by normalized relative influence (NRI) and odds ratios for marginal effects. The most influential predictors were the number of visits (NRI: 29.9–46.3%) and age (NRI: 6.2–15.9%), followed by risk factors for AF such as heart failure, hypertension, and cardiac arrhythmias. Model AUC ranged from 0.77 to 0.79 across subgroups. Sensitivity was 0.76–0.80, and specificity 0.58–0.66, with higher sensitivity in older groups and higher specificity in younger ones. The models correctly identified 95–98% of individuals without known AF. The models show good predictive ability, effectively ruling out low-risk patients while identifying known risk factors. With AUC values comparable to more complex models, our approach using only visit frequency, age, and diagnoses may support initial risk assessment in primary health care for identifying individuals at risk of AF. The online version contains supplementary material available at 10.1186/s12911-026-03491-4.
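One way to relate the reported sensitivity and specificity to the ability to rule out low-risk patients is the negative predictive value (NPV). The sketch below rests on several assumptions that the abstract does not confirm: that the case fraction of the matched sample approximates the prevalence in each subgroup, that the endpoints of the reported ranges pair up as shown (higher sensitivity with lower specificity), and that NPV is a reasonable lens on the "95–98%" statement. The subgroup labels are illustrative.

```python
def negative_predictive_value(sensitivity, specificity, prevalence):
    """Share of negative predictions that are truly negative."""
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Case and control counts as reported in the abstract.
CASES, CONTROLS = 42_607, 427_169
# Assumption: the case fraction in the matched sample stands in for prevalence.
prevalence = CASES / (CASES + CONTROLS)

# Assumed pairings of the reported range endpoints (not stated in the abstract).
scenarios = {
    "higher sensitivity, lower specificity": (0.80, 0.58),
    "lower sensitivity, higher specificity": (0.76, 0.66),
}

npv = {
    label: negative_predictive_value(sens, spec, prevalence)
    for label, (sens, spec) in scenarios.items()
}

for label, value in npv.items():
    print(f"{label}: NPV = {value:.3f}")
```

With roughly one case per ten controls, even moderate specificity yields a high NPV, which is why a model with an AUC below 0.80 can still be useful for ruling out low-risk individuals.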
This study evaluated the predictive utility of the initial lactate-to-albumin ratio (LAR), measured within 24 h of admission, for in-hospital all-cause mortality in critically ill patients with congestive heart failure (CHF) and diabetes mellitus (DM). A retrospective cohort study was performed using the Medical Information Mart for Intensive Care IV (MIMIC-IV; n = 960) and the eICU Collaborative Research Database (eICU-CRD; n = 1,850). Kaplan-Meier curves, Cox regression, restricted cubic splines (RCS), subgroup analyses, and five machine learning models were applied, with predictive performance assessed via receiver operating characteristic (ROC) curves, calibration curves, and decision curve analysis (DCA). The highest LAR quartile (Q4) was associated with higher in-hospital mortality (MIMIC-IV: 50.83%; eICU-CRD: 29.71%) than lower quartiles (all P < 0.001). LAR was identified as an independent predictor of in-hospital mortality (MIMIC-IV: HR = 1.878, P = 0.009; eICU-CRD: HR = 3.141, P < 0.001). A nonlinear positive association between LAR and in-hospital mortality was demonstrated by RCS (P < 0.001), with inflection points at 2.73 in MIMIC-IV and 2.50 in eICU-CRD. For both outcomes, higher discriminative performance was observed for LAR than for lactate alone in both cohorts. Model performance was further improved when LAR was incorporated into machine learning models. Initial LAR is a reliable predictor of in-hospital mortality in critically ill CHF-DM patients.