Traditional in-person handovers from the Post-Anaesthesia Care Unit (PACU) to inpatient wards often require PACU nurses to leave their post-operative patients for extended periods, which may delay the handover and management of other patients awaiting transfer. Alternatively, the PACU nurse-in-charge may hand over patient information to another individual to assist with in-person handover in the inpatient ward. Multiple care transitions increase the risk of omission of critical handover information, which compromises patient safety. A Virtual Nursing Handover (VNH) model may address these workflow and safety challenges by enabling direct, real-time communication between PACU and ward nurses, while reducing unnecessary staff movement. This study aimed to evaluate the safety, feasibility, and usability of a VNH from the PACU to the inpatient ward. This process innovation study was underpinned by the Technology Acceptance Model (TAM). A three-component evaluation was conducted to assess the safety, feasibility, and usability of the VNH. The study was undertaken in an acute care hospital in Singapore between October and November 2024. Handover safety was evaluated using structured audits based on the Queensland Health Handover Audit Tool, aligned with National Safety and Quality Health Service standards. The tool comprised 17 items assessing handover completeness and safety, along with additional open-ended questions to capture technical quality and notable observations. Feasibility was assessed using implementation metrics, including handover duration, adherence to the intended workflow, and technical reliability. Usability was evaluated using the System Usability Scale (SUS) administered to PACU and ward nurses, supplemented by qualitative feedback. Consistent with the TAM, safety and feasibility were interpreted as indicators of perceived usefulness, while usability reflected perceived ease of use.
Quantitative data were analysed using SPSS version 26.0, and qualitative data were analysed thematically. A total of 31 handover safety audits were conducted. Adherence to handover safety standards was high, with all audits meeting at least 15 of the 17 audit criteria (≥ 88%). The VNH demonstrated high feasibility, resulting in an estimated time saving of approximately 4.7 h per day for PACU nurses. No major technical failures were observed, and handovers were conducted as intended. In contrast, system usability was rated below average, with a mean SUS score of 41.9. VNH from the PACU to the inpatient ward was safe and feasible in routine clinical practice; however, suboptimal usability may limit technology acceptance. Further system refinement is required to improve user experience and support sustained adoption.
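The SUS result above (mean 41.9, against the conventional benchmark of 68) comes from a fixed scoring rule. As an illustrative Python sketch of standard SUS scoring (not the study's analysis code; the response pattern below is hypothetical):

```python
def sus_score(responses):
    """System Usability Scale score from the 10 item responses (each 1-5).

    Standard SUS scoring: odd-numbered items contribute (response - 1),
    even-numbered items contribute (5 - response); the sum is multiplied
    by 2.5 to yield a 0-100 score. Scores below roughly 68 are
    conventionally read as below-average usability.
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Hypothetical neutral respondent (all 3s) lands at the scale midpoint.
print(sus_score([3] * 10))  # -> 50.0
```

On this scale, the reported mean of 41.9 sits well below the midpoint-neutral pattern, consistent with the authors' "below average" interpretation.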
Myocardial infarction (MI) is a leading cause of cardiovascular mortality worldwide and requires timely treatment and accurate public awareness of risk factors, warning signs, and first aid. In China, short-video platforms such as TikTok (Douyin, Chinese mainland version) and Bilibili have become major health information sources, yet the quality and reliability of MI-related content remain inadequately evaluated. This cross-sectional study systematically assessed and compared the quality, reliability, and educational value of MI-related videos on TikTok and Bilibili. Using the Chinese-language keyword for "myocardial infarction", we retrieved the top 100 videos from TikTok and Bilibili on September 1, 2025. After exclusions, 137 videos were included. Uploaders were classified as clinicians, patients, or traditional Chinese medicine (TCM) practitioners. Quality was evaluated using the Global Quality Score (GQS), modified DISCERN (mDISCERN), JAMA benchmarks, and the Patient Education Materials Assessment Tool (PEMAT-U/A). Statistical analyses included Spearman correlation, Mann-Whitney U, Kruskal-Wallis, and chi-square tests. Bilibili videos were significantly longer but had much lower engagement than TikTok videos. Only mDISCERN scores differed significantly between platforms: Bilibili contained a higher proportion of high-reliability videos than TikTok, while JAMA, GQS, and PEMAT-U/A scores did not differ significantly. Uploader background significantly influenced quality outcomes. Clinicians and TCM practitioners achieved higher JAMA scores than patients, indicating greater formal credibility, whereas patients had a higher proportion of high mDISCERN scores, reflecting more detailed experiential content. Correlation analysis revealed a bidirectional effect of video length: longer duration was positively associated with mDISCERN and GQS scores but negatively associated with JAMA scores.
Interaction metrics showed strong internal synergy but almost no correlation with professional quality scores, demonstrating a clear "quality-popularity paradox." Content analysis revealed an imbalanced pattern: information on emergency measures and medication safety in particular was severely lacking. MI-related content on Chinese short-video platforms is of moderate quality but characterized by a significant disconnect between popularity and educational value, as well as critical deficiencies in emergency response information. These findings underscore the urgent need for coordinated interventions, including platform-level quality control, collaborative content creation between professionals and platforms, and enhanced public health literacy, to ensure the safe and effective use of these platforms for health education.
Objective: The objective of this study is to conduct a systematic review of the evidence on the use of remotely administered walking tests (RaWTs) in patients with chronic pulmonary diseases (CPDs) and heart failure (HF), focusing on agreement, reliability, feasibility, and clinical utility as outcomes. Methods: This study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines and was registered on the International Prospective Register of Systematic Reviews platform (ID: CRD420251180996). The PubMed, Web of Science, CENTRAL, Scopus, and ACM databases were comprehensively searched from inception up to October 2025. Observational, randomized, and non-randomized controlled studies assessing the agreement, reliability, feasibility, and clinical utility of RaWTs in people with CPDs and HF and reporting quantitative outcomes were eligible. Two reviewers independently conducted study selection, data extraction, and risk of bias assessment using the COSMIN Risk of Bias tool for the reliability studies, the Risk of Bias in Non-Randomized Studies of Interventions (ROBINS-I) tool for non-randomized studies, and the Quality in Prognosis Studies (QUIPS) tool for the prognostic studies. Results: Eleven studies met the inclusion criteria. Five studies included patients with HF, five with pulmonary hypertension (PH), and one study included candidates for lung transplantation due to advanced CPD. All studies used the 6-minute walk test (6MWT); one also included the incremental shuttle walk test. Agreement with face-to-face in-clinic testing (assessed in five studies) was setting-dependent and influenced by the testing setup. Reliability (assessed in eight studies), derived from variable statistical indices in both patient populations, showed that RaWTs are reliable. Adherence and safety were used as the main feasibility indicators. Eight studies concluded that remote assessment is feasible, acceptable, and safe.
Clinical utility was examined in only one HF study, which showed that a remotely administered 6MWT can predict all-cause mortality and HF hospitalization. According to COSMIN, the overall methodological quality of nine studies ranged from very good to inadequate. One study was rated as having a serious risk of bias according to ROBINS-I, and one study as having a high risk of bias according to QUIPS. Conclusions: Although the evidence is limited and heterogeneous, RaWTs demonstrate robust reliability across repeated measurements, while agreement with in-clinic testing is context-dependent and strongly influenced by test setup and environmental conditions. RaWTs appear to be acceptable to patients; however, further high-quality studies are needed to confirm these findings and determine the clinical utility of RaWTs for specific clinical outcomes in these populations.
Attention-Deficit/Hyperactivity Disorder (ADHD) is a complex neurodevelopmental disorder requiring professional diagnosis. Recently, short-video platforms such as TikTok and Bilibili have seen a surge in ADHD-related content, driving a trend of self-diagnosis among the public, particularly young adults. The scientific quality and potential risks of this content have not been systematically evaluated. This study aimed to systematically evaluate the quality and reliability of ADHD content on TikTok and Bilibili, analyze its content characteristics, and specifically investigate the prevalence of content encouraging self-diagnosis and its association with user engagement. The top 100 videos from each platform were retrieved using the keyword "ADHD" and its Chinese-language equivalent. After a screening process, a total of 164 videos were included for analysis. Two senior clinical psychologists independently assessed the videos using the modified DISCERN (mDISCERN) tool and the Global Quality Score (GQS). Videos were classified by uploader type (e.g., healthcare professionals, patients/influencers) and content theme (e.g., symptom education, self-tests). A novel Self-Diagnosis Risk Scale (SDRS) was also applied. Nonparametric statistical methods were used for data analysis. A total of 164 videos were analyzed (88 from TikTok, 76 from Bilibili). Significant platform differences emerged, with Bilibili videos demonstrating superior quality scores (GQS: 3.05 ± 0.91 vs. 2.45 ± 0.88; mDISCERN: 2.62 ± 0.85 vs. 1.88 ± 0.72; both p < 0.001) but TikTok videos showing higher self-diagnosis risk (SDRS: 1.71 ± 0.51 vs. 1.30 ± 0.69; p < 0.001). Healthcare professionals produced the highest-quality content (GQS: 3.65 ± 0.68; mDISCERN: 3.15 ± 0.81) with the lowest diagnostic risk (SDRS: 0.75 ± 0.49), while patients/influencers created content with the lowest quality and highest risk scores.
Critically, a "quality-engagement paradox" was identified: videos with higher self-diagnosis risk received significantly more user engagement (likes: r = 0.45, p < 0.001; shares: r = 0.42, p < 0.001), while quality metrics showed no significant correlation with user engagement measures. This study reveals concerning patterns in ADHD-related content on major Chinese short-video platforms, where potentially harmful content encouraging self-diagnosis receives preferential algorithmic promotion over scientifically rigorous material. The inverse relationship between content quality and user engagement suggests current platform mechanisms may inadvertently amplify misleading health information while marginalizing evidence-based content. These findings underscore the urgent need for collaborative interventions involving platform operators, healthcare professionals, and public health educators to develop content guidelines, improve algorithmic curation of health information, and support healthcare professionals in creating engaging, evidence-based content. As social media platforms continue serving as primary health information sources, ensuring quality and safety of mental health content must become a priority for platform governance and public health policy.
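The engagement correlations above (e.g., likes: r = 0.45) are rank-based associations. As a minimal Python sketch of Spearman's rho for tie-free data (illustrative only, not the authors' analysis code):

```python
def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length sequences.

    Converts each sequence to ranks, then applies the classic formula
    1 - 6*sum(d^2)/(n*(n^2-1)). No tie correction is applied, so this
    sketch assumes all values within a sequence are distinct.
    """
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = rank
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

A perfectly monotone quality-engagement relationship would yield rho = 1; the paradox described above corresponds to risk scores, not quality scores, tracking engagement ranks.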
Congenital cataract (CC) is a time-critical cause of preventable childhood visual impairment. After diagnosis, parents frequently experience uncertainty and increasingly seek guidance online. The safety, readability, and counseling quality of large language model (LLM) responses for CC remain insufficiently benchmarked, particularly for explanations involving lens development, etiology, and genetic risk. We performed a cross-sectional comparative evaluation of five publicly accessible Chinese conversational LLMs (ChatGPT-5.2, Gemini 3 Pro, DeepSeek-V3.1, Doubao, and Kimi K2). Thirty standardized parent-facing CC questions were developed by senior ophthalmologists and mapped to five domains, with specific incorporation of scenarios requiring translation of lens developmental pathology and genetic counseling knowledge. Two researchers independently performed standardized zero-shot querying and response recording under identical conditions. Output efficiency and textual structure were extracted. Two blinded ophthalmologists rated each response on a 5-point Likert scale across Accuracy, Logic, Coherence, Safety, and Content Accessibility; inter-rater agreement was assessed using quadratic weighted Cohen's kappa. Group differences were tested using ANOVA or Kruskal-Wallis H tests with Bonferroni-corrected pairwise comparisons. Significant between-model differences were observed in output efficiency and text characteristics (all P < 0.001). ChatGPT-5.2 was fastest (17.94 ± 5.11), whereas DeepSeek-V3.1 and Kimi K2 were slowest (41.46 ± 3.22 and 40.02 ± 4.67). DeepSeek-V3.1 generated the longest responses (1,456.93 ± 224.99 words) and Kimi K2 the shortest (640.83 ± 252.95). ChatGPT-5.2 showed the strongest tendency toward structured/tabular output [2.00 (1.00, 2.00)], followed by Gemini 3 Pro [1.00 (1.00, 1.25)], while the other models rarely produced tables. Quadratic weighted Cohen's kappa indicated good inter-rater reliability (0.686-0.767).
Content quality differed significantly across models (Accuracy H = 41.15, Logic H = 32.95, Content accessibility H = 41.33; all P < 0.001). ChatGPT-5.2 and Gemini 3 Pro achieved higher overall profiles and did not differ significantly from each other, whereas Kimi K2 scored lower on multiple dimensions. LLM performance in translating lens developmental pathology and genetics for CC parent counseling is model-dependent. Longer outputs did not necessarily translate into higher quality; structured presentation was more closely associated with better safety and accessibility. These findings provide quantitative benchmarks for safer, parent-centered deployment of LLMs in pediatric ophthalmology education and support more reliable translation of complex disease-related knowledge into actionable parent guidance.
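The inter-rater reliability figures above (quadratic weighted Cohen's kappa of 0.686-0.767) weight disagreements by the squared distance between rating categories, so a one-point Likert disagreement is penalized far less than a four-point one. A self-contained Python sketch assuming a 5-point scale (illustrative only, not the study's code):

```python
from collections import Counter

def quadratic_weighted_kappa(r1, r2, k=5):
    """Quadratic weighted Cohen's kappa for two raters on a k-point scale.

    Ratings are integers 1..k. Disagreement weights are
    (i - j)^2 / (k - 1)^2, and kappa = 1 - (weighted observed
    disagreement) / (weighted chance disagreement from the marginals).
    """
    n = len(r1)
    obs = Counter(zip(r1, r2))          # joint rating counts
    m1, m2 = Counter(r1), Counter(r2)   # marginal counts per rater
    num = den = 0.0
    for i in range(1, k + 1):
        for j in range(1, k + 1):
            w = (i - j) ** 2 / (k - 1) ** 2
            num += w * obs.get((i, j), 0) / n
            den += w * (m1.get(i, 0) / n) * (m2.get(j, 0) / n)
    return 1.0 - num / den

# Perfect agreement yields kappa = 1.
print(quadratic_weighted_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]))  # -> 1.0
```

Values in the 0.61-0.80 band are conventionally described as good or substantial agreement, matching the authors' interpretation.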
Large language models (LLMs), a core technology of generative artificial intelligence (AI), are increasingly used in health education and promotion. Although they may expand access to medical information, concerns remain regarding the reliability and readability of AI-generated content for the public. This study evaluated the reliability and readability of answers generated by five LLMs to common questions about perinatal depression. The primary aims were to determine (1) the reliability of LLM responses to frequently asked questions about perinatal depression and (2) whether the readability of the generated content aligns with public health literacy levels. Twenty-seven frequently asked questions were derived from Google Trends and patient-facing resources from the American College of Obstetricians and Gynecologists (ACOG). Each question was submitted to ChatGPT-5, Gemini-2.5, Microsoft Copilot, Grok4, and DeepSeek. Two obstetricians independently rated responses using five validated instruments (DISCERN, EQIP, JAMA, GQS, and HONCODE), and inter-rater agreement was quantified using the intraclass correlation coefficient (ICC). Readability was assessed using six indices: ARI, GFI, CLI, OLWF, LWGLF, and FRF. Differences among models were analyzed using the Friedman test. Inter-rater agreement was high across the 27 perinatal depression questions, with ICC values ranging from 0.729 to 0.847. Significant between-model differences emerged for DISCERN, EQIP, and HONCODE (all p < 0.001). No overall differences were found for JAMA and GQS. Grok4 scored highest on DISCERN (60.33 ± 5.48), DeepSeek on EQIP (53.04 ± 4.91), and Copilot on HONCODE (9.26 ± 1.85). These results highlight distinct strengths in quality constructs across instruments. Readability posed a common limitation. All models exceeded the NIH-recommended sixth-grade level on grade-based indices (for example, ARI ranged from 13.49 ± 2.92 to 15.81 ± 3.25).
Similarly, OLWF scores fell well below the sixth-grade benchmark of 94 (ranging from 61.44 ± 6.80 to 72.96 ± 10.39, where higher scores denote easier reading). Most models produced empathetic and informative content. However, they fell short in fully addressing clinical safety standards. Most LLMs demonstrated moderate to high reliability when responding to perinatal depression questions, supporting their potential as supplementary sources of health information. However, readability levels above recommended benchmarks suggest that current outputs may remain challenging for individuals with lower health literacy. While LLMs improve information accessibility, further improvements in readability, source attribution, and ethical transparency are needed to maximize public benefit and support equitable health communication. Future work should focus on defining and standardizing safety behaviors in high-risk mental health contexts to enable reliable clinical deployment.
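Grade-based indices such as the ARI cited above are fixed formulas over character, word, and sentence counts, which is why LLM outputs can be scored automatically. A rough Python sketch of the ARI (the tokenization here is deliberately naive and illustrative only, not the study's implementation):

```python
import re

def automated_readability_index(text):
    """Automated Readability Index for a passage of English text.

    ARI = 4.71*(chars/words) + 0.5*(words/sentences) - 21.43.
    The result approximates the US school grade needed to read the
    text; values above ~6 exceed the NIH-recommended level for
    patient-facing materials. Sentence and word splitting below is
    intentionally simplistic.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = text.split()
    chars = sum(len(w.strip(".,!?;:")) for w in words)
    return 4.71 * (chars / len(words)) + 0.5 * (len(words) / len(sentences)) - 21.43
```

Longer words (more characters per word) and longer sentences both push the grade level up, which is why clinically precise LLM prose tends to score in the 13-16 range reported above.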
To characterize the frequency, causes, and severity of opioid medication recalls using United States Food and Drug Administration (FDA) Enforcement Reports. We conducted a descriptive analysis of recalls involving seven opioids reported between 2002 and 2025. FDA free-text recall reasons were categorized into five standardized domains (wrong dose/potency, contamination, mispackaging/mislabeling, defective delivery system, and quality assurance deviations) by two independent reviewers, with agreement assessed using Cohen's κ (95% CI). Recall severity (Class I-III) was compared across drugs using the chi-square test, and temporal trends were evaluated with linear regression. We identified 286 opioid-related recalls, involving over 350 million units. Fentanyl (26.2% of events; > 30 million units), hydromorphone (20.3%; > 11 million), morphine (19.6%; > 73 million), oxycodone (12.6%; > 188 million), and hydrocodone (13.9%; > 50 million) accounted for most events. Recalls of buprenorphine (7.0%; > 3 million) and methadone (3.2%; > 400,000) were less frequent. Quality assurance deviations accounted for most recalls (49.5%), followed by mispackaging/mislabeling (14.4%), wrong dose/potency (13.7%), defective delivery systems (12.3%), and contamination (10.1%). Inter-rater agreement for categorization was high (κ = 0.88 [0.84-0.93]). Class I recalls (risk of death) comprised 35 events (12.2%), concentrated among fentanyl (n = 10), morphine (n = 9), and hydromorphone (n = 9) (χ2 = 43.1, df = 12, p < 0.001). Recall frequency increased significantly over time (r = 0.63, p = 0.001). Unit count data were missing for 34 events (11.9%), and production denominators were unavailable. Opioid recalls reflect manufacturing or quality assurance problems that may undermine product reliability. More complete recall reporting, including quantitative data, would support efforts to reduce risks associated with pharmaceutical quality failures.
Opioid medications are widely used for pain treatment and for opioid use disorder, yet little is known about how often these products are recalled or why. We reviewed more than 20 years of recall data from the United States Food and Drug Administration and found 286 recalls involving seven commonly used opioids, affecting over 350 million tablets, capsules, patches, and injectable products. Nearly half of these recalls occurred because products failed basic quality checks, while others involved incorrect doses, contamination, mislabeling, or defective delivery systems. The most serious recalls, those carrying risk of serious injury or death, were concentrated among fentanyl, morphine, and hydromorphone. Many recall notices lacked important details, including the number of units affected, making it difficult to understand how these issues may impact patients. Our findings show that opioid recalls happen regularly and often reflect manufacturing problems that could influence treatment safety or effectiveness. Clearer and more complete reporting of recall information would help clinicians, patients, and regulators better understand these events and support safer use of opioid medications.
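The severity comparison above (χ2 = 43.1, df = 12) is a standard Pearson chi-square test of independence on a drug-by-recall-class contingency table. A minimal Python sketch of the statistic (illustrative only; the toy counts in the example below are hypothetical, not the FDA data):

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic for an r x c contingency table.

    table[i][j] is the observed count for row i (e.g. drug) and
    column j (e.g. recall class). Expected counts are derived from
    the row and column marginals under the independence hypothesis,
    and chi2 sums (observed - expected)^2 / expected over all cells.
    Returns (chi2, degrees of freedom).
    """
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            chi2 += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(col_tot) - 1)
    return chi2, df

# Hypothetical 2x2 table with maximal association between drug and class.
chi2, df = chi_square_statistic([[10, 0], [0, 10]])
```

With df = (rows - 1) x (cols - 1), the reported df of 12 is consistent with seven drugs crossed with three recall classes.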
Exercise benefits people with epilepsy (PWE) by reducing seizure frequency, improving quality of life, and fostering social participation. YouTube, a popular platform for health information, hosts numerous exercise videos for PWE, yet their quality, reliability, and safety remain unevaluated. This study aimed to evaluate the quality, reliability, and engagement of YouTube videos on epilepsy-specific exercises using the novel Epilepsy Content Evaluation (ECE) tool, alongside DISCERN, JAMA, and Global Quality Score (GQS). A systematic search conducted on July 22, 2025, identified 45 English-language YouTube videos, which were independently evaluated by two neurologists using the ECE, DISCERN, JAMA, and GQS tools. Video characteristics, publisher sources, and engagement metrics were analyzed. Most videos (53.3%) were rated low quality (ECE score 0-7), with only one (2.2%) achieving high quality (ECE score 12-15). The Safety and Risk Management subdomain scored lowest (median: 2, IQR: 1.25), reflecting inadequate attention to seizure triggers and supervision. Athlete-targeted videos outperformed general-population videos in quality (p < 0.05). View counts negatively correlated with ECE scores (r = -0.375 to -0.712, p < 0.05). ECE demonstrated strong convergent validity with DISCERN (Phi = 0.606, p = 0.035). In conclusion, most YouTube videos on epilepsy-specific exercises lack clinical reliability and safety guidance, posing potential risks for PWE. The ECE tool effectively identifies these deficiencies, enabling clinicians to recommend ILAE-aligned videos to enhance patient safety. Curated, evidence-based digital resources could empower millions of PWE to engage in safe physical activity, reducing stigma and improving health outcomes.
Despite the transformative potential of large language models (LLMs) in health care, the rapid development of these tools has outpaced their rigorous evaluation. While artificial intelligence-specific reporting guidelines have been developed to address standardized reporting of artificial intelligence studies, there is currently no specific tool available for risk of bias assessment of LLM question-answer (QA) studies. Existing risk-of-bias tools for medical research are not well suited to the unique challenges of evaluating LLM-QA studies, which creates a critical gap in assessing their safety and effectiveness. This study aims to develop the Alberta Quality Assessment Tool: Risk of Bias (AQAT:RoB) for LLM-QA studies to systematically evaluate the validity and risk of bias in LLM-QA studies. We conducted 2 literature reviews. The first was on quality assessment tools for LLM-QA studies, and the second was on LLM-QA studies, which informed the first draft of the AQAT:RoB. The draft AQAT:RoB was further refined through a prespecified iterative process of modified Delphi, consensus meeting, and validation. The first Delphi process occurred between May 1 and May 20, 2025, and the first consensus meeting was held on May 22. The first round of validation was completed by 4 evaluators, who were not part of the consensus meeting, on 16 randomly selected studies. As this first round of validation surpassed our a priori threshold of ≥80% agreement and a Cohen κ of ≥0.61 between evaluators, no further rounds of development and validation were undertaken. A second Delphi process occurred between February 20 and February 23, 2026, to vote on postpilot changes in response to peer review. The AQAT:RoB consists of 5 high-level domains (Questions, Reference Answers, LLM Answers, Evaluators, Outcomes). These domains are subdivided into 9 subdomains.
Each subdomain includes at least one "Support for Judgment" and at least one "Type of Bias" and is to be rated "low," "high," or "unclear" for risk of bias. A pilot evaluation was completed by internal validators who were not part of the consensus discussion and were asked to complete the AQAT:RoB form for each assigned study. Each of the 16 studies was evaluated by 2 evaluators independently. Pilot validation showed a percent agreement of 86.1% and a Cohen κ of 0.70 between assessors. The AQAT:RoB demonstrates promising initial reliability for assessing the validity or risk of bias in LLM-QA studies. The tool will benefit from future refinements, external validation, and periodic updates to keep pace with evolving technology.
Obstructive sleep apnea (OSA) is a prevalent disorder associated with significant cardiovascular morbidity, including hypertension. Continuous positive airway pressure (CPAP) is the primary treatment for OSA and has been proposed as an adjunctive therapy for hypertension. However, evidence regarding its antihypertensive effects remains heterogeneous, with many studies exhibiting low methodological quality. A comprehensive computerized search of PubMed, Embase, the Cochrane Library, Web of Science, China National Knowledge Infrastructure (CNKI), VIP, Wanfang, and the China Biology Medicine disc (CBM) databases was conducted from their inception to December 1, 2025, to systematically identify systematic reviews and meta-analyses examining the effect of CPAP on blood pressure in patients with OSA. A citation overlap matrix was constructed, and the corrected covered area (CCA) was calculated to assess the degree of overlap among primary studies. The ROBIS, AMSTAR-2, PRISMA 2020, and GRADE tools were used to evaluate the risk of bias, methodological quality, reporting quality, and certainty of evidence of the included systematic reviews/meta-analyses, respectively. Quantitative and qualitative analyses were conducted on the primary outcomes to gain a more comprehensive and in-depth understanding. This umbrella review included a total of 17 systematic reviews/meta-analyses. The citation matrix analysis yielded a corrected covered area of 14.2%, indicating a substantial degree of overlap among the primary studies, which may artificially inflate the perceived consistency of findings. Methodological quality assessment using AMSTAR-2 revealed a critical limitation that fundamentally shapes the interpretation of this overview: only 3 of the 17 included reviews were rated as high quality, while the remaining 14 (82.3%) were judged to be of low quality.
This widespread methodological weakness-primarily driven by lack of pre-registered protocols, inadequate integration of risk of bias into conclusions, and poor reporting of funding sources-directly undermines the reliability of the conclusions drawn by these individual reviews and, by extension, the overall findings of this overview. The GRADE assessment of evidence certainty showed that among all evaluated outcomes, only 4 were rated as high quality, 29 as moderate, 51 as low, and 25 as very low. Current evidence suggests that continuous positive airway pressure (CPAP) therapy is associated with modest reductions in blood pressure among patients with obstructive sleep apnea, particularly in nocturnal measures. The therapy appears generally well-tolerated with no serious adverse events reported. However, these findings should be interpreted with caution due to the predominantly low-to-very-low certainty of the evidence, substantial methodological weaknesses in the included reviews, and significant overlap of primary studies. The magnitude of blood pressure reduction may be influenced by factors such as CPAP adherence and baseline hypertension severity, although these could not be robustly explored due to limitations in the included literature. Future research should employ large-scale, long-term real-world studies with standardized reporting of CPAP intervention details (e.g., device type, adherence, pressure settings) and diverse outcome measures such as ambulatory blood pressure monitoring. Such studies are needed to clarify the differential efficacy of CPAP across distinct clinical subgroups and its long-term cardiovascular benefits, thereby informing more precise clinical practice guidelines. https://www.crd.york.ac.uk/PROSPERO/; CRD420251239607.
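The 14.2% overlap figure above follows the published corrected covered area formula over the citation matrix. A one-function Python sketch (illustrative only; the example inputs below are hypothetical, not this review's matrix):

```python
def corrected_covered_area(total_citations, unique_studies, n_reviews):
    """Corrected covered area (CCA) for primary-study overlap.

    CCA = (N - r) / (r*c - r), where N is the total number of
    primary-study citations across all reviews (filled cells of the
    citation matrix), r the number of unique primary studies (rows),
    and c the number of reviews (columns). Values above roughly 15%
    are conventionally read as very high overlap.
    """
    r, c = unique_studies, n_reviews
    return (total_citations - r) / (r * c - r)

# Hypothetical extreme: 10 unique studies each cited by both of 2
# reviews (N = 20) gives complete overlap.
print(corrected_covered_area(20, 10, 2))  # -> 1.0
```

When no study appears in more than one review, N equals r and the CCA is 0; the 14.2% reported here sits near the conventional "very high" threshold.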
Cardiovascular diseases (CVDs) remain the leading cause of mortality worldwide. Conventional risk prediction models often demonstrate suboptimal calibration and limited generalizability across populations. Artificial intelligence (AI) approaches, including machine learning (ML) and deep learning (DL), enable integration of multimodal clinical and imaging data for individualized cardiovascular risk estimation. To evaluate the applications, predictive performance, and translational limitations of AI models for cardiovascular risk prediction within an umbrella review framework. PubMed, Scopus, and Web of Science were systematically searched for studies published between January 2015 and October 2025 investigating AI-based prediction of cardiovascular outcomes. Eligible designs included randomized controlled trials (RCTs), cohort studies, systematic reviews, and meta-analyses. Predictive performance was the primary outcome, mainly assessed using the area under the receiver operating characteristic curve (AUC). Methodological quality was evaluated using established risk-of-bias tools. From 3500 identified records, 48 studies (8 RCTs, 28 cohort studies, and 12 systematic reviews or meta-analyses) were included in the final analysis. AI models achieved AUC values greater than 0.90 in more than 70% of imaging-based studies. Evidence synthesis showed predominant reliance on internal validation, inconsistent calibration reporting, and limited evaluation of algorithmic fairness. Multimodal data integration improved detection of coronary artery disease (CAD) and heart failure (HF). Wearable monitoring was associated with 18-25% lower hospitalization rates compared with usual care. AI improves predictive accuracy in cardiovascular risk assessment. Despite strong discrimination performance (AUC), methodological heterogeneity, insufficient calibration assessment, algorithmic bias, limited external validation, and regulatory uncertainty remain major barriers to implementation. 
Clinical translation requires multicenter RCTs, explainable AI frameworks, and standardized reporting guidelines such as the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis-Artificial Intelligence (TRIPOD-AI) statement. Cardiovascular diseases (CVDs) remain the leading cause of death worldwide, yet commonly used clinical risk prediction tools do not perform equally well across populations. This umbrella review shows that artificial intelligence (AI) has the potential to improve cardiovascular risk prediction. By analyzing nearly fifty high-quality studies published over the past decade, we found that AI-based prediction models often outperform traditional risk scores in estimating future cardiovascular events. This umbrella review integrated evidence from original research studies and previously published systematic reviews while minimizing duplication of data. In many investigations, particularly those using cardiovascular imaging, AI models demonstrated substantially higher predictive accuracy. Studies combining multiple data sources, including electronic health records, imaging data, genetic information, and wearable device monitoring, demonstrated improved diagnostic performance for coronary artery disease (CAD) and heart failure (HF). Continuous monitoring using wearable technologies was associated with a reduction in hospitalization rates in prospective comparisons with usual care. Despite these promising findings, several challenges remain before AI can be routinely implemented in clinical practice. Variation in study design, potential algorithmic bias, and evolving regulatory requirements continue to limit widespread adoption. Overall, AI exhibits strong potential to support more personalized cardiovascular care; however, large prospective clinical trials and transparent reporting standards are necessary to confirm safety, fairness, and reliability before broad clinical integration.
To evaluate the efficacy and safety of 12 phosphorus-lowering drugs for hyperphosphatemia in chronic kidney disease stages 3-5. Systematic review and network meta-analysis of randomized controlled trials (RCTs). We searched 3 databases from inception through September 2023 for RCTs evaluating 12 phosphorus-lowering drugs. We performed frequentist random-effects network meta-analyses and present mean differences and 95% CIs. Subgroup analyses were performed between dialysis and nondialysis patients to assess robustness, source of heterogeneity, and risk of bias using the Cochrane risk of bias assessment tool. We included 121 trials (18,376 participants) and compared 13 interventions (12 drugs or placebo). In terms of efficacy, except for sodium ferrous citrate, all drugs lowered the level of serum phosphorus compared with placebo. Sucroferric oxyhydroxide (PA21), nicotinic acid, and tenapanor were most likely to be ranked the best, second best, or third best. Calcium/magnesium carbonate, nicotinic acid, and colestilan posed lower risks for hypercalcemia than calcium-based phosphorus binders. All phosphorus-lowering drugs significantly affected serum intact parathyroid hormone levels compared with placebo. Colestilan, tenapanor, and PA21 posed a higher risk for gastrointestinal discomfort. In addition, iron-containing drugs showed positive effects on iron parameters. Limitations included few high-quality RCTs, unclear allocation concealment and blinding, and low evidence quality, which reduced reliability. PA21 has the best phosphorus-lowering effect in hyperphosphatemic adults with chronic kidney disease; considering efficacy and safety, calcium carbonate shows evidence of being the most appropriate drug with or without dialysis. Registered at PROSPERO (CRD42024500243). Hyperphosphatemia is a prevalent complication in patients with chronic kidney disease. Pruritus of the skin represents one of the primary clinical manifestations associated with hyperphosphatemia.
Moreover, hyperphosphatemia exhibits close associations with cardiovascular disease, secondary hyperparathyroidism, soft tissue calcification, decreased bone density, accelerated kidney function deterioration, and increased mortality risk. Phosphorus-lowering drugs are commonly employed for treating this condition. In this study, we conducted a network meta-analysis using a random-effects model to examine the safety and efficacy of multiple phosphorus-lowering drugs in managing hyperphosphatemia by analyzing relevant randomized controlled clinical trials. Our findings indicate that gastrointestinal adverse reactions were the most frequently observed side effects of phosphorus-lowering drugs, and their incidence increased proportionally with effectiveness. Calcium-based phosphate binders demonstrated the greatest impact on serum calcium levels, as expected, whereas iron-based phosphate binders improved iron reserves and resin-based phosphate binders such as sevelamer and colestilan exhibited regulatory effects on blood lipids. Notably, sevelamer appeared to reduce all-cause mortality.
Healthcare-associated infections (HAIs), particularly postoperative infections, remain a major global concern, and deficiencies in surgical instrument packaging represent an important and preventable risk factor. The Central Sterile Supply Department (CSSD) plays a critical role in maintaining sterile assurance; however, process-related packaging non-conformance can compromise patient safety. This pilot study aimed to apply the Quality Control Circle (QCC) model to reduce non-conforming surgical instrument packaging and strengthen infection prevention capability in a tertiary hospital CSSD. This prospective quality improvement study was conducted in the CSSD of a tertiary hospital. Baseline performance was assessed using routine quality monitoring data, with non-conforming packaging defined as missing instruments, incorrect instrument type, nonfunctional instruments, improper sealing, wet packs, labeling errors, or chemical indicator defects. QCC methodology was applied, incorporating Pareto analysis, root cause analysis, structured training, workflow optimization, equipment maintenance reinforcement, and strengthened verification systems. The primary outcome was the non-conforming packaging rate before and after intervention. Baseline analysis identified missing instruments, incorrect instrument types, and incomplete or nonfunctional instruments as dominant contributors to packaging defects, with an overall baseline non-conformance rate of 0.213%. Following QCC implementation, the rate decreased to 0.199% during the initial assessment (May 2024; 19,060 packs inspected, 38 defective) and further declined to 0.106% during extended follow-up (June-October 2024; 82,182 packs inspected, 87 defective), achieving the predefined improvement target of a 50% reduction. Post-intervention Pareto analysis demonstrated a marked decrease in dominant defect categories, accompanied by strengthened process standardization and stability. 
Application of the QCC model significantly improved surgical instrument packaging quality, reduced non-conforming events, and enhanced operational reliability in the CSSD. These findings demonstrate that structured, team-based quality management is feasible, sustainable, and potentially beneficial for supporting infection risk reduction and perioperative safety. The model provides a replicable framework for broader implementation and future outcome-linked research.
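The Pareto analysis used in the QCC cycle above ranks defect categories by their contribution to total non-conformance so improvement effort targets the dominant few. A minimal sketch; the category names and counts below are illustrative assumptions, not the study's audit data:

```python
# Pareto ranking of packaging defect categories: sort by frequency and
# report each category's cumulative share of all defects.
# Counts below are hypothetical, not the CSSD's actual figures.
from collections import Counter

defects = Counter({
    "missing instrument": 18,
    "incorrect instrument type": 10,
    "nonfunctional instrument": 6,
    "improper sealing": 2,
    "labeling error": 2,
})

total = sum(defects.values())
cumulative = 0
for category, count in defects.most_common():
    cumulative += count
    print(f"{category:28s} {count:3d}  {100 * cumulative / total:5.1f}%")
```

In a typical Pareto pattern like this sketch, the top two or three categories account for most defects, which mirrors the study's finding that missing, incorrect, and nonfunctional instruments dominated baseline non-conformance.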
Esophageal replacement (ER) remains a challenging but essential reconstructive procedure in pediatric surgery when the native esophagus cannot be preserved, most often in long-gap esophageal atresia (EA) and corrosive strictures. Data from India are limited, with most reports representing single-center experiences. This multicenter study by the Indian Association of Pediatric Surgeons (IAPS) aimed to analyze national patterns, outcomes, and long-term results of pediatric ER. A retrospective, multicenter observational study was conducted across eight IAPS-affiliated institutions. Children under 18 years who underwent ER were included. Data regarding demographics, indications, conduit type, surgical route, timing of cervical anastomosis, complications, and outcomes were analyzed. Statistical tests included Chi-square/Fisher's exact for group comparisons and penalized regression for exploratory assessment of risk factors for anastomotic complications. Ninety-nine children (69 males, 30 females; mean age 22 months, mean weight 9.9 kg) were analyzed. EA was the indication in 89.9% and corrosive stricture in 10.1%. Gastric conduits were used in 98.9% - isoperistaltic gastric tube (48.5%), gastric transposition (GT) (24.2%), and reverse gastric tube (17.2%) - with only one colonic interposition. The posterior mediastinal route was used in 53 (53.5%) and the retrosternal in 46 (46.5%) children. Overall mortality was 4.04%, all in GT: 16.7% for transmediastinal GT in patients with EA, whereas mortality was nil for transmediastinal GT performed for corrosive injury, a group comprising significantly larger children. Early complications occurred in 30.3% and late in 25.3%, with anastomotic leak (13.1%) and stricture (13.1%) being the most common. Leak and stricture rates were higher after staged than primary anastomosis (19% vs. 8.8% and 19% vs. 7%, respectively; P > 0.05).
Penalized regression suggested that leak was more likely with higher weight, major comorbidities, and retrosternal route, while stricture risk was greater with comorbidities and retrosternal routing. Mean parental satisfaction and child quality-of-life scores were 4.25 ± 0.94 and 4.03 ± 0.94, respectively. Pediatric esophageal replacement in India is most commonly performed for long-gap esophageal atresia using gastric conduits and is associated with acceptable mortality and satisfactory long-term quality of life. Anastomotic complications remain common and appear influenced by patient comorbidities and conduit route, emphasizing the need for standardized approaches and multicenter collaboration to improve outcomes. Stomach-based conduits predominated in Indian practice due to their simplicity, vascular reliability, and adaptability to resource-limited settings. Retrosternal routing, though less anatomical, was often favored for its perceived safety and avoidance of mediastinal dissection. Leak and stricture rates were comparable to global data and appeared influenced more by comorbidities and conduit route than by staging or weight. The strong correlation between leak and later stricture underscores the importance of meticulous anastomotic technique and vigilant follow-up.
The quality of red blood cells (RBCs) is crucial in transfusion efficacy and safety, particularly in high-risk patients. In this study, age-related biochemical alterations in stored packed red blood cells (pRBCs) were investigated using a systematic, paired comparison of samples collected from pilot tubes and the main storage bags. The analyses were based on spectroscopic measurements of the isolated supernatant mixture containing RBC-derived metabolites and hemolysis products, instead of intact red blood cells. This proof-of-concept study demonstrates that pilot tube samples may not reliably reflect the biochemical state of pRBCs stored in the main bag. Our findings revealed that RBCs stored in pilot tubes undergo accelerated degradation, as indicated by elevated hemoglobin concentration, increased lactate levels, reduced glucose content, and a higher lipid-to-protein ratio. Semiquantitative analysis showed that these markers were elevated by approximately 20-100% by the seventh week of storage compared to those observed in the main blood bags. These consistent trends underscore that pilot tube samples do not reliably reflect the true biochemical quality of pRBCs intended for transfusion. Notably, the study highlights the high diagnostic potential of Fourier-transform infrared (FTIR) and Raman spectroscopies in assessing blood quality in a rapid, non-destructive manner. These techniques offer a promising tool for point-of-care evaluation of RBC integrity directly through the storage bag, enabling improved transfusion decision-making, especially in critical care settings. By directly comparing pilot tube and main bag samples, this study reveals systematic differences in their biochemical profiles and proposes a spectroscopic framework for representative, non-invasive evaluation of pRBC quality.
This study aimed to investigate the levels of alarm fatigue among ICU nurses in 20 hospitals in Hubei Province, China, using an ICU nurse-specific alarm fatigue measurement tool. Additionally, we reassessed the scale's reliability and validity among ICU physicians and explored personal and organizational factors associated with alarm fatigue in both groups. A descriptive cross-sectional study was conducted among ICU healthcare workers (nurses and physicians) from 20 hospitals in Hubei Province between April and May 2025. Data were collected via electronic questionnaires, including demographic information, the ICU Nurse Alarm Fatigue Questionnaire (ICU-NAFQ), the Chinese version of the NASA Task Load Index, and the Chinese version of the Maslach Burnout Inventory-General Survey. The ICU-NAFQ showed acceptable reliability among physicians (Cronbach's α = 0.714). Exploratory factor analysis confirmed structural stability. A total of 503 valid responses (382 nurses, 121 physicians) were analyzed using SPSS 26.0, with statistical methods including t-tests, ANOVA, correlation analyses, and multiple linear regression. The mean alarm fatigue score among ICU nurses was 23.30 ± 6.08 (moderate severity, approaching high severity), with 42.9% classified as high-severity. ICU physicians scored significantly higher (25.46 ± 5.68, p < 0.05), with 58.7% in the high-severity category. In multiple linear regression, among nurses, alarm fatigue scores were independently associated with educational level, the presence of dedicated noise quality management personnel, and the burnout dimensions of emotional exhaustion and reduced personal accomplishment (all p < 0.05). Among physicians, alarm fatigue was associated with gender, educational level, and reduced personal accomplishment (all p < 0.05). Alarm fatigue is prevalent and severe among ICU physicians and nurses in China, with physicians at higher risk.
Special attention should be directed toward ICU nurses with higher education levels, those working in units without dedicated personnel for noise quality management, and those scoring higher on the emotional exhaustion and reduced personal accomplishment dimensions of burnout, as well as male physicians scoring higher on the reduced personal accomplishment dimension. Our findings provide important guidance for future interventions. Healthcare institutions should establish routine monitoring systems for environmental factors and staff wellbeing to identify early signs of alarm fatigue and guide timely interventions. Standardized alarm management protocols and regular evidence-based training are essential to improve alarm response practices. Adoption of advanced alarm technologies may further reduce false alarms and enhance accuracy. Targeted support strategies should be implemented for high-risk healthcare professionals to mitigate burnout and optimize patient safety.
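The internal-consistency figure reported for the ICU-NAFQ among physicians (Cronbach's α = 0.714) follows the standard formula α = (k/(k-1))·(1 − Σσ²ᵢ/σ²ₜ). A minimal sketch, assuming an (n respondents × k items) score matrix; the toy data below are illustrative, not the study's responses:

```python
# Cronbach's alpha from first principles: compares the sum of per-item
# variances against the variance of respondents' total scores.
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Perfectly correlated items give the maximum alpha of 1.0
scores = np.array([[1, 1], [2, 2], [3, 3], [4, 4]], dtype=float)
print(round(cronbach_alpha(scores), 3))  # → 1.0
```

Values of 0.7 or above, as in the physician sample, are conventionally treated as acceptable reliability for a research instrument.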
Medication errors, particularly involving continuous infusions in paediatric intensive care units (PICUs), pose significant risks to patient safety. This quality improvement initiative was prompted by a sentinel adverse event where a paediatric patient received a ketamine infusion at mg/kg/min instead of μg/kg/min, resulting in a dose over 1000 times higher than intended for over 16 h. The study aimed to evaluate a systems-based solution to reduce continuous infusion errors, focusing on improving patient safety through a multidisciplinary approach, enhancing staff compliance and assessing satisfaction and perceived safety improvements. A multidisciplinary team employed root cause analysis (RCA) and human factors engineering (HFE) principles to implement interventions, including procuring 150 smart infusion pumps with a customized drug library, aligning it with the EPIC electronic medical record (EMR) system, standardizing medication preparations and conducting extensive staff training. Implementation followed a Plan-Do-Check-Act (PDCA) cycle, starting with a pilot and scaling to the full 25-bed PICU. Following implementation, two minor incidents related to medication concentration changes occurred in October 2024. These were effectively addressed through reinforced education, the introduction of double-checking policies and hands-on simulation training. Since then, no further incidents have been reported through 1 January 2026. Outcomes were assessed using surveys conducted at 3 and 6 months post-implementation, along with rounds by the shift in-charge, biomedical engineering and nurse educator, and independent double-checking at initiation of a new infusion, syringe replacements and shift endorsements. The initiative achieved zero further incidents from October 2024 to 1 January 2026, with compliance improving markedly. Surveys at 3 and 6 months showed over 95% staff satisfaction and perceived safety improvements.
The project expanded to the emergency department and five subspecialty wards, demonstrating scalability. The intervention successfully minimized continuous infusion errors, fostering a high-reliability culture committed to zero harm. To sustain these gains, we recommend ongoing education, adherence to double-checking protocols and EMR enhancements. Future research should include multicentre evaluations and longer-term monitoring. This scalable model demonstrates how smart pumps, EMR integration and multidisciplinary training can significantly reduce continuous infusion errors in PICUs, enhancing safety for vulnerable paediatric patients and providing a framework for adoption in other high-risk settings like emergency departments and subspecialty wards.
To evaluate the feasibility, safety, and workflow of the development and prospective implementation of magnetic resonance-guided position-location tracking technology (MR-tracking [MRT]) for interstitial gynecologic brachytherapy. Between 2018 and 2023, patients undergoing template-based (Syed/Neblett) interstitial brachytherapy were included in the analysis. The 2 cohorts for analysis were (1) prospectively enrolled patients who received MRT for real-time visualization, and (2) standard procedure implants performed without MRT technology. The 2 cohorts were compared for overall procedural time with an ad hoc noninferiority analysis, and implant quality was assessed by the resultant clinically delivered dosimetry with a Student's t test. Eighteen (18) patients who consented to treatment with MRT visualization were compared to a cohort of 62 patients undergoing standard interstitial brachytherapy. The MRT system was successfully used without interruption from device-related events in all 18 patients. The mean (standard deviation) procedure time was 96 (±27) min for the MRT implants and 100 (±26) min for the standard implants, supporting noninferiority of the MRT implants (p = 0.03). For the patient subgroup including primary vaginal or recurrent endometrial cancer without a uterus, we reported significantly improved clinical target volume (CTV) coverage (p = 0.02) for the MRT cohort (CTV D90 = 75.2 Gy) compared to the standard procedure cohort (CTV D90 = 66.4 Gy), suggesting the physician was able to better target the tumor volume during needle insertion. Differences in organ-at-risk doses were not statistically significant. This prospective study establishes the feasibility of MRT integration with the successful demonstration of device reliability in a complex clinical setting. The integration of MRT as an investigational technology did not negatively impact overall procedure time and may have had a positive effect on resultant dosimetry for a subgroup of patients.
Future clinical investigations should include prospective use of the technology in a matched-cohort trial.
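The procedure-time comparison above can be framed as a one-sided noninferiority test: MRT is noninferior if its mean time does not exceed the standard time by more than a prespecified margin. A minimal sketch on synthetic timing data; the 15-minute margin and the simulated samples are illustrative assumptions, as the study's actual margin is not stated here:

```python
# One-sided noninferiority t-test on synthetic procedure times (minutes).
# Margin and simulated data are illustrative, not the study's values.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mrt_times = rng.normal(96, 27, size=18)        # MRT cohort (n=18)
standard_times = rng.normal(100, 26, size=62)  # standard cohort (n=62)
margin = 15.0  # assumed noninferiority margin in minutes

# H0: mean(MRT) - mean(standard) >= margin  (MRT meaningfully slower)
# H1: mean(MRT) - mean(standard) <  margin  (MRT noninferior)
# Shifting the MRT sample by the margin converts this to a standard
# one-sided Welch t-test.
t_stat, p_value = stats.ttest_ind(
    mrt_times - margin, standard_times,
    equal_var=False, alternative="less",
)
print(f"t = {t_stat:.2f}, one-sided p = {p_value:.4f}")
```

A p-value below the chosen alpha rejects the inferiority hypothesis, which is the logic behind the study's reported noninferiority conclusion.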
Reliable prediction and interpretation of water quality dynamics are essential for environmental monitoring and risk-informed water resources management. Explainable machine learning (XML) offers means to interpret complex predictive models; however, commonly used explanation methods often yield inconsistent feature attributions, and feature selection frequently relies on subjective correlation thresholds. This study develops a unified consistency index (CI) and a data-driven cross-validated recursive feature elimination (RFECV) workflow to quantitatively assess and improve the reliability of XML-based interpretation. Using a 30-year river-water dataset and nine machine-learning algorithms, two XML frameworks were evaluated (correlation-based XML and RFECV-based XML). Correlation-based models achieved strong predictive performance (RMSE = 0.77, 0.64, 1.07), whereas RFECV reduced input dimensionality by 69-85% (to 6-12 features) while maintaining comparable accuracy (RMSE = 0.75, 0.57, 1.20). Across the correlation-based workflow, CI values ranged from 0.42 to 0.72, 0.27 to 0.48, and ∼0.60, indicating strong rank-level agreement among interpretation tools. RFECV-based XML preserved predictive accuracy but produced lower CI values (0.00-0.65), reflecting tighter top-k agreement but weaker global ranking coherence. This pattern represents a practically relevant form of reliability in which agreement on core drivers is maintained despite reduced ranking stability. High and α-stable CI values indicate that interpreter disagreements are benign, whereas low and α-sensitive CI values reveal instability in explanations. By providing a quantitative diagnostic check on the robustness of explanations, this study helps ensure that XML-based water-quality assessments are clearer, more trustworthy, and more practical for real-world decision-making.
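The RFECV step described above can be sketched with scikit-learn: features are eliminated recursively, and cross-validated error selects the smallest feature subset that preserves accuracy. The estimator, scoring metric, and synthetic data below are illustrative assumptions, not the study's configuration:

```python
# Minimal RFECV sketch on synthetic regression data standing in for a
# multi-feature water-quality dataset. Estimator and scoring are
# illustrative choices, not the study's actual setup.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold

X, y = make_regression(n_samples=300, n_features=20, n_informative=8,
                       noise=5.0, random_state=0)

selector = RFECV(
    estimator=RandomForestRegressor(n_estimators=50, random_state=0),
    step=1,                                     # drop one feature per round
    cv=KFold(n_splits=5, shuffle=True, random_state=0),
    scoring="neg_root_mean_squared_error",      # CV criterion for subset size
)
selector.fit(X, y)

print("selected features:", selector.n_features_)
print("support mask:", selector.support_)
```

Because the subset size is chosen by cross-validated error rather than a correlation threshold, this is the data-driven alternative to subjective feature screening that the study advocates.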
The volume of referrals concerning child abuse and neglect underscores the importance of systematic risk assessment to identify and analyse factors that increase its likelihood, enabling timely intervention (Brem & Forrester, 2023). This systematic review examines actuarial risk assessment instruments employed in child protection systems and identifies the specific risk factors they evaluate. A comprehensive search was conducted across PubMed, EBSCO, and Web of Science databases, supplemented by manual searches, following PRISMA guidelines. Methodological quality was assessed using the Crowe Critical Appraisal Tool (CCAT), and risk of bias was evaluated using the Joanna Briggs Institute (JBI) Critical Appraisal Tools. After screening 1585 records, 11 empirical studies published between 2009 and 2023 were included, covering 10 distinct risk assessment instruments applied in child welfare contexts in the United States, Spain, the Netherlands, and the United Kingdom. Findings highlight five key domains of risk factors commonly assessed: demographic and socioeconomic conditions, child-specific vulnerabilities, environmental influences, caregiver and family dynamics, and prior abuse or neglect. Actuarial models demonstrated higher predictive validity and inter-rater reliability than consensus-based assessments, although their performance varied across cultural and socioeconomic contexts. Most instruments conceptualised maltreatment risk as a cumulative construct and did not consistently differentiate predictive performance by subtype (e.g., neglect, physical abuse, exposure to domestic violence). In addition, some socio-demographic predictors may reflect likelihood of child protection system involvement rather than confirmed maltreatment, requiring cautious interpretation. Overall, effective actuarial instruments integrate multiple domains, emphasise modifiable risk factors, and complement clinical judgment.
This review underscores the need for culturally responsive, empirically validated, and digitally accessible tools to support evidence-based decision-making and enhance child protection practices worldwide. Strengthening longitudinal validation and subtype-sensitive modelling may further improve precision and equity in risk assessment.