To evaluate the association between the newly proposed preclinical and clinical obesity framework and the long-term risk of major adverse liver outcomes (MALO). This retrospective cohort study included 7,462,871 individuals who participated in the 2009-2010 National Health Screening Program. Individuals were divided into three groups: without excess adiposity, preclinical obesity, and clinical obesity. Excess adiposity was determined by body mass index, waist circumference, and waist-to-height ratio. Participants were followed up through December 31, 2023. MALO was defined as hepatocellular carcinoma, liver cirrhosis-related events, and liver-related death, including liver transplantation. Cox proportional hazards regression analysis was conducted to estimate hazard ratios (HRs) and 95% confidence intervals (CIs). During a median follow-up of 14.0 years, 105,812 incident cases of MALO were identified. Preclinical obesity and clinical obesity were associated with significantly higher risks of MALO compared with individuals without excess adiposity. The adjusted HRs (95% CI) were 1.08 (1.06-1.10) for preclinical obesity and 1.09 (1.07-1.10) for clinical obesity. When clinical obesity was further stratified, the adjusted HR (95% CI) was 17.64 (16.61-18.73) for clinical obesity with underlying hepatic dysfunction and 1.04 (1.03-1.06) for those without. The clinical obesity framework offers a clinical approach for predicting long-term liver outcomes. Integrating this framework into clinical and public health practice could reduce the burden of obesity-related liver disease.
Chronic lymphocytic leukemia (CLL) is the most frequent leukemia in adults in the United States, with roughly 24,000 new cases expected in 2026 and more than 220,000 people currently living with the disease. During the past 2 decades, advancements in biologic insights and treatment options have significantly enhanced the outcomes for patients with CLL. Treatment paradigms have shifted decisively from chemoimmunotherapy to targeted agents. Bruton tyrosine kinase inhibitors, B-cell lymphoma 2 inhibitors, and anti-CD20 monoclonal antibodies now form the backbone of frontline care, with therapy tailored by genetic risk, comorbidities, and patient preferences. For relapsed or refractory disease, sequencing of covalent and noncovalent Bruton tyrosine kinase inhibitors, B-cell lymphoma 2 inhibitors, and cellular therapies, including chimeric antigen receptor T-cell therapy, has expanded options and improved outcomes in high-risk settings. Richter transformation of CLL represents an area of unmet need as most contemporary series report survival of less than 24 months. These guidelines summarize the Mayo Clinic approach to the diagnosis, risk stratification, and management of patients with CLL, including those with Richter transformation of CLL.
Art is integrated into the Mayo Clinic environment. Since the original Mayo Clinic Building was finished in 1914, many pieces have been donated or commissioned for patients and staff to enjoy. Each issue of Mayo Clinic Proceedings features a work of art (as interpreted by the author) that is displayed in a building or on the grounds of Mayo Clinic campuses.
To show that heart failure with supranormal ejection fraction (HFsnEF, left ventricular ejection fraction [LVEF] ≥ 65%) is not a marker of cardiac health but a specific enrichment zone for malignant cardiomyopathies and to provide a phenotypic framework for differential diagnosis. We retrospectively analyzed 200 consecutive patients with HFsnEF admitted between December 1, 2017, and December 1, 2024. Patients were stratified into 2 phenotypes based on etiology: group A (hemodynamic loading, n=88), comprising valvular or hypertensive heart disease; and group B (intrinsic cardiomyopathy, n=112), comprising hypertrophic cardiomyopathy, cardiac amyloidosis, or Fabry disease. We compared their electromechanical profiles and event-free survival. Despite identical LVEF (median, 69%), the 2 groups exhibited distinct clinical signatures. Group A represented a classic secondary remodeling profile (older age and atrial fibrillation). In contrast, group B patients were younger (56.0 vs 68.8 years; P<.001) yet displayed severe diastolic stiffness (septal E/e' > 13) and a distinct electromechanical fingerprint (wide QRS or PR deviations). In a fully adjusted multivariable Cox model, the intrinsic cardiomyopathy phenotype was the dominant driver of the composite end point (adjusted hazard ratio, 3.91; 95% CI, 1.71-8.93; P = .001). Notably, N-terminal pro-B-type natriuretic peptide was not an independent predictor (P = .20) in this population, indicating that structural etiology overrides hemodynamic biomarkers in determining prognosis. An LVEF of 65% or more is not reassuring but signals underlying heterogeneity, often masking malignant intrinsic cardiomyopathies that masquerade as benign heart failure with preserved ejection fraction. A multimodal red flag approach that integrates clinical, imaging, and electrocardiographic data is essential to trigger targeted evaluation, as standard heart failure with preserved ejection fraction management is insufficient for these high-risk phenotypes.
Heart failure with preserved ejection fraction represents an increasingly common cause of exertional intolerance in ambulatory practice. Because symptoms are frequently nonspecific and traditional signs of congestion may be absent at rest, diagnosis requires a structured approach that integrates clinical assessment, natriuretic peptide interpretation, and targeted evaluation of filling pressures. Management has evolved toward a multimodal strategy that can be initiated in primary care, combining foundational pharmacologic therapies found to reduce symptoms and hospitalizations with aggressive optimization of cardiometabolic comorbidities. This review translates contemporary guideline recommendations into a practical, outpatient-focused framework to support timely diagnosis and evidence-based management of heart failure with preserved ejection fraction.
To synthesize the performance of artificial intelligence (AI) applications for detecting pelvic fractures, classifying severity, and predicting clinical outcomes relative to clinicians. The study was designed as a systematic review and meta-analysis (PROSPERO CRD420251141768). Ovid Embase, Ovid MEDLINE, PubMed, Scopus, and Cochrane CENTRAL were searched for articles published from database inception to September 11, 2025. Studies were included if they evaluated AI models for pelvic ring fractures in adults using pelvic radiographs. Case series, reviews, and abstracts without full data were excluded. Summary-level data were independently extracted using a standardized template. Diagnostic metrics were pooled using a random-effects model. Outcomes included pooled sensitivity, specificity, area under the receiver operating characteristic curve (AUC), and accuracy. Fourteen studies were included. Thirteen studies evaluated radiographic fracture detection or classification (n=31,166 radiographs) and one study evaluated outcome prediction. AI demonstrated high pooled performance: accuracy 0.96 (95% CI, 0.91-0.98; I²=93.3%), AUC 0.94 (95% CI, 0.89-0.97; I²=97.9%), sensitivity 0.90 (95% CI, 0.84-0.94; τ²=0.42), and specificity 0.93 (95% CI, 0.85-0.97; τ²=1.30). In 3 studies directly comparing AI with clinicians, AI models showed comparable or marginally superior performance. One study on clinical outcomes reported strong performance for predicting hemodynamic instability (AUC 0.92) and mortality (AUC 0.90). AI algorithms show promise as supportive tools for pelvic fracture detection, achieving diagnostic performance comparable to expert clinicians. However, included studies exhibit substantial heterogeneity, selection bias, and limited external validation. Large-scale, prospective validation is necessary before widespread clinical adoption.
Rheumatoid arthritis (RA) is increasingly recognized as a systemic disease that evolves from genetic susceptibility and mucosal immune dysregulation to systemic autoimmunity, arthralgia, subclinical inflammation and ultimately, in a subset of patients, to clinically apparent arthritis. This review aims to provide primary care clinicians with a practical framework for recognizing suspected early RA, facilitating timely rheumatology referral, and supporting comorbidity co-management. It also highlights current treatment concepts and emerging evidence on precision diagnosis and therapy. The disease-continuum model has created opportunities for earlier identification of at-risk individuals and stage-adapted interventions, although preventive strategies remain investigational. Early diagnosis remains challenging, especially in seronegative RA. Prompt disease-modifying antirheumatic drug initiation, within a treat-to-target framework, has improved outcomes. However, many patients continue to experience flares, and remission is not universal. Precision treatment selection is evolving, but not yet routine. Importantly, RA extends beyond joint inflammation: cardiovascular disease and RA-associated interstitial lung disease are major drivers of morbidity and mortality. Shared primary care-rheumatology management should address cardiovascular risk, RA-ILD risk, vaccination, bone protection, perioperative planning and treatment safety. Overall, RA management is shifting from reactive treatment of clinically apparent arthritis toward earlier recognition, tighter disease control and personalized comprehensive care.
To develop an efficient and domain-adapted system to process colonoscopy and pathology reports using knowledge distillation techniques. We implemented a knowledge distillation framework to create a smaller, domain-specific, large language model-based natural language processing model for summarizing and extracting key information from clinical reports. The model was trained on a dataset consisting of 5500 colonoscopy reports and 7000 pathology reports collected from January 1, 2024, to June 30, 2024. Performance was evaluated against ground truth polyp categories derived from pathology report diagnoses. The distilled model achieved high domain-specific performance, with 95.2% accuracy (95% CI, 93.9%-96.5%), 0.95 precision, and a 31.5% improvement in inference speed relative to the teacher model. Despite being substantially smaller than the teacher model, it maintained strong capability in polyp category identification from key clinical factors, including polyp number, size, histology, and location across colonoscopy and pathology reports. Domain clinicians reported high agreement with model outputs across all 6 evaluated clinical questions, supporting its reliability for follow-up recommendation workflows. This work presents a step toward making domain-specific natural language processing models for gastroenterology more efficient and scalable. By leveraging knowledge distillation, we demonstrate the potential for creating more practical domain-specific models that can assist in interpreting complex clinical documentation. Future work will focus on real-world validation and expanding the model to other procedural report types.
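The abstract above does not state which distillation objective was used. As an illustration only, the following minimal Python sketch shows the widely used soft-target formulation, in which the student is trained to match the teacher's temperature-softened output distribution. The function names, the temperature value, and the example logits are all assumptions for illustration and are not taken from the study.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence from the softened teacher distribution to the student's.

    The temperature**2 factor is the conventional scaling that keeps
    gradient magnitudes comparable across temperatures. This is a
    generic sketch, not the loss reported in the abstract.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        pt * (math.log(pt) - math.log(ps))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0
    )
    return temperature ** 2 * kl


# Hypothetical logits over three polyp categories (illustrative values only).
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
print(distillation_loss(student, teacher))
```

In practice this term is usually combined with a standard cross-entropy loss on the ground truth labels; the weighting between the two is a design choice that the abstract does not describe.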
Artificial intelligence (AI) and automation are rapidly transforming health care, yet their integration into clinical workflows often falls short owing to technical, ethical, and organizational challenges. Lack of trust emerges as the central hurdle, encompassing both patient and provider confidence in AI systems. Patients raise concerns over safety, transparency, and the physician-patient relationship, whereas providers express apprehension toward algorithmic opacity, data quality, and legal ambiguity. To address these concerns, the understand, transform, and sustain (UTS) framework offers a behavior-based, systems-level approach to AI deployment. Developed by Mayo Clinic's Quality Academy, UTS integrates process improvement principles across 3 phases, emphasizing stakeholder engagement, transparency, and patient safety throughout the AI lifecycle. The understand phase identifies inefficiencies by mapping workflows, collecting data, and recognizing areas for improvement, ensuring developers create tools that address appropriate priorities. In the transform phase, interventions are designed, implemented, and tested through improvement cycles and feedback loops. Data used to build algorithms are carefully evaluated to avoid biases, and AI output is assessed for opacity risk to maintain transparency and explainability. The sustain phase monitors outcomes and standardizes practices for long-term value. Data audits and automated extraction tools are applied for fidelity and harmonization, promoting scalability and collaboration among organizations. By keeping human intelligence central, UTS represents a catalyst for responsible innovation by aligning technological advancement with clinical priorities. Previous frameworks are more prescriptive in terms of tools and actions; UTS builds on these, targeting the underlying decision-making teams needed for sustainable process improvement, which is critical for successful health care transformation.
To develop and pilot test 2 context-specific digital tools, the Standardized Community Overview for Planning and Evaluation (SCOPE) and the Facility Review for Assessing Medical Environments (FRAME), designed to support virtual care service delivery and health care planning in rural and remote communities. This study used a participatory case study design informed by user-centered development principles to design and pilot the SCOPE and FRAME tools. Development involved collaborative workshops with clinicians, health care administrators, community partners, and a software development team. Publicly available data sources were combined with community-verified information to populate community-specific profiles. The tools were piloted in 3 communities in Saskatchewan, Canada (Stanley Mission, La Loche, and Whitecap Dakota Nation) from October 1, 2024, to March 31, 2025. During pilot implementation sessions, participants interacted with the prototype, and qualitative feedback regarding functionality, usability, and practical utility was collected. Participants reported that the SCOPE and FRAME tools supported health care planning and service coordination by providing centralized access to community-specific demographic, infrastructure, and logistical information. Users indicated that dynamic site management features enabled real-time updates to community profiles, allowing the tools to reflect evolving health care service availability. Data visualization dashboards, including interactive graphs and maps, were reported to support interpretation of health care trends and identification of service gaps. Clinicians also noted that the tools provided useful contextual information to support onboarding of virtual care clinicians and improve understanding of local care pathways. Findings from the pilot implementation suggest that context-specific digital tools such as SCOPE and FRAME may support improved coordination, planning, and contextual awareness in virtual care delivery for rural and remote communities.
To investigate associations and characteristics of patients with infective endocarditis (IE) admitted to the cardiac intensive care unit. Adult patients admitted to the Mayo Clinic cardiac intensive care unit from January 1, 2007, through April 30, 2018, with confirmed acute IE were included. We conducted a retrospective cohort study of data on demographic characteristics, clinical factors, laboratory findings, and outcomes. Patients were categorized by cardiac surgery status: performed, indicated but declined, or not indicated. Primary outcomes were 30-day and 1-year all-cause mortality, analyzed using the Kaplan-Meier method and Cox proportional hazards regression model and adjusted for predictors. A total of 233 patients were included. Native valve IE occurred in 104 patients and prosthetic/device-associated IE in 129 patients. Staphylococcus aureus was the most common organism (99 [42.7%]) and was present in most 30-day deaths (42 [72.4%]). Surgical treatment was indicated in 182 patients (78.1%): 129 underwent a surgical procedure, 53 declined, and 51 had no indication for surgical intervention. The 30-day mortality was 24.9% (58 patients), with older age, higher illness severity and comorbidities, and critical care needs as predictors. The 30-day mortality was higher for those declining surgical treatment (60.4%; adjusted hazard ratio [HR], 2.32; P<.021) and lower for those who underwent surgical intervention vs patients with no indication for such treatment (11.9% vs 23.8%; adjusted HR, 0.40; P=.026). The 1-year mortality was 39.9% (93 patients), with higher mortality in patients who declined surgical intervention (85.1%; adjusted HR, 3.94; P<.001) but similar mortality for those who underwent a surgical procedure vs those with no indication for surgical treatment (31.8% vs 29.3%; adjusted HR, 0.92; P=.80). Infective endocarditis in cardiac intensive care unit patients is associated with high mortality. Severity scores, comorbidities, and critical care needs were mortality predictors. Early surgical treatment improved short-term outcomes, but long-term mortality remained high.
To evaluate whether large language models (LLMs) exhibit implicit gender, racial, and ethnic biases when used to provide psychiatric diagnoses, and to understand how such biases may impact the accuracy and fairness of AI-assisted clinical decision support. Between October 1, 2023, and June 23, 2025, we conducted a large-scale audit of 6 LLMs, including general-purpose and medical-specific models, using 97 psychiatric training cases from the Diagnostic and Statistical Manual of Mental Disorders (Fifth Edition). Cases were systematically altered to suggest different gender and racial/ethnic identities (across 39 demographic groups) by changing names, pronouns, and descriptors while keeping clinical symptoms constant. We assessed diagnostic accuracy, additional or missed diagnoses, and language of diagnostic reasoning. Of the LLMs tested, Generative Pretrained Transformer 4o emerged as the most accurate model and was selected for deeper analysis. Although Generative Pretrained Transformer 4o accurately identified at least 1 correct diagnosis in 82.8% of cases (15,346 of 18,527 cases), it often added non-ground truth diagnoses (70.3% of cases, or 13,017 of 18,527 cases), suggesting a tendency to overdiagnose. Accuracy varied by gender, with higher performance for female patients and lower for nonbinary individuals. Although overall accuracy did not differ significantly by race/ethnicity, biased diagnostic patterns emerged. For example, cultural bereavement and antisocial behavior were diagnosed exclusively in patients of color, and terms such as disruptive were used more frequently for Black men. Our findings show that LLMs reproduce and reinforce clinical biases even when symptoms are constant. Moreover, AI-based tools must be audited not only for accuracy but also for bias in both diagnoses and explanatory language, especially when used in high-stakes mental health contexts.
Lumbar puncture (LP) is a crucial procedure in the diagnosis and management of multiple neurologic diseases, and the procedure can be safely performed in most patients. However, dangerous complications may rarely occur, including herniation and death. There remains a paucity of high-quality data identifying patients at increased risk of life-threatening complications from LP, even in the post-neuroimaging era. Therefore, we performed a comprehensive review of the literature to summarize the known physiologic changes and associated complications following LP, especially in the presence of central nervous system mass lesions or altered CSF dynamics. Based on this, we propose an imaging-based primer to guide clinicians in making decisions about candidacy for LP. Imaging findings that contraindicate LP include the presence of supratentorial mass lesion(s) with pre-existing herniation or significant midline shift, medium to large posterior fossa mass(es), posterior fossa malformation, globally elevated intracranial pressure with mass effect or generalized edema, severe brain sag, and obstructive hydrocephalus. We integrate these imaging features with prior LP clinical guidelines for a concise flowchart to aid in identification of LP contraindications.
Calcific aortic stenosis primarily affects the aging population and is associated with significant morbidity and mortality. The progression of aortic stenosis from sclerotic aortic valve to severe aortic stenosis is an active metabolic process that involves multiple signaling pathways. Multimodality imaging can assess aortic valve calcification and metabolic activity and quantify severity of disease. This review describes the clinical utility of echocardiography, cardiac magnetic resonance imaging, cardiac computed tomography, and nuclear imaging in the assessment of aortic stenosis. These imaging checkpoints may also serve as targets of clinical research trials aimed at halting the progression of aortic stenosis.
To investigate alcohol withdrawal syndrome prevalence among women and to study age and sex differences in clinical manifestations and hospital course. This cohort study included all patients hospitalized from June 1, 2019, to June 1, 2022, for whom the Clinical Institute Withdrawal Assessment for Alcohol Scale, revised, protocol for alcohol withdrawal syndrome was implemented. A total of 16,190 hospitalizations (10,092 patients aged ≥21 years), with 30.2% women, were included in the study and divided into the following 4 age groups: 21-39 years (n=2453 [24.3%]), 40-64 years (n=5128 [50.8%]), 65-74 years (n=1705 [16.9%]), and 75 years or older (n=806 [8%]). We considered the age group with the highest number of patients (40-64 years) as the reference group. Older patients presented with lower blood alcohol concentration and took longer to reach peak withdrawal manifestations than younger patients. Compared with women, men in the youngest age group reached peak withdrawal earlier (mean, 18.5 hours [95% CI, 17.2-19.8] vs 19.4 hours [95% CI, 17.5-21.3]; P<.001), and there was no sex difference in other age groups. Men required higher benzodiazepine doses during hospitalization in the youngest (mean, 20.1 mg [95% CI, 17.1-23.1] vs 13.6 mg [95% CI, 11.4-15.8]; P=.013) and the reference (mean, 18.0 mg [95% CI, 16.5-19.4] vs 12.7 mg [95% CI, 11.4-14]; P<.001) age groups; there was no sex difference among older adults. The benzodiazepine dose during the first 24 hours of hospitalization decreased significantly as age increased (21-39 years: mean, 6.64 mg [95% CI, 6.3-7.0]; 40-64 years: mean, 5.8 mg [95% CI, 5.5-6.0]; 65-74 years: mean, 3.76 mg [95% CI, 3.4-4.1]; ≥75 years: mean, 2.79 mg [95% CI, 2.4-3.2]; P<.001 for all groups). There was no sex difference in the all-cause mortality rate, including posthospitalization mortality, in any age group. The results of this study show a narrowing of the traditional sex gap in alcohol withdrawal.
Body mass index (BMI) is widely used to diagnose obesity and assess chronic disease risk, yet it does not reflect true body composition (BC), particularly fat distribution and lean mass. Although valuable for population-level screening, BMI is limited in its ability to capture individual-level variations in BC and disease risk. We aimed to critically evaluate the limitations of BMI and highlight the clinical value of incorporating BC metrics in obesity diagnosis and risk stratification. This narrative review with a structured literature search synthesized evidence from observational studies, clinical cohorts, and imaging-based analyses published between January 1, 2000, and December 31, 2025. Searches were conducted in PubMed, Scopus, Web of Science, and Google Scholar using predefined keywords related to BC, adiposity, and chronic disease risk. Although a structured search strategy was employed, this review did not follow formal systematic review methodology (eg, PRISMA) and instead emphasizes qualitative synthesis of evidence to compare BMI with direct measures across chronic diseases. Individuals with normal BMI but high visceral fat are at increased risk for conditions such as type 2 diabetes, cardiovascular disease, and cancer. Conversely, high-BMI individuals with healthy BC profiles may have better outcomes. Phenotypes such as sarcopenic obesity and normal-weight obesity are poorly identified using BMI alone. Tools such as dual-energy X-ray absorptiometry and bioelectrical impedance analysis offer more precise risk assessment. We conclude that a shift from BMI-centric approaches to personalized BC-based evaluation is essential for improving obesity diagnosis, guiding interventions, and reducing chronic disease burden.
To investigate the independent associations of domestic water hardness and its primary mineral components, calcium (Ca) and magnesium (Mg), with the incidence of chronic kidney disease (CKD). This prospective cohort study included 414,587 participants from the UK Biobank. Exposures were domestic water hardness, categorized as total hardness (CaCO3), Ca, and Mg concentrations. The primary outcome was incident CKD. Multivariable Cox proportional hazards models were used to estimate HRs and 95% CIs, adjusting for a comprehensive range of demographic, lifestyle, and clinical confounders. Over a median follow-up of 13.2 years, 12,690 incident CKD cases were identified. Restricted cubic spline analyses revealed distinct dose-response patterns. Mg demonstrated the most substantial clinical relevance, displaying a clear linear dose-response relationship (HR, 1.46 for highest vs lowest quartile). In contrast, CaCO3 and Ca showed statistically significant nonlinear associations, with risk increasing primarily at intermediate concentrations (inverted U-shape) rather than showing a consistent linear trend. The findings remained robust across all subgroup and sensitivity analyses. Our study identifies an association between water hardness (Mg and Ca) and CKD incidence. These findings highlight water mineral composition as a potential environmental factor in renal health, although further mechanistic and geographically diverse studies are required to validate these relationships before guiding public health interventions.
To assess the clinical characteristics of small abdominal aortic aneurysms (AAAs) for growth patterns, growth rates, time to repair, and adverse outcomes relative to surveillance imaging intervals. Patients with small AAAs (30-45 mm in diameter) and at least 1 follow-up imaging study were eligible for inclusion (January 1, 2014, through December 31, 2019). Variables impacting time to repair, rupture, or death were assessed. Of 2044 unique patients with AAAs, a random sample of 299 patients (mean ± SD age, 70.5 ± 9.0 years; 18.7% female) was divided into slow-/no- (≤2 mm per year; n=116), intermediate- (>2 to <5 mm per year; n=150), and rapid- (≥5 mm per year; n=33) growth categories. During 10.0-year median follow-up, 61.2% of the patients (n=183) underwent repair. Patients with rapid-growth AAAs were more likely to undergo repair (88% vs intermediate 81.7% or slow/no growth 26.5%; P<.001), required earlier repair (P<.001), and were more likely to die during follow-up (79% vs 41% or 56%; P=.045). Predictors of time to repair included index AAA size, diabetes mellitus, and β-blocker or diuretic therapy. Ruptures (n=1) or impending ruptures (n=4) during follow-up were most frequent in the rapid- (n=2, 6.1%; P<.001) compared with the intermediate- (n=3, 2.0%) or slow-/no- (0%) growth groups. Growth rate characteristics of AAAs influence repair-free survival, time to repair, overall survival, and rupture-free survival.
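The growth categories reported in the abstract above can be expressed as a short classification rule. The Python sketch below encodes the stated thresholds (≤2, >2 to <5, and ≥5 mm per year). The helper that annualizes growth from two diameter measurements is a hypothetical illustration; the abstract does not describe how growth rates were actually calculated.

```python
def annual_growth_rate(first_mm: float, last_mm: float, years: float) -> float:
    """Hypothetical helper: linear growth in mm per year between two scans.

    The abstract does not state the study's growth-rate method; a simple
    two-point linear estimate is assumed here for illustration.
    """
    if years <= 0:
        raise ValueError("years between scans must be positive")
    return (last_mm - first_mm) / years


def growth_category(rate_mm_per_year: float) -> str:
    """Map a growth rate to the categories stated in the abstract."""
    if rate_mm_per_year <= 2:
        return "slow/no growth"
    if rate_mm_per_year < 5:
        return "intermediate growth"
    return "rapid growth"


# Illustrative values only: a 34-mm aneurysm measuring 41 mm two years later.
rate = annual_growth_rate(34.0, 41.0, 2.0)
print(rate, growth_category(rate))
```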
To develop a machine learning algorithm to predict emergency department (ED) waiting room surge status (green, yellow, and red categories) 4, 8, and 12 hours in advance. We designed a retrospective cohort with prospective validation conducted between July 1, 2023, and April 30, 2025. A total of 154,956 encounters were analyzed; features included ED operational metrics and timestamps with 72-hour lagging data. Deep neural network and gradient boosted decision tree (XGBoost) models were trained to predict 3 predefined surge categories: green (<15 waiting patients; 48.8% of the hours), yellow (15-30 waiting patients; 38.8% of the hours), and red (≥31 waiting patients; 12.3% of the hours). The XGBoost model showed strong predictive performance across all forecast horizons. Area under the curve (AUC) values showed excellent discrimination between green and yellow levels, ranging from 0.87 to 0.91 across all time horizons, demonstrating reliable differentiation between normal and moderate surge conditions. The model showed acceptable discrimination for red levels, with AUCs of 0.76 and 0.77, meeting commonly accepted thresholds for clinical forecasting tools, particularly given the difficulty of predicting rare, high-volume situations. The operational accuracy of 68% to 70% indicated strong real-world multiclass forecasting over prolonged windows up to 12 hours in advance. Using operational metrics with timestamps, XGBoost models can differentiate between levels of ED surge states with meaningful accuracy. This ability to forecast risk of high volumes provides a window of opportunity to proactively modify operations.
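The 3 surge categories defined in the abstract above amount to a simple labeling rule on the number of waiting patients, which is the target the models were trained to forecast. The Python sketch below encodes only those stated thresholds; the function name is an assumption for illustration, and the forecasting models themselves are not reproduced.

```python
def surge_category(waiting_patients: int) -> str:
    """Label an hour by waiting room census, using the abstract's thresholds."""
    if waiting_patients < 0:
        raise ValueError("waiting_patients cannot be negative")
    if waiting_patients < 15:
        return "green"   # fewer than 15 waiting patients
    if waiting_patients <= 30:
        return "yellow"  # 15 to 30 waiting patients
    return "red"         # 31 or more waiting patients


# Illustrative hourly waiting room counts (not study data).
for count in (4, 15, 30, 31):
    print(count, surge_category(count))
```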