Emerging evidence has positioned neuroinflammation as a central mechanism in the pathogenesis of epilepsy and in the development of drug resistance. Cytokines such as IL-1β, IL-6, IL-8, IL-17, and IL-18 have been implicated in epileptogenesis, yet their clinical significance in distinguishing drug-resistant epilepsy (DRE) from drug-responsive epilepsy (DREsp) remains underexplored. This study aimed to evaluate the serum levels of IL-1β, IL-6, IL-8, IL-17, and IL-18 in patients with epilepsy and to investigate their association with treatment responsiveness and seizure frequency. A total of 90 participants were enrolled, including 30 patients with DRE, 30 with DREsp, and 30 age- and sex-matched healthy controls. Serum cytokine levels were quantified using enzyme-linked immunosorbent assay (ELISA). Statistical comparisons, receiver operating characteristic (ROC) analyses, and Jonckheere-Terpstra trend tests were performed to assess intergroup differences and biomarker performance. All five cytokines were significantly elevated in epilepsy patients compared to controls (p < 0.001), and further increased levels were observed in the DRE group relative to the DREsp group. IL-6 and IL-8 showed perfect diagnostic performance in distinguishing epilepsy from healthy individuals (AUC = 1.00), while IL-6 remained highly accurate in differentiating DRE from DREsp (AUC = 0.99). IL-1β demonstrated the strongest correlation with seizure frequency and showed a progressive increase across control, DREsp, and DRE groups. Trend analysis confirmed a significant stepwise elevation of cytokine levels with increasing seizure frequency for all markers. Elevated serum levels of IL-1β, IL-6, IL-8, IL-17, and IL-18 in epilepsy, particularly in drug-resistant cases, support the role of neuroinflammation in epileptogenesis and treatment refractoriness.
These cytokines may serve as valuable biomarkers for disease severity and potential targets for novel immunomodulatory therapies, especially in patients with DRE who are candidates for non-pharmacological interventions.
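To make the reported AUC values concrete, the following is a minimal sketch of how an ROC AUC can be computed as the probability that a randomly chosen patient has a higher marker level than a randomly chosen control. The function name and all numbers are hypothetical illustrations, not data or code from the study.

```python
def roc_auc(pos_scores, neg_scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive has the higher score; ties count as half a win."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical serum levels in arbitrary units (assumed, not study data).
patients = [12.1, 15.4, 11.8, 14.0]
controls = [4.2, 5.1, 3.9, 6.0]
overlapping = [5.5, 15.4, 11.8, 3.0]

# Complete separation between groups gives AUC = 1.0.
auc_separated = roc_auc(patients, controls)
# Overlapping distributions give an AUC below 1.0.
auc_overlap = roc_auc(overlapping, controls)
```

An AUC of 1.00 therefore means that, in the sample at hand, every patient value exceeded every control value; it does not by itself show that the same separation would hold in a new cohort.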
Global food supply chains continue to expand in scale and complexity, heightening risks of fraud, mislabeling, and cross-contamination across certification systems. These vulnerabilities are particularly consequential for Halal certification, where ingredient permissibility, compliant slaughter practices, hygienic processing, and full traceability are religiously mandated. Artificial intelligence (AI), including machine learning, natural language processing, computer vision (CV), predictive modeling, and sensor-enabled monitoring, offers transformative potential for strengthening the precision, transparency, and efficiency of Halal assurance. Current certification workflows often rely on manual document review, heterogeneous national standards, limited real-time oversight, and susceptibility to fraudulent or unregulated Halal claims. AI-enabled tools address these limitations by facilitating automated ingredient authentication through DNA-based analysis, alcohol threshold detection, enzyme origin classification, and advanced label interpretation. In slaughter and processing environments, CV systems and IoT sensors enable continuous monitoring of animal handling, cut accuracy, bleeding efficiency, and postslaughter segregation, generating objective evidence of compliance. Pilot implementations indicate that integrating AI with blockchain-based traceability and automated data pipelines can reduce certification timelines from weeks to days, improve anomaly detection to near-perfect accuracy, and support dynamic, continuously verifiable compliance rather than periodic audits. Within this context, AI operates effectively under a hybrid governance model in which religious authorities retain interpretive and theological decision-making while AI provides high-resolution, tamper-resistant documentation. 
Future advancements will depend on harmonized data standards, religious principle-aligned AI governance, strong ethical and cybersecurity safeguards, and wider regulatory acceptance across Islamic jurisdictions. Together, these efforts will enable next-generation Halal certification while preserving religious integrity and sustaining consumer trust.
Introduction: The literature highlights the considerable potential of Artificial Intelligence (AI), particularly large language models (LLMs), in advancing health promotion among individuals. This study aimed to evaluate the performance of several LLMs in responding to questions from the Principles of Health course. This cross-sectional study was conducted in 2025. The LLMs evaluated included ChatGPT-4o, Gemini 2.5, Copilot 2025, and Perplexity 2.250619.0. These LLMs were utilized to respond to the study questionnaire pertaining to the Principles of Health course. To analyze and compare the performance of the LLMs in answering the research questions, a confusion matrix was constructed. Accordingly, four key metrics were calculated: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), in addition to overall accuracy. The LLMs included in the study demonstrated perfect sensitivity, each achieving a value of 1. Regarding specificity, ChatGPT and Perplexity attained the highest scores of 0.8, while Gemini and Copilot exhibited comparatively lower specificity values of 0.66 and 0.6, respectively. Furthermore, ChatGPT and Perplexity recorded the highest accuracy rates of 0.93, surpassing Gemini and Copilot, both of which achieved an accuracy of 0.86. The findings provided a detailed assessment of the performance of the LLMs. Results indicated that the performance of LLMs generally declined as the complexity, length, and verbosity of questionnaire items increased. Additionally, certain LLMs, such as Copilot, demonstrated particular difficulty when responding to quantitative questions involving numerical data. Further research is recommended to investigate these observations more comprehensively.
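The metrics named above follow directly from the four cells of a confusion matrix. The sketch below shows the standard formulas; the counts are hypothetical, chosen only so that they are compatible with a sensitivity of 1 and a specificity of 0.8, because the abstract does not report the underlying counts.

```python
def confusion_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from the four confusion-matrix cells."""
    total = tp + fp + tn + fn
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / total,
    }


# Hypothetical counts for one model (assumed for illustration only).
metrics = confusion_metrics(tp=10, fp=1, tn=4, fn=0)
```

With no false negatives the sensitivity is exactly 1 regardless of the other cells, which is why specificity and accuracy are the quantities that separate the models here.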
β-Glucan, a polysaccharide, has both cytoprotective and immunomodulatory effects, which could be used to develop treatments for hematologic conditions and melanoma through regulation of lipid-hormone pathways. This study develops a computational framework, validated by combining network pharmacology, docking, molecular dynamics (MD), and in vitro experiments, to study the immunomodulatory effects, antimelanoma lipid-hormone axis modulation, and erythroprotective capabilities of β-Glucan extracted and characterized from plum (Prunus bokharensis) PLYE1. β-Glucan was extracted (NaOH/NaClO, 3.2 gms ± % yield) and characterized (HRMS: DP 1-5, β-(1 → 3)/(1 → 6) linkages). Network analysis using TCMSP and DisGeNET identified 110 compounds and 204 melanoma genes as research targets, classified β-Glucan with a perfect score of 100, and identified NPC1L1, HMGCR, NR1H3, ESR2, and SHBG as essential network elements. Docking used structures with resolution parameters between 2.00 and 3.3 Å and high stereochemical accuracy, which enabled target protein assessment. Ramachandran plot analysis indicated that the ligand-bound structures maintained their biological integrity, which enabled accurate evaluation of the therapeutic potential of β-Glucan. In the 500 ns MD simulation of the 7N4X-β-Glucan complex, RMSD measurements demonstrated binding stability: the system reached equilibrium between 200 and 250 ns and maintained a stable range of 4.5 to 5.5 Å. The RMSF results showed that most residues fluctuated by less than 1.5 Å, while loop regions moved more, reaching 3 to 6 Å. The radius of gyration remained stable at 4.3 to 5.0 Å, indicating that the protein stayed compact while continuing to interact with the ligand.
The in vitro study shows that β-Glucan acts as an external ligand that triggers receptor-based immune response activation, with an increase in phagocytosis of 72.14%, and induces cell death with an IC50 of 104.33 μg/mL, accompanied by increased BAX and decreased BCL2 while Caspase-3 remained at its normal level. Together, the docking results and the 500 ns MD stability tests indicate that Prunus bokharensis (PLYE1)-derived β-Glucan remains structurally stable while interacting consistently with its key target molecules. The study demonstrated that β-Glucan regulates lipid-hormone activity and offers therapeutic benefits through receptor-mediated immunomodulation and induction of apoptosis, establishing its value as a versatile bioactive molecule.
To evaluate whether variability in patients' communication style (personality), international English accents (human and synthetic) and speech impairments affects the accuracy of a Clinical AI Scribe (CAIS), and to identify where performance degrades to inform pre-deployment validation and monitoring. We conducted simulated primary-care consultations using trained actors. For personality types, four scenarios were enacted, each with five patient-personality types. For accents, transcripts of consultations were used to generate combinations of seven accents across five scenarios. The CAIS produced summaries that were compared with transcripts, and errors classified as omissions, factual inaccuracies or hallucinations. For speech impairments, public recordings representing five profiles were transcribed and word-recognition accuracy calculated. Personality types showed no statistically significant differences in errors (all p>0.05). Extraversion had the highest total errors (median 3.5). Across accents, comparisons were non-significant for both patient and doctor voices (patients: p=0.851; doctors: p=0.980). Omissions predominated, with low rates of hallucinations and factual inaccuracies. Omissions were slightly higher for Chinese-accented and Indian-accented doctors (both medians 3.0). Conversely, speech impairments differed: cleft palate and vowel disorders were near-perfect, whereas phonological impairment markedly reduced recognition (p<0.001). Operationally, CAIS deployment should include clinician-in-the-loop verification, subgroup performance monitoring (accents, impairments) and predefined 'switch-off' criteria for severe phonological patterns. High-quality synthetic voices are a pragmatic proxy for accent testing when balanced corpora are unavailable.
Under controlled conditions CAIS performance was broadly stable across communication styles and most accents, but vulnerable to specific speech characteristics, particularly phonological impairment, in this single-system simulation study.
In silico identification of the modes of toxic action (MOAs) of chemicals is important for understanding their toxic severity and mechanisms at the whole-organism level. Existing in silico models commonly focus on a diversity of chemical MOAs in one specific species, without considering toxic effects on off-target organisms. In this work, we propose a quantitative structure-activity relationship (QSAR)-based multi-label extreme gradient boosting (XGBoost) model for the scenario in which a pesticide exhibits multiple MOAs on different organisms across trophic levels. In the integral space of 16 taxa-specific MOAs, pesticides are represented by the structural fingerprints MACCSKeys and Morgan, and each of the 16 taxa-specific MOAs is treated as a class label. K-fold cross-validation (k = 10) shows that the proposed multi-label XGBoost model achieves 0.81 micro recall, 0.72 macro recall, 0.55 perfect match ratio, and 69.5% Jaccard accuracy. Among the 16 taxa-specific MOAs, 12 MOAs achieve satisfactory recall rates ranging from 0.643 (aquatic-narcosis) to 0.986 (aquatic acetylcholinesterase inhibition). An external test shows that 84.62% of the herbicides exhibiting plant photosynthesis inhibition are correctly recognized. The holdout test shows that the proposed model, despite a much more complex label space, outperforms or performs equivalently to an existing multi-class model (linear discriminant analysis). Computational results show that narcosis mainly appears either as an independent toxic effect or as an accessory/baseline toxic effect preferentially accompanied by reactive or other specific MOAs, and that the proposed multi-label XGBoost model may help derive baseline (narcosis) toxicity models for the studied organisms. © 2026 Society of Chemical Industry.
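The four multi-label scores quoted in this abstract can be illustrated with a short sketch. It assumes that "perfect match ratio" means the fraction of chemicals whose full label set is predicted exactly and that "Jaccard accuracy" is the per-chemical Jaccard index averaged over chemicals; the abstract does not define either term, and the labels below are hypothetical.

```python
def multilabel_metrics(y_true, y_pred):
    """y_true, y_pred: lists of 0/1 rows, one row per chemical,
    one column per label (here, per taxa-specific MOA)."""
    n_labels = len(y_true[0])
    tp = [0] * n_labels
    fn = [0] * n_labels
    exact = 0
    jaccard_sum = 0.0
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            if t == 1 and p == 1:
                tp[j] += 1
            elif t == 1 and p == 0:
                fn[j] += 1
        exact += int(t_row == p_row)
        inter = sum(1 for t, p in zip(t_row, p_row) if t and p)
        union = sum(1 for t, p in zip(t_row, p_row) if t or p)
        # Treat two empty label sets as a perfect match.
        jaccard_sum += inter / union if union else 1.0
    # Micro recall pools all labels; macro recall averages per-label recall.
    micro = sum(tp) / (sum(tp) + sum(fn))
    per_label = [tp[j] / (tp[j] + fn[j])
                 for j in range(n_labels) if tp[j] + fn[j] > 0]
    macro = sum(per_label) / len(per_label)
    n = len(y_true)
    return {"micro_recall": micro, "macro_recall": macro,
            "exact_match": exact / n, "jaccard": jaccard_sum / n}


# Hypothetical labels for 4 chemicals and 3 MOAs (assumed, not study data).
y_true = [[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]
scores = multilabel_metrics(y_true, y_pred)
```

The exact-match score is the strictest of the four, since one wrong label in a row counts the whole row as a miss, which is consistent with it being the lowest figure reported.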
Chronic liver disease (CLD) and subsequent liver cirrhosis (LC) are common causes of death and healthcare-related socioeconomic costs worldwide. Ultrasound (US) is the first-line imaging modality for assessing the liver and associated hepatocellular carcinomas. Poor-quality liver US images caused by aging or inadequate management of US equipment can pose significant challenges in both diagnosis and treatment. From this perspective, the aim of this study was to enhance and assess the image quality of liver US obtained from an older, lower-performing device using a deep learning approach. A neural network based on a switchable cycle generative adversarial network (CycleGAN) was trained in an unsupervised learning setting, with low-quality images as inputs and high-quality images as targets. The study included consecutively acquired grey-scale liver US examinations from both a 12-year-old and a 4-year-old US device. Images from the older device served as inputs, while images from the newer device were used as targets for the deep learning-based algorithm. Image quality was evaluated by two experienced reviewers. The algorithm significantly improved the brightness, contrast, and overall quality of the reconstructed liver US images (p < 0.001), as assessed by both reviewers. However, no significant differences in image resolution and reverberation artifacts were noted by one of the reviewers. The weighted kappa values for image quality and diagnostic performance ranged from 0.225 to 0.838, indicating fair to almost-perfect inter-reader agreement. The proposed algorithm effectively enhances low-quality liver US images to high diagnostic quality, thereby potentially supporting clinical assessment and intervention in patients with LC.
Venous sinus stenting (VSS) and ventriculoperitoneal shunting (VPS) are established interventions for idiopathic intracranial hypertension (IIH), yet comparative evidence remains limited. Treatment selection is often influenced by institutional preference, and retrospective studies are frequently affected by baseline differences between patient groups. This study aimed to compare outcomes between VSS and VPS using propensity score overlap weighting to reduce confounding by indication and achieve balanced comparison. A retrospective cohort study was conducted including all patients treated with VSS or VPS for IIH at a single institution between 2021 and 2024. Baseline demographics, clinical characteristics, procedural details, and postoperative outcomes were collected. Propensity scores were estimated using logistic regression, and overlap weights were applied to generate a balanced pseudopopulation. Weighted logistic regression was used to compare postoperative complications, clinical outcomes, unsatisfactory treatment response, and need for salvage procedures. A total of 139 patients were included (VSS: n = 99; VPS: n = 40). Overlap weighting achieved near-perfect covariate balance (all standardized mean difference <0.1). After weighting, VSS was associated with significantly lower odds of postoperative complications compared with VPS [odds ratio (OR) 0.06, 95% CI 0.02-0.23; P < .001]. Persistently elevated postoperative opening pressure was more frequent after VSS (OR 10.64, 95% CI 1.88-60.15; P = .008). Rates of unsatisfactory treatment response (OR 0.51, P = .153), need for salvage procedures (OR 1.80, P = .326), and resolution of headache, papilledema, tinnitus, and visual symptoms were not significantly different between treatments (all P ≥.05). In this propensity score-weighted comparison, VSS and VPS produced similar symptom-based outcomes and rates of unsatisfactory treatment response. 
However, VSS demonstrated a substantially more favorable procedural safety profile, with significantly fewer and less severe complications. These findings suggest that VSS may offer a safer alternative to VPS for appropriately selected patients while providing comparable clinical effectiveness.
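As a rough illustration of the weighting step described in this abstract, the sketch below computes overlap weights from propensity scores (1 - e for treated patients, e for comparison patients) and checks balance with a standardized mean difference. All values are hypothetical, and the pooled standard deviation from the unweighted groups is one common convention, assumed here rather than taken from the study.

```python
import math


def overlap_weights(treated, propensity):
    """Overlap weight: 1 - e(x) for treated units, e(x) for comparison units."""
    return [(1.0 - e) if t == 1 else e for t, e in zip(treated, propensity)]


def standardized_mean_difference(x, treated, weights):
    """Difference in weighted group means divided by the pooled
    standard deviation of the unweighted groups."""
    def weighted_mean(group):
        num = sum(w * v for v, t, w in zip(x, treated, weights) if t == group)
        den = sum(w for t, w in zip(treated, weights) if t == group)
        return num / den

    def variance(group):
        vals = [v for v, t in zip(x, treated) if t == group]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)

    pooled_sd = math.sqrt((variance(1) + variance(0)) / 2)
    return (weighted_mean(1) - weighted_mean(0)) / pooled_sd


# Hypothetical cohort (assumed): 1 = VSS, 0 = VPS; propensity = P(VSS | covariates).
treated = [1, 1, 1, 0, 0]
propensity = [0.9, 0.7, 0.5, 0.6, 0.2]
age = [50, 40, 30, 35, 25]

weights = overlap_weights(treated, propensity)
smd_unweighted = standardized_mean_difference(age, treated, [1.0] * len(age))
smd_weighted = standardized_mean_difference(age, treated, weights)
```

Patients whose propensity score is close to 0 or 1 receive small weights, so the comparison concentrates on patients who could plausibly have received either procedure. In this toy example the standardized mean difference for age falls after weighting but does not reach the <0.1 threshold, which a propensity model fitted to real covariates is designed to achieve.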
BRCA1/2 testing is currently recommended at diagnosis for high-grade serous ovarian carcinoma (HGSOC) because of its impact on patients' survival when treated with poly(adenosine diphosphate ribose) polymerase inhibitors. Standard clinical practice involves analyzing BRCA1/2 genes in formalin-fixed, paraffin-embedded (FFPE) histological specimens. However, because neoplastic ascites is a common clinical presentation in HGSOC and provides a good source of neoplastic cells via a less invasive procedure, it is worthwhile to explore the feasibility of BRCA1/2 testing on cytological specimens obtained from malignant ascites. BRCA1/2 status was analyzed in 34 ascites-derived cytological samples via an amplicon-based next-generation sequencing (NGS) approach, and the results were compared with those from FFPE tissues previously tested in routine clinical practice. A perfect match was observed between BRCA1/2 testing results from neoplastic ascites and surgical samples (100% concordance) for all pathogenic variants, including both germline and somatic mutations. This is the first study to report such high concordance within the largest collection of somatic variants analyzed to date. Additionally, molecular NGS testing was demonstrated to be feasible even in malignant ascites with a low tumor fraction and with archived material. This study shows that ascites can be a suitable specimen for BRCA1/2 NGS testing, and provides a minimally invasive option for disease diagnosis and the early detection of key molecular biomarkers essential for the clinical management of women with HGSOC.
Selectively realizing metastable phases is challenging, despite their vast number, due to the interplay of kinetics and thermodynamics, which has to be realized through processing conditions. Here, we demonstrate selective and controlled realization of metastable phases through size confinement effects. This is realized during diffusion-controlled filling of nanocavities. By nanomolding AuSi alloy and performing crystal structure and elemental analysis of molded alloy nanowires, we observe three different phase states, i.e., hybrid nanophase, bamboo-shaped, and uniform alloy phases, corresponding to different nanowire sizes. The surprising uniform alloy phase for nanowires with diameter below ∼20 nm exhibits perfect single crystalline structure with uniform distribution of Au and Si compositions. A phase state-energy map is constructed to rationalize the findings by considering the competition between mixing enthalpy, configuration entropy, strain energy and interfacial energy. Our Letter demonstrates the universal applicability of nanomolding in fabricating metal-metal and metal-nonmetal metastable phases. In contrast to state-of-the-art rapid cooling techniques, this strategy is simple and practical, thereby enabling precise selection of novel materials with tailored functionalities.
This study integrated machine learning and explainable artificial intelligence (XAI) to classify four processing degrees of Eucommiae cortex (EC): raw, under-processed, moderately processed and over-processed. Powder characteristics (L⁎, a⁎, b⁎) and decoction piece characteristics (R, G, B, GRAY, and the percentage of breaking elongation (PBE)) of EC were collected via spectrophotometer, high-resolution camera, and material testing machine, respectively. Among the evaluated models, XGBoost demonstrated superior performance, achieving perfect training accuracy (100%) and a test accuracy of 94.74%. SHAP explanation identified PBE, a⁎ and h as key features. Furthermore, models based on decoction piece characteristics outperformed those based on powder features. The integrated Analytic Hierarchy Process (AHP)-Entropy Weight Method (EWM) analytical approach confirmed the relationship between appearance and components. This study successfully established an intelligent, rapid, and accurate method for identifying the processing degrees of EC decoction pieces, offering a valuable reference for at-line sampling quality control in industrial production.
Studies on the application of analytic rubrics for assessing students' performance in preclinical tasks are currently limited. This study aimed to examine the use of a validated rubric structure to measure the performance outcomes of preclinical dental students conducting Class II composite restorations. A validated analytic rubric for evaluating preclinical Class II composite cavity preparation and restoration was used in the preclinical phase by two male examiners, who were calibrated before the assessments using the rubric. Scoring was based on a 4-point scale for two parameters: cavity preparation (40 points) and restoration (20 points). Descriptive statistics were applied to calculate rubric parameters, while independent t-tests compared scores across examiners and between student genders and groups. Associations between Grade Point Average (GPA), evaluators, and the gender of participants were assessed using Pearson's correlation coefficient and the Kappa test. Overall mean scores were marginally higher for female participants than for their male counterparts (54.71 vs 52.92) across cavity preparation (36.71 vs 34.87) and restoration parameters (18.00 vs 18.06). Among the cavity preparation parameters, only finishing of the cavity preparation showed a significant difference (p<0.05). For the Class II composite restoration steps, we found no overall differences (p ≥ 0.050), except for significant gender differences in the matrix band application and anatomy steps (p=0.022 and 0.034, respectively). A significant difference between genders was recorded by the first evaluator (p=0.006), with a significantly higher overall average (p=0.047). A strong and significant correlation was documented between participant GPA and specific evaluated parameters (p<0.001). Within this preclinical setting, gender-based variations were evident across all assessed procedural parameters.
Additionally, variability was observed between the two examiners, with agreement levels ranging from fair to moderate, and in some instances, reaching perfect agreement.
Generative models enhance neuroimaging through data augmentation, quality improvement, and rare condition studies. Despite advances in realistic synthetic MRIs, evaluations focus on texture and perception, lacking sensitivity to crucial morphometric fidelity. This study proposes a new metric, called WASABI (Wasserstein-Based Anatomical Brain Index), to assess the morphometric plausibility of synthetic brain MRIs. WASABI leverages SynthSeg, a deep learning-based brain parcellation tool, to derive volumetric measures of brain regions in each MRI and uses the multivariate Wasserstein distance to compare distributions between real and synthetic anatomies. Based on controlled experiments on two real datasets and synthetic MRIs from five generative models, WASABI demonstrates higher sensitivity in quantifying morphometric discrepancies compared to traditional image-level metrics, even when synthetic images achieve near-perfect visual quality. Our findings advocate for shifting the evaluation paradigm beyond visual inspection and conventional metrics, emphasizing morphometric fidelity as a crucial benchmark for clinically meaningful brain MRI synthesis. Our code is available at https://github.com/BahramJafrasteh/wasabi-mri.
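To give a feel for the distance underlying this kind of metric, here is a deliberately simplified sketch: the one-dimensional Wasserstein-1 distance between two equal-size samples of a single regional volume. WASABI itself is described as using a multivariate Wasserstein distance over SynthSeg-derived volumes, so this is not the authors' implementation; the function name and the volumes are hypothetical.

```python
def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size empirical samples:
    the mean absolute difference between their sorted values."""
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


# Hypothetical volumes of one brain region in mL (assumed, not study data).
real_volumes = [3.9, 4.1, 4.0, 4.2]
synthetic_volumes = [3.5, 3.6, 3.7, 3.4]

# A systematic shift in regional volume shows up as a nonzero distance.
distance_shifted = wasserstein_1d(real_volumes, synthetic_volumes)
# Identical distributions give a distance of zero, whatever the ordering.
distance_same = wasserstein_1d(real_volumes, list(reversed(real_volumes)))
```

Because the distance is computed on volumes rather than on pixels, a synthetic set can look visually convincing and still score poorly if its regional volumes are systematically off.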
High-resolution transmission electron microscopy (HRTEM) is an important method for imaging beam-sensitive materials, often under cryo conditions. Electron ptychography in the scanning transmission electron microscope (STEM) has been shown to reconstruct low-noise phase data at a reduced fluence for such materials. This raises the question of whether ptychography or HRTEM provides a more fluence-efficient imaging technique. Even though the transfer function is a common metric for evaluating the performance of an imaging method, it only describes the signal transfer with respect to spatial frequency, irrespective of the noise transfer. It also cannot be well defined for methods, such as ptychography, that use an algorithm to form the final image. Here we apply the concept of detective quantum efficiency (DQE) to electron microscopy as a fluence-independent and sample-independent measure of technique performance. We find that, for a weak-phase object, ptychography can never reach the efficiency of a perfect Zernike phase imaging microscope but that ptychography is more robust to partial coherence.
To determine the prevalence of malnutrition in hospitalized older adults using the Global Leadership Initiative on Malnutrition (GLIM) criteria as the reference method and to evaluate the diagnostic performance of commonly used nutrition screening tools. In this prospective cross-sectional study, adults aged > 60 years admitted to internal medicine wards were assessed within 48 h of admission. Nutritional risk was screened using the Nutritional Risk Screening 2002 (NRS-2002), Mini Nutritional Assessment-Short Form (MNA-SF), Malnutrition Universal Screening Tool (MUST), and the European Society for Clinical Nutrition and Metabolism (ESPEN) diagnostic definition. Malnutrition was classified using GLIM criteria as the reference method. Sensitivity, specificity, predictive values, likelihood ratios, and Cohen's kappa were calculated. A total of 202 participants were included; 33.2% (n = 67) met GLIM-defined malnutrition criteria. Malnourished participants were older and had lower body mass index, albumin, and haemoglobin levels than well-nourished participants. The prevalence of malnutrition risk was highest with MUST (61.9%), followed by MNA-SF (43.6%) and NRS-2002 (31.2%). MNA-SF and MUST showed the highest sensitivity (94%), whereas ESPEN demonstrated perfect specificity (100%) but low sensitivity (25.4%). Agreement with GLIM was strongest for MNA-SF (κ = 0.700). In hospitalized adults aged > 60 years, one in three met GLIM-defined malnutrition criteria. Among the evaluated tools, MNA-SF provided the most favourable balance of sensitivity and agreement with the GLIM reference method, supporting its use for routine screening in geriatric inpatient settings.
Depression is a leading cause of global disability, motivating the development of objective and scalable diagnostic approaches. Quantitative electroencephalography (QEEG) combined with machine learning (ML) and deep learning (DL) techniques has gained increasing attention for depression detection. This systematic review aimed to critically examine and descriptively compare the methodologies, performance metrics, and limitations of ML- and DL-based models applied to EEG data for depression detection. A systematic review was conducted in accordance with PRISMA 2020 guidelines. Seven electronic databases (PubMed, Scopus, IEEE Xplore, ScienceDirect, Web of Science, SAGE Journals, and MDPI) were searched for peer-reviewed studies published between 2020 and 2024. Eligible studies included human participants, used EEG signals for depression detection, and applied ML or DL algorithms. Extracted information comprised algorithm type, sample size, EEG acquisition parameters, validation strategies, and reported performance metrics, which were synthesized descriptively across studies. Risk of bias was assessed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool. A total of 42 studies met the inclusion criteria, including 23 ML-based and 19 DL-based investigations. Reported classification accuracy ranged from approximately 76% to 100%. DL studies showed a higher mean reported accuracy than ML studies (93.92% vs. 90.78%); however, this difference was not statistically significant in the exploratory non-parametric comparison. Near-perfect performance values were frequently observed in studies with small sample sizes and subject-dependent or exclusively internal validation strategies, raising concerns regarding overfitting and limited generalizability. Studies relying on publicly available datasets tended to report more stable performance. 
QUADAS-2 assessment revealed recurrent risk-of-bias concerns, particularly in the domains of patient selection and index test conduct. Both ML and DL approaches demonstrate potential for EEG-based depression detection, but reported performance differences between them should be interpreted cautiously. Although DL studies tended to report higher accuracy values, this pattern was not statistically significant in exploratory analyses and was strongly influenced by sample size, validation strategy, and methodological design. Future research should prioritize larger and more diverse samples, subject-independent or external validation strategies, and standardized reporting frameworks to enhance methodological rigor and clinical applicability.
Background: Non-small cell lung cancer (NSCLC) patients who have mutations in their epidermal growth factor receptor (EGFR) gene respond more favorably to tyrosine kinase inhibitors (TKIs) than to standard chemotherapy. However, tissue biopsy-based EGFR testing is invasive, costly, and technically challenging. Plasma-derived circulating tumor DNA (ctDNA) offers a minimally invasive and cost-efficient alternative for mutation profiling. This study assessed the agreement between EGFR mutation status in plasma-derived ctDNA and tissue biopsy in NSCLC patients from tertiary care hospitals in Bangladesh. Methods: In this cross-sectional analytical study, we recruited 32 patients with NSCLC before EGFR-TKI treatment. EGFR mutations in ctDNA samples were identified using the Amplification Refractory Mutation System (ARMS) polymerase chain reaction method. Tissue biopsy results were obtained from routine diagnostic procedures. Agreement between ctDNA and tissue biopsy results was assessed using kappa statistics, and diagnostic performance metrics were calculated. Results: Most of our study participants were male (75%) and had stage IV lung adenocarcinoma (72%). We observed substantial agreement between plasma-derived ctDNA samples and tissue biopsies (kappa, κ = 0.683). This agreement was almost perfect (κ = 0.826) when calculated for patients with stage IV disease. The overall concordance was 84.4%. Compared with tissue biopsy, ctDNA testing yielded a sensitivity of 73.3% and a specificity of 94.1%. Conclusion: Plasma-derived ctDNA demonstrates substantial agreement with tissue biopsy for EGFR mutation detection in patients with NSCLC, particularly those with advanced-stage disease. These findings support ctDNA as a viable alternative for molecular profiling in settings where tissue biopsy is limited or impractical.
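The agreement statistics in this abstract can be reproduced from a 2×2 table. The abstract does not report the table itself; the counts below are an assumption, back-calculated as the only whole-number split of 32 patients that matches the reported sensitivity (73.3%), specificity (94.1%), and concordance (84.4%).

```python
def agreement_stats(tp, fp, fn, tn):
    """Cohen's kappa and diagnostic metrics for an index test (ctDNA)
    against a reference test (tissue biopsy)."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals of both tests.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "concordance": observed,
        "kappa": (observed - expected) / (1 - expected),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Assumed counts inferred from the reported percentages (not stated in the source).
stats = agreement_stats(tp=11, fp=1, fn=4, tn=16)
```

Under these assumed counts the chance-corrected agreement comes out at about 0.683, matching the reported κ, which illustrates why κ is noticeably lower than the raw 84.4% concordance.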
BackgroundLarge language models (LLMs) have demonstrated promising capabilities in medical diagnostic reasoning, yet their performance in specialized clinical domains such as rheumatology remains incompletely characterized. While diagnostic accuracy has been evaluated, critical dimensions including calibration, reasoning quality, and temporal stability have not been systematically assessed across contemporary models.ObjectivesThis study aimed to comprehensively evaluate and compare the diagnostic accuracy, certainty expression, reasoning quality, and hallucination rates of four state-of-the-art LLMs ChatGPT-4, Claude 3.5, DeepSeek-V3, and Gemini 1.5 Pro in complex rheumatologic case scenarios.DesignA cross-sectional, analytical, and comparative study was conducted following STARD and TRIPOD guidelines, adapted for LLM evaluation. Nine complex rheumatologic cases from published case reports were evaluated at three time points (Days 1, 5, and 10) between July 1 and September 18,2025.MethodsStandardized clinical vignettes were submitted to each LLM under controlled experimental conditions. Two blinded senior rheumatologists independently assessed diagnostic accuracy, reasoning quality across five analytical dimensions using Likert scales, and hallucination frequency. Certainty expression and temporal stability were quantified using intraclass correlation coefficients. Correlation analyses examined relationships between reasoning quality and confidence expression.ResultsAll models achieved near-perfect diagnostic accuracy, with ChatGPT, Claude and Gemini correctly identifying the primary diagnosis in 100% of cases and DeepSeek in 88.9%. However, Spearman correlation analysis revealed uniformly weak and non-significant associations between reasoning quality and expressed certainty across all models (ρ range: -0.156 to 0.215, all p>0.05), indicating fundamental miscalibration. 
ChatGPT demonstrated the highest reasoning score (3.89 ± 0.23) and lowest hallucination rate (7.4%), while Gemini showed the highest hallucination frequency (18.5%). Temporal stability was excellent for ChatGPT (ICC = 0.84) and good for DeepSeek (ICC = 0.79). Conclusion: Despite exceptional diagnostic accuracy, current LLMs exhibit critical limitations in confidence calibration and variable hallucination rates, representing significant barriers to safe clinical deployment in rheumatology.
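The calibration finding in the LLM abstract rests on Spearman rank correlation between reasoning quality and expressed certainty. The sketch below shows one way such a coefficient can be computed. It is not the authors' analysis code, and the function names and the nine score pairs are hypothetical placeholders rather than study data.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical Likert reasoning scores and certainty percentages for nine cases.
reasoning = [3.8, 4.0, 3.6, 4.2, 3.9, 3.7, 4.1, 3.5, 4.0]
certainty = [90, 85, 95, 80, 92, 88, 91, 86, 94]
print(f"rho = {spearman_rho(reasoning, certainty):.3f}")
```

A coefficient near zero, as the abstract reports, means that cases with stronger reasoning were not systematically accompanied by higher stated certainty.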
The nickel-containing carbon monoxide dehydrogenase (CODH) uses a unique heterometallic [NiFe4S4] cluster active site, called the C-cluster, to catalyze the reversible reduction of carbon dioxide (CO2) to carbon monoxide (CO) at low overpotential and with perfect selectivity. Only the properly assembled nickel-bound form is capable of this reactivity, though how the structure of the cluster promotes such selectivity remains poorly understood. We have developed a model of the C-cluster by constructing a [NiFe3S4] cluster in the iron-sulfur cluster binding site of the Pyrococcus furiosus ferredoxin protein (NiFd) that replicates the thiolate ligation and aqueous environment of the native system. In this work, we interrogate the roles of each individual metal site and the whole-cluster covalency across two oxidation states that mirror the C-cluster in the Cox and Cred1 states. We have also studied the system bound to a CODH substrate (CO) and C-cluster inhibitor (CN-). A comprehensive suite of spectroscopic techniques, including pulsed electron paramagnetic resonance (EPR), variable-temperature, variable-field Mössbauer, and high-energy resolution fluorescence-detected X-ray absorption (HERFD-XAS) spectroscopy, has been used in conjunction with quantum mechanics/molecular mechanics (QM/MM) and broken symmetry density functional theory (BS-DFT) calculations to elucidate the electronic properties of these heterometallic clusters. This work reveals that the supporting iron-sulfide subcluster and thiolate ligands play a critical role in buffering charge density as the cluster traverses multiple states. An unusually weak exchange interaction between the Ni site and the iron atoms is found to exist in the CO-bound form, suggesting that substrate binding electronically isolates the nickel site, giving a low-spin ground state that drives localized chemistry to occur at the nickel center. 
These results have implications for understanding how reactivity is controlled in native CODH to promote CO oxidation and CO2 reduction rather than deleterious hydrogen evolution.
The Schalkwijk-Kailath (SK) scheme for the additive white Gaussian noise (AWGN) channel with noise-free feedback is well known because its coding complexity grows linearly with the coding blocklength, it is capacity-achieving, and its decoding error probability is much lower than that of established codes such as LDPC and polar codes. However, extending the SK scheme to more practical scenarios is challenging because it depends heavily on the feedback being received successfully. This paper investigates how to design SK-type schemes for channels with intermittent feedback and establishes the relationship between the coding blocklength, the desired decoding error probability, the achievable transmission rate, and the secrecy level of the proposed schemes. Specifically, for the AWGN channel with intermittent noise-free feedback, a variation of the SK scheme is first proposed, and the corresponding achievable rate is characterized for a given coding blocklength and decoding error probability. Subsequently, the proposed scheme is extended to channels with noisy intermittent feedback. To quantitatively evaluate the secrecy performance, we adopt the eavesdropper's normalized equivocation as the secrecy metric and analytically characterize the achievable secrecy level of the proposed schemes. In particular, we show that perfect weak secrecy, i.e., an asymptotically vanishing information leakage rate, can be achieved under certain conditions. Numerical results show that, for a given decoding error probability threshold, the proposed schemes require a lower signal-to-noise ratio and a significantly shorter coding blocklength than LDPC codes. This study may provide a way to construct efficient coding schemes for channels with intermittent feedback.
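For readers unfamiliar with the baseline, the following is a minimal simulation sketch of the classic SK idea with perfect, always-available feedback. It is not the intermittent-feedback variant proposed in the paper. The function name `sk_transmit`, the mapping of messages to points in [-0.5, 0.5], and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def sk_transmit(message, num_messages, blocklength, power, noise_var, rng):
    """Simulate one block of a textbook SK-type scheme (perfect feedback).

    Returns the decoded message index and the receiver's final
    estimation-error variance.
    """
    noise_std = math.sqrt(noise_var)
    # Map the message index to the midpoint of its sub-interval of [-0.5, 0.5].
    theta = (message + 0.5) / num_messages - 0.5

    # First channel use: send the scaled message point. A uniform variable on
    # [-0.5, 0.5] has variance 1/12, so the average power is approximately `power`.
    gain = math.sqrt(12 * power)
    received = gain * theta + rng.gauss(0.0, noise_std)
    estimate = received / gain
    err_var = noise_var / (12 * power)

    for _ in range(blocklength - 1):
        # Feedback lets the transmitter know the receiver's current estimate,
        # so it can send the estimation error scaled to power `power`.
        error = estimate - theta
        received = math.sqrt(power / err_var) * error + rng.gauss(0.0, noise_std)
        # The receiver subtracts the linear MMSE estimate of that error.
        estimate -= math.sqrt(power * err_var) / (power + noise_var) * received
        # Each use shrinks the error variance by noise_var / (power + noise_var).
        err_var *= noise_var / (power + noise_var)

    decoded = math.floor((estimate + 0.5) * num_messages)
    decoded = min(num_messages - 1, max(0, decoded))
    return decoded, err_var


rng = random.Random(0)
decoded, err_var = sk_transmit(
    message=5, num_messages=16, blocklength=30, power=1.0, noise_var=1.0, rng=rng
)
print(decoded, err_var)
```

Because the error variance shrinks geometrically with each channel use, the encoder needs only a few arithmetic operations per use, which is the linear complexity the abstract refers to. The same recursion also shows why the scheme is fragile: every refinement step assumes the feedback actually arrived.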