Prognostic prediction following gastric cancer surgery plays a pivotal role in postoperative management, helping to optimize therapeutic strategies and improve patient survival. Standard clinicopathological indicators, including tumor differentiation and lymph node metastasis, continue to serve as the basis for outcome evaluation; however, they do not adequately represent the host's systemic inflammatory response and immunonutritional status, both of which significantly affect tumor progression and postoperative recovery. Systemic inflammatory markers, such as the Neutrophil-to-Lymphocyte Ratio (NLR) and Platelet-to-Lymphocyte Ratio (PLR), have emerged as reliable, noninvasive prognostic indicators. However, the complex and nonlinear interactions among inflammatory, clinical, and demographic variables pose a limitation for traditional statistical methods. This study proposes a novel deep learning framework that integrates three major components, a Gradient-Boosted Decision Tree (GBDT), a Tree-Driven Encoder (TDE), and a one-dimensional Convolutional Neural Network (1D-CNN), for postoperative prognostic prediction in gastric cancer. The GBDT module captures intricate dependencies among clinical and inflammatory variables, the TDE transforms tree-based structures into unified binary embeddings, and the 1D-CNN component learns high-level feature representations from these embeddings to predict postoperative prognosis. The model's performance was evaluated using cross-validation and compared with various traditional machine learning algorithms and advanced deep learning architectures for tabular data. Experimental findings demonstrate that the proposed hybrid framework consistently outperforms both traditional and general deep learning models in predicting postoperative prognosis.
By combining tree-based feature structuring with deep representation learning, the model effectively captures nonlinear and hierarchical relationships among systemic inflammatory markers and clinicopathological features. This approach achieves high predictive accuracy, robustness, and generalization capability, particularly in identifying high-risk patients characterized by elevated inflammatory activity. Moreover, the model exhibited stable performance across multiple random seeds and data partitions, confirming its reproducibility and reliability under different experimental conditions. This study presents a data-driven and interpretable deep learning framework for postoperative prognostic prediction in gastric cancer. By integrating the strengths of gradient-boosted tree modeling and deep neural representation learning, the proposed model provides a more comprehensive understanding of the interplay among inflammation, nutrition, and tumor biology, supporting personalized treatment planning and evidence-based clinical decision-making. Future research will focus on external validation using independent cohorts, real-time clinical application, and enhancing model explainability to facilitate clinical adoption.
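One plausible reading of the step that turns tree structures into binary embeddings is a one-hot encoding of the leaf each sample reaches in every boosted tree. The sketch below illustrates that reading only; the abstract does not specify the encoder. The `stump` and `encode` helpers, the two-feature layout, and the thresholds are hypothetical stand-ins for a fitted GBDT.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-in for a fitted boosted tree: maps a feature vector
# to the index of the leaf it falls into.
Tree = Callable[[Sequence[float]], int]


def stump(feature: int, threshold: float) -> Tree:
    """Depth-1 tree with two leaves (0 = left, 1 = right)."""
    return lambda x: 0 if x[feature] <= threshold else 1


def encode(sample: Sequence[float], trees: List[Tree],
           leaves_per_tree: List[int]) -> List[int]:
    """Concatenate one one-hot block per tree, marking the active leaf."""
    bits: List[int] = []
    for tree, n_leaves in zip(trees, leaves_per_tree):
        block = [0] * n_leaves
        block[tree(sample)] = 1
        bits.extend(block)
    return bits


# Toy features [NLR, PLR]; thresholds are illustrative, not from the study.
trees = [stump(0, 3.0), stump(1, 150.0), stump(0, 5.0)]
embedding = encode([4.2, 120.0], trees, [2, 2, 2])
print(embedding)  # [0, 1, 1, 0, 1, 0]
```

Each sample activates exactly one leaf per tree, so the embedding is a fixed-length binary vector that a 1D convolution could scan block by block.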
We learn to recognize a vast array of familiar objects, a process involving learning-related changes in inferotemporal cortex (IT) activity. A challenge to discovering mechanisms of familiarity learning is that it spans multiple timescales from minutes to days, and is accompanied by simultaneous changes in cellular, synaptic, and network properties. We leverage an integrated experimental-theoretical approach, using IT recordings in two male macaques during familiarity learning within and across sessions to infer underlying plasticity mechanisms. We identified two timescales of learning-related changes spanning minutes to days, consistent with distinct synaptic and cellular mechanisms. Across sessions, averaged responses gradually decreased with familiarity, consistent with synaptic plasticity. In contrast, within-session changes, including rapid response decay and increased spontaneous activity, aligned with intrinsic plasticity mechanisms. Recurrent networks endowed with learning rules inferred from experiments replicated the observed learning dynamics, supporting our hypothesis of distinct learning mechanisms: slow synaptic plasticity at long timescales and fast intrinsic plasticity at short timescales.
The transition from undergraduate education to medical school demands increasing learner autonomy and self-regulation. Guided by Zimmerman's Self-Regulated Learning (SRL) theory and Grow's Staged Self-Directed Learning Model, this study examined how self-directed learning readiness (SDL-R), encompassing learning motivation, planning and implementation, self-monitoring, and interpersonal communication, varies across the preclinical curriculum and how these domains relate to academic performance. A mixed longitudinal and repeated cross-sectional study (2022-2024) was conducted among preclinical medical students (N = 807 responses; 434 unique students from three cohorts, Classes of 2025-2027) at an LCME-accredited US medical school. All enrolled first- and second-year students were eligible; recruitment was voluntary via web-based REDCap surveys administered once per semester. Linear mixed-effects models evaluated within-student change in total SDL-R and its four domains across semesters. Ordinary least-squares regression with cluster-robust standard errors assessed the contribution of standardized subscale scores to cumulative grade point averages. Bonferroni correction was applied within each family of comparisons, with effect sizes and 95% confidence intervals reported. SDL-R scores varied across preclinical semesters in a pattern consistent with developmental progression, with a modest decline in first-year Spring followed by recovery through year two. Planning and implementation was the strongest positive predictor of GPA in Year 1 (β = 0.142, p = .004) and in the combined model (β = 0.154, p < .001). The SDL-R domains collectively explained 22.2% of GPA variance in Year 1 (R² = 0.222) and 9.0% in Year 2 (R² = 0.090); the combined model explained 15.0% (R² = 0.150). Interpersonal communication showed a significant negative partial association with GPA, consistent with a statistical suppressor effect.
Second-year students, older learners, males, and higher-achieving students showed higher SDL-R levels. SDL-R is a dynamic, context-sensitive competency during preclinical training, with planning and implementation as its strongest academic predictor. Targeted curricular interventions that scaffold metacognitive planning, self-monitoring, and adaptive strategy use may enhance both academic performance and lifelong learning capacity.
To develop and validate an interpretable machine learning model for distinguishing prostate cancer (PCa) with Gleason Score (GS) ≤ 3+4 from those with GS ≥ 4+3, and to explore the predictive value and biological significance of core radiomic features. This study enrolled a retrospective multicenter cohort, including 225 PCa patients. Two radiologists manually segmented intratumoral and peritumoral regions of interest on T2-weighted imaging and extracted radiomic features. We built machine learning models by integrating clinical variables with intratumoral and peritumoral radiomic features. To enhance model interpretability, we used SHapley Additive exPlanations (SHAP), correlation analysis, and mediation analysis. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, balanced accuracy, F1 score, and other relevant metrics. The combined model integrating clinical features and IntraPeri_1mm radiomic features achieved the best overall performance. It demonstrated robust discriminative performance, with AUCs of 0.975 (95% CI, 0.945-1.000) in the training set, 0.854 (95% CI, 0.717-0.991) in the internal test set, and 0.795 (95% CI, 0.694-0.895) in the external validation set. SHAP analysis identified the core predictive feature. This feature correlated positively with high GS and showed significant yet weak correlations with aggressive clinical-radiological indicators. Mediation analysis revealed that this feature partially mediated the relationship between PI-RADS score and GS (indirect effect = 0.309, 95% CI, 0.061-0.875). The interpretable machine learning model demonstrates excellent performance in predicting PCa grading.
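For reference, the threshold-dependent metrics listed above (sensitivity, specificity, balanced accuracy, F1 score) all follow from the four confusion-matrix counts. The sketch below uses their standard definitions; the function name and the counts are made up for illustration and are not results from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics for a binary classifier from confusion counts."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)     # positive predictive value
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Illustrative counts only (positive class = GS >= 4+3 in this setting).
m = binary_metrics(tp=40, fp=10, tn=30, fn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Balanced accuracy averages the two class-wise rates, so it is less sensitive than plain accuracy to an uneven split between low- and high-grade cases.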
This study aimed to evaluate the performance of deep learning-based super-resolution ultrashort echo time magnetic resonance imaging (SR-UTE MRI) for the detection of pulmonary nodules and the reliability of longitudinal nodule size assessment compared with conventional ultrashort echo time (UTE) in patients with emphysema. This prospective study included 127 patients with emphysema and pulmonary nodules who underwent chest low-dose computed tomography (LDCT) and UTE MRI. SR-UTE images were reconstructed using a deep learning-based algorithm. Pulmonary nodule detection performance, size measurement accuracy, and longitudinal reproducibility were assessed using LDCT as the reference standard. Patients were stratified according to CT-derived emphysema severity quantified by low-attenuation area percentage (LAA%). Objective image quality metrics, including signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR), and Laplacian variance, were also evaluated. SR-UTE demonstrated superior image quality compared with conventional UTE, with higher SNR and CNR in patients with severe emphysema (both p < 0.05). For nodules < 6 mm, SR-UTE achieved a higher detection rate than UTE (81.8% vs. 54.5%, p < 0.05), with greater improvement for ground-glass nodules in severe emphysema (66.7% vs. 33.3%). Both UTE and SR-UTE achieved 100% sensitivity for nodules > 8 mm. SR-UTE showed better agreement with CT for nodule size measurement, with higher intraclass correlation coefficients across emphysema subgroups (0.89-0.93 vs. 0.84-0.88 for UTE) and reduced measurement bias. In longitudinal analysis, SR-UTE more accurately reflected CT-derived nodule size changes and demonstrated higher reproducibility than UTE (ICC 0.90 vs. 0.82). SR-UTE improved the detection of small and subsolid pulmonary nodules and provided more reliable longitudinal size assessment than UTE in patients with emphysema. 
These findings support the potential role of SR-UTE MRI in MRI-based surveillance of pulmonary nodules.
To develop and internally validate a radiology-centered machine-learning model using preoperative MRI and clinical characteristics to predict arthroscopic meniscal repairability. A retrospective cohort of 491 patients who underwent knee MRI followed by arthroscopy between 2018 and 2023 was analyzed. Preoperative predictors included demographic variables, injury mechanism, and a comprehensive set of MRI-derived features. Meniscal morphology, bone marrow edema, joint effusion, cruciate ligament integrity, cartilage degeneration, tear displacement, ramp lesion, and extrusion distance were systematically assessed by two musculoskeletal radiologists. Interobserver agreement was evaluated using Cohen's kappa and intraclass correlation coefficients (ICCs). Least absolute shrinkage and selection operator (LASSO) regression was used to identify the most informative predictors. Logistic regression, random forest, gradient boosting machine (GBM), and support vector machine (SVM) models were trained using five-fold stratified cross-validation with hyperparameter tuning via grid search. Model performance was evaluated using area under the receiver operating characteristic curve (AUC) with 95% confidence intervals, calibration metrics (calibration slope, intercept, and Brier score), and decision curve analysis (DCA). LASSO selected 13 preoperative predictors spanning eight clinically relevant domains. Logistic regression achieved the highest cross-validated performance (AUC = 0.777, 95% CI 0.735-0.819), followed by SVM (AUC = 0.771), random forest (AUC = 0.770), and GBM (AUC = 0.755). Multivariable logistic regression identified ACL injury (OR 0.38, 95% CI 0.25-0.57, p < 0.001), high-grade cartilage degeneration (OR 0.42, 95% CI 0.28-0.63, p < 0.001), greater BMI (OR 1.08 per kg/m², 95% CI 1.03-1.13, p = 0.002), male sex (OR 0.51, 95% CI 0.33-0.78, p = 0.002), and ≥ 3 mm tear displacement (OR 0.44, 95% CI 0.29-0.67, p < 0.001) as independent predictors of non-repairability.
Calibration analysis demonstrated good agreement between predicted and observed probabilities (calibration slope 0.95, intercept -0.08, Brier score 0.21). DCA demonstrated that logistic regression and random forest provided the greatest clinical net benefit across practical threshold probabilities. A radiology-based machine-learning model integrating detailed preoperative MRI features can accurately predict meniscal repairability with internally validated performance and may assist surgeons in optimizing arthroscopic decision-making and surgical planning pending prospective external validation.
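The Brier score reported above is the mean squared difference between predicted probability and observed binary outcome, so lower is better and 0.25 corresponds to always predicting 0.5. A minimal sketch follows; the probabilities and outcomes are illustrative, not study data.

```python
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


# Illustrative predictions (probability that a tear is repairable).
probs = [0.9, 0.2, 0.6, 0.3]
outcomes = [1, 0, 1, 1]
score = brier_score(probs, outcomes)
print(round(score, 3))  # 0.175
```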
Artificial intelligence (AI) holds significant promise for electrocardiogram (ECG) analysis, yet accurately detecting non-ST-segment elevation myocardial infarction (NSTEMI) and overcoming the "black box" nature of deep learning models remain persistent challenges. Here, we present a comprehensive deep learning framework capable of classifying STEMI, NSTEMI, and non-acute coronary syndrome (non-ACS) from 12-lead ECG images, while also localizing infarction sites. Utilizing 2070 validated ECGs, our pipeline integrates ResNet for acute myocardial infarction detection, Faster R-CNN for ST-segment elevation localization, and an ensemble approach for final classification. The model achieved a 98.3% AUROC for detection and an overall three-class accuracy of 93.6%, with high F1 scores for identifying infarction territories. To address interpretability, we developed an explainable AI (XAI) web viewer that visualizes detected regions. Furthermore, we evaluated the model's utility as an educational tool in a prospective pilot study with medical students. AI assistance significantly improved the students' overall diagnostic accuracy from 43% to 82% (p < 0.05), with notable gains in identifying NSTEMI and complex STEMI subtypes. These findings demonstrate that our interpretable AI model not only supports clinical decision-making with high diagnostic precision but also serves as an effective educational aid for enhancing novice clinicians' proficiency.
Brain tumors are a leading cause of cancer-related mortality, and manual MRI screening remains time-consuming and observer-dependent. Deep learning (DL) offers automated detection, but clinical translation requires rigorous validation and interpretability. This study introduces a DL framework for brain tumor detection that addresses two major challenges in medical AI: limited dataset availability and lack of interpretability. Preliminary experiments identified InceptionV3 optimized with Nadam as the optimal architecture. To ensure robust validation, this model was retrained using patient-wise stratified fivefold cross-validation on 90% of the data, incorporating augmentation and minority oversampling to prevent data leakage. This achieved an overall accuracy of 98.3 ± 0.9%. The final model was then trained on the entire development set using the optimal configuration, thereby leveraging all available labeled data to maximize learning capacity and enhance generalization. Performance evaluation was conducted on three levels: (i) a held-out internal test set (10% of the data) for internal assessment, (ii) an external dataset of 3000 unseen images for independent validation, and (iii) quantitative explainable AI (XAI) analyses performed on both internal and external test datasets. The proposed model achieved perfect classification metrics on the internal test set, with 100% accuracy and minimal loss (0.01), and demonstrated strong generalizability on the external dataset with 96% accuracy and minimal loss (0.11). Quantitative XAI analysis demonstrated high faithfulness (Grad-CAM vs. occlusion sensitivity correlation exceeded 0.8), causal importance (top-10% occlusion drop 44% vs. 9% for random occlusion), and specificity to learned weights (Spearman correlation ≈ -0.01).
The proposed pipeline establishes a rigorous, transparent framework for data-limited medical imaging, demonstrating high diagnostic performance with clinically aligned explanations and providing a reliable foundation for trustworthy AI in brain tumor detection.
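Patient-wise stratified splitting means that all images from one patient land in the same fold while class proportions stay roughly even across folds. The sketch below shows one simple way to do this, under the assumption (not stated in the abstract) that every image of a patient shares a single label. The function name and the patient IDs are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence


def patient_wise_folds(patient_ids: Sequence[str],
                       labels: Sequence[Hashable],
                       n_folds: int = 5) -> List[int]:
    """Return a fold index per image so that no patient spans two folds.

    Assumes one label per patient; patients of each class are dealt
    round-robin so every fold receives a similar class mix.
    """
    patient_label = {}
    for pid, y in zip(patient_ids, labels):
        patient_label.setdefault(pid, y)

    by_class = defaultdict(list)
    for pid in sorted(patient_label):
        by_class[patient_label[pid]].append(pid)

    fold_of_patient = {}
    for cls in sorted(by_class):
        for i, pid in enumerate(by_class[cls]):
            fold_of_patient[pid] = i % n_folds
    return [fold_of_patient[pid] for pid in patient_ids]


# Six images from four hypothetical patients, two folds.
ids = ["p1", "p1", "p2", "p3", "p3", "p4"]
ys = [1, 1, 0, 1, 1, 0]
folds = patient_wise_folds(ids, ys, n_folds=2)
print(folds)  # [0, 0, 0, 1, 1, 1]
```

To keep the validation fold free of leakage, augmentation and oversampling would then be applied only to the training folds after this assignment.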
Lung adenocarcinoma (LUAD) is the most common lung cancer histological subtype. Although the unfolded protein response (UPR) has been linked to various human diseases, its role in LUAD remains unclear. To identify UPR-related genes, we applied various methods, including weighted gene co-expression network analysis, differential expression analysis, and multivariate Cox regression. Ten machine learning algorithms were used to construct a UPR-related signature (UPRRS), which was validated using multiple public LUAD datasets. The UPRRS was integrated into a nomogram used in clinical practice for prognosis prediction. We also evaluated predicted drug sensitivity patterns across different risk subgroups. We identified 33 UPR-associated hub genes. A UPRRS was developed through systematic evaluation of 101 machine-learning combinations, exhibiting stable prognostic performance across multiple cohorts. Integration of the UPRRS into a nomogram facilitated the construction of a quantitative prognostic model. Significant differences in biological processes and tumor microenvironment immune cell infiltration were observed between the high- and low-risk UPRRS groups. All five UPRRS genes (ALDH2, FKBP4, KLF4, LAIR1, SIDT2) were validated at the protein level in LUAD cell lines, and FKBP4 was further confirmed by IHC in clinical tissues. Functional experiments showed that FKBP4 knockdown inhibited proliferation, migration, and invasion of A549 and H1975 cells, supporting a potential role for FKBP4 in LUAD progression. Our UPRRS provides a promising tool for prognostic stratification and may offer additional insights into tumor immune microenvironment characterization and therapeutic response prediction in LUAD.
This paper proposes a semantic-guided edge enhancement approach for graph self-supervised learning in network intrusion detection. It aims to address several issues that existing intrusion detection systems face, such as relying on a large amount of labeled data, struggling to capture complex network topology, and overlooking the internal information of edges. Concretely, to improve the discriminability of the network flow graph, we introduce a new node-edge-node attention algorithm for enhanced graph representation. It combines edge-aware attention with intra-edge feature self-attention, thereby helping the model perceive complex attack behaviors at multiple levels of granularity. Meanwhile, we devise a semantic-aware contrastive learning framework that collaboratively enhances nodes and edges, which enables view augmentation without corrupting the original graph semantics, forcing the model to learn more robust and discriminative features. Consequently, our method substantially mitigates the scarcity of labeled samples. In the experiments, the proposed method was compared with seven state-of-the-art (SOTA) methods on four public datasets. The results show that the proposed method outperforms existing mainstream models in accuracy, precision, recall, and F1-score, demonstrating effective detection performance and strong generalization capability.
Sudden arrhythmic death remains a major clinical risk in ischemic heart disease (IHD), underscoring the need for improved risk stratification. Late gadolinium enhancement cardiac magnetic resonance (LGE-CMR) provides measures of scar burden and heterogeneity, but its incremental prognostic value beyond conventional markers such as left ventricular ejection fraction remains uncertain. We analysed two independent IHD cohorts (Dataset 1: n = 399, 54 events; National Research Ethics Service approvals 07/H0708/83 and 09/H0504/104+5; Dataset 2: n = 424, 50 events; derived from the prospectively registered REVIVED-BCIS2 trial, ISRCTN45979711, registered 20 November 2012) using clinical and LGE-CMR-derived variables to evaluate the contribution of LGE-CMR features and to compare machine learning-based survival modelling approaches. A brute-force feature-selection strategy identified optimal predictor subsets for Cox proportional hazards models, Random Survival Forests, and DeepSurv, evaluated using cross-cohort and pooled validation strategies. Scar entropy consistently emerged as a strong predictor of major arrhythmic events. Non-linear approaches outperformed Cox regression, with DeepSurv demonstrating superior generalization across cohorts and Random Survival Forests showing robust performance in pooled analyses. These findings support scar heterogeneity as an important prognostic marker and suggest that machine-learning survival models may improve arrhythmic risk prediction in patients with IHD.
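Survival models such as Cox regression, Random Survival Forests, and DeepSurv are commonly compared with Harrell's concordance index. The abstract does not name its metric, so treating the C-index as the yardstick is an assumption here. The sketch below computes it on made-up follow-up data.

```python
from typing import Sequence


def concordance_index(times: Sequence[float],
                      events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when subject i had an event and a
    shorter follow-up time than j. Tied risk scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects cannot anchor a comparable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Illustrative data: follow-up time, event flag (1 = arrhythmic event),
# and a model's predicted risk score.
times = [5, 8, 12, 20]
events = [1, 0, 1, 0]
risks = [0.9, 0.4, 0.3, 0.5]
c = concordance_index(times, events, risks)
print(c)  # 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of event times by predicted risk.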
In this study, a novel pH-responsive hybrid nanocarrier with a water-in-oil-in-water (W/O/W) emulsion structure was developed using gelatin (G) as a biocompatible polymer, montmorillonite (MMT) as a layered diffusion barrier, and cerium oxide nanoparticles (CeO₂) as a multifunctional stabilizing agent for pH-responsive and controlled delivery of quercetin (QC). The nanocarriers were synthesized via a double-emulsion method and comprehensively characterized by Fourier-transform infrared spectroscopy (FTIR), X-ray diffraction (XRD), field-emission scanning electron microscopy (FESEM), and dynamic light scattering (DLS) with zeta potential analysis. The optimized G/MMT/CeO₂@QC nanocarriers exhibited a uniform nanoscale size (39.3 nm) and a high negative zeta potential (-38.6 mV), indicating excellent colloidal stability. Incorporation of MMT and CeO₂ significantly enhanced drug loading and encapsulation efficiency (43.0% and 84.5%, respectively) compared to the MMT-free G/CeO₂@QC system, due to synergistic effects of layered silicate confinement, gelatin-mediated hydrogen bonding, and CeO₂-driven Lewis acid-base coordination. In vitro release studies demonstrated pronounced pH sensitivity, with sustained release at physiological pH (60% at pH 7.4 after 96 h) and accelerated release under tumor-mimicking acidic conditions (95% at pH 5.4). To further interpret the release kinetics, machine learning-assisted, shape-constrained data analysis was employed to provide time-resolved and physically consistent insights into pH-dependent release behavior. Kinetic modeling confirmed Higuchi and Korsmeyer-Peppas-controlled diffusion mechanisms. Cytocompatibility and anticancer activity were evaluated using the MTT assay on A549 lung cancer cells and L929 fibroblasts. Blank nanocarriers were non-toxic (> 95% cell viability), while drug-loaded nanocarriers achieved selective cytotoxicity (A549 viability reduced to 55% with 93% viability in L929 cells), outperforming free QC.
Overall, this tri-component hybrid system provides a multifunctional nanoscale platform with controlled drug release, high encapsulation efficiency, and tumor-selective cytotoxicity, demonstrating strong potential as a pH-responsive nanocarrier for lung cancer therapy.
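For context on the kinetic models named above: the Korsmeyer-Peppas model writes the released fraction as F(t) = k·t^n, and the Higuchi model is the square-root-of-time case F(t) = k_H·√t. Taking logarithms turns the Korsmeyer-Peppas fit into a straight-line regression. The sketch below fits synthetic points generated for illustration, not the study's release data; in practice the fit is usually restricted to the early part of the release curve.

```python
import math
from typing import Sequence, Tuple


def fit_korsmeyer_peppas(times: Sequence[float],
                         fractions: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(F) = log(k) + n*log(t); returns (k, n)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(f) for f in fractions]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    k = math.exp(my - slope * mx)
    return k, slope


# Synthetic points from F = 0.1 * sqrt(t), i.e. Higuchi-like with n = 0.5.
times = [1.0, 4.0, 9.0, 16.0]
fractions = [0.1 * math.sqrt(t) for t in times]
k, n = fit_korsmeyer_peppas(times, fractions)
print(f"k = {k:.3f}, n = {n:.3f}")
```

The fitted exponent n is what is normally used to judge whether release is diffusion-controlled, with the exact cutoffs depending on carrier geometry.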
In this research work, an energy management framework for a data centre powered by a combination of solar and grid energy is proposed. A service-level objective (SLO) is assigned to each zone of the data centre and is specified by that zone's energy requirement. Jobs with the same SLO are assigned to the same data centre zone, and each zone is fuelled by a renewable source with a chance of generating electricity equal to or greater than the zone's need. To address the intermittency of renewable energy sources and reduce SLO violations, an effective mapping strategy between renewable energy sources and data centre zones was developed using the Deep Q-Network (DQN) reinforcement learning algorithm to ensure maximum data centre uptime. A Particle Swarm Optimisation (PSO) algorithm is then applied to maximise the use of renewable power in meeting the data centre demand. This research aids in reducing the impact of data centre energy use on the national grid, thereby helping to avoid a national energy deficit. Consequently, the anticipated use of renewable energy will minimise environmental degradation while boosting the country's economy. According to our analysis, data centres are another potential buyer of energy from geo-distributed renewable energy sources in Pakistan's power market.
Halide perovskites' remarkable optoelectronic properties stem from their metal halide octahedra. This connection underscores the promise of systematically extracting the physical rules encoded in octahedral motifs to steer the discovery of optoelectronic materials. Here, we develop an octahedral motif-centric, data-driven framework that couples interpretable machine learning with high-throughput first-principles calculations to accelerate the discovery of optoelectronic semiconductors. We construct motif-based descriptors to train a gradient boosting regression tree model for thermodynamic stability evaluation, achieving a low mean absolute error of 83 meV per atom on datasets comprising ~10⁴ materials. Leveraging the model to accelerate materials discovery, we identify 19 unexplored thermodynamically stable semiconductors with favorable optoelectronic properties. Among them, Ca₂GaCoO₅ was successfully synthesized and experimentally verified to exhibit a strong visible light photoresponse. These results support the effectiveness of the machine learning framework for octahedra-containing semiconductors and suggest its potential for extension to other motif-based materials families.
Two-dimensional (2D) materials exhibit a wide range of electronic properties that make them promising candidates for next-generation nanoelectronic devices. Accurate prediction of their quantum transport behavior is therefore of both fundamental and technological importance. While the Non-Equilibrium Green's Function (NEGF) formalism coupled with Density Functional Theory (DFT) provides reliable insights, its high computational cost limits applications to large-scale or high-throughput studies. Here we present DeePTB-NEGF, a framework that combines a deep learning-based tight-binding Hamiltonian derived directly from first-principles calculations (DeePTB) with efficient quantum transport simulations implemented in the DPNEGF package. We validate the method on five prototypical 2D materials (graphene, hexagonal boron nitride (h-BN), [Formula: see text], [Formula: see text], and black phosphorus) demonstrating excellent agreement with conventional DFT-NEGF for band structures and transmission spectra. Beyond single-material benchmarks, we showcase the framework's versatility by exploring strain engineering (uniaxial strain on graphene and biaxial strain on [Formula: see text]), substitution doping in [Formula: see text], and current-voltage characteristics of a graphene field-effect transistor (FET). A scaling analysis reveals that DeePTB-NEGF can simulate systems with hundreds of atoms in minutes, achieving speed-ups of over [Formula: see text] compared to DFT-NEGF for heterostructures such as graphene/h-BN/graphene. These results establish DeePTB-NEGF as a powerful tool for autonomous, high-throughput design of quantum transport in microscopic heterostructures, enabling rapid prototyping of next-generation 2D devices.
Prior biological knowledge and phenotype information can help identify disease genes from whole genome/exome sequencing studies, but how best to incorporate external knowledge with variant data remains challenging. We developed a machine learning algorithm called RankVar to prioritize causative variants for rare diseases, based on clinical notes and genome/exome sequencing profiles. RankVar uses a random forest classifier trained on ~ 1 million variants from the 1000 Genomes Project with spiked-in pathogenic variants. For testing, we compiled sequencing data and phenotype information from several independent datasets: 260 subjects from the Children's Hospital of Philadelphia (CHOP) with positive genetic diagnosis of various Mendelian diseases, 135 subjects from Birth Defects Biorepository (BDB), as well as 356 and 97 subjects with candidate causal variants for autism spectrum disorders from the Simons Simplex Collection (SSC) and the Simons Foundation Powering Autism Research for Knowledge (SPARK), respectively. RankVar achieves a top 10 variant accuracy of 90.0%, 81.5%, 46.1%, and 76.3% for CHOP, BDB, SSC, and SPARK, respectively, with improved performance over existing approaches. Notably, RankVar successfully identified X-linked and Y-linked disease-causal variants, such as KDM6A (p.N915Kfs5*) and SRY (p.W98X), as the top candidate variants. Moreover, we evaluated RankVar for genomic reinterpretation of 130 unsolved CHOP cases with hearing loss and successfully identified 61 candidate causal variants after manual review. In summary, RankVar performed favorably relative to existing methods in our evaluation, accommodated different genetic models and X/Y chromosome variants, and may provide a useful framework for prioritizing variants in monogenic or oligogenic diseases. We anticipate that RankVar may aid in primary genetic diagnosis, genome reinterpretation of previously unsolved cases, and the discovery of novel disease genes.
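The top-10 accuracy figures above can be read as the share of cases whose known causal variant appears among the ten highest-ranked candidates. The sketch below computes that measure for any k. The function name, variant identifiers, and rankings are hypothetical and are not output of RankVar.

```python
from typing import List, Sequence


def top_k_accuracy(ranked_lists: Sequence[List[str]],
                   causal_variants: Sequence[str],
                   k: int = 10) -> float:
    """Share of cases whose causal variant is ranked within the top k."""
    hits = sum(
        1
        for ranked, causal in zip(ranked_lists, causal_variants)
        if causal in ranked[:k]
    )
    return hits / len(causal_variants)


# Three hypothetical cases, each with a ranked candidate list (best first).
ranked = [["v3", "v1", "v7"], ["v2", "v9", "v4"], ["v5", "v6", "v8"]]
causal = ["v1", "v4", "v0"]
acc = top_k_accuracy(ranked, causal, k=2)
print(round(acc, 3))  # 0.333
```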
Hospital-acquired infections (HAIs) remain a major global concern, contributing significantly to increased morbidity, mortality, and healthcare costs. Among the causative pathogens, Escherichia coli (E. coli) is one of the most frequently isolated microorganisms, particularly in urinary tract infections (UTIs), bloodstream infections, and surgical site infections. Early and accurate prediction of E. coli infection in hospitalized patients remains a significant clinical challenge, yet it has the potential to substantially improve patient outcomes. In addition, identifying patient-related risk factors can support targeted infection control strategies. This study aims to evaluate a no-code machine learning (ML) approach for early prediction of E. coli infection and to identify associated risk factors. ML techniques provide a powerful alternative by enabling the analysis of high-dimensional and heterogeneous datasets, facilitating the discovery of hidden patterns and supporting individualized risk prediction. In this study, a total of 300 clinical samples was collected as a training dataset from hospitalized patients between July 2024 and February 2025 across multiple units of Zagazig University Hospital, Sharkia, Egypt. An independent internal validation dataset of 100 samples was collected during May 2026 from the same hospital to evaluate model generalizability on completely unseen data. Bacterial isolates were identified using standard biochemical methods. Data analysis was performed using the Orange visual programming platform, implementing a modular ML pipeline that integrates data preprocessing, feature handling, model training, and performance evaluation within a no-code environment. The Naive Bayes model shows potential for predicting E. coli infection in hospitalized patients. The model is intended to predict E. coli infection at the time of specimen collection, before culture results are finalized, based on clinical data.
However, further validation in larger, multi-center prospective cohorts is needed before clinical implementation.
Sequence-to-function neural networks learn cis-regulatory sequence rules driving many types of genomic data. Interpreting these models to relate the sequence rules to underlying biological processes remains challenging, especially for complex genomic readouts such as MNase-seq, which maps nucleosome occupancy but is confounded by experimental bias. Here, we introduce pairwise influence by sequence attribution (PISA), which uses attribution to combinatorially decode which bases contributed to the readout at a specific genomic coordinate. PISA visualizes the effects of transcription factor motifs, detects undiscovered motifs with complex contribution patterns, and reveals experimental biases. By learning the bias for MNase-seq, PISA enables unprecedented nucleosome prediction models. These models allow the de novo discovery of nucleosome-positioning motifs and reveal the basis of Micro-C chromatin domain boundaries through systematic motif perturbations. Finally, these models allow the design of sequences with altered nucleosome configurations. These results show that PISA is a versatile tool that expands our ability to train and interpret sequence-to-function neural networks on genomics data and understand the underlying cis-regulatory code.
Type 1 diabetes (T1D) involves long-term health risks and challenges in individualizing therapeutic strategies. Meeting glycemic targets is a reliable indicator of effective diabetes management and positive prognosis. This study develops a clinically interpretable predictive model of 1-year glycemic control, defined as a binary outcome based on HbA1c values, using Real-World Data from 8999 T1D patients. A 1-year horizon is clinically meaningful, as annual reassessment aligns with standard care guidelines and supports timely treatment adjustments and complication screening. Deep Learning techniques are evaluated for discrimination and calibration. Various feature subsets, calibration methodologies, and sampling strategies for unbalanced outcomes are compared. The best-performing model includes 12 features, encompassing socio-demographics, clinical variables, associated complications, and pharmacological treatment. The scaling-binning calibration technique achieved the best calibration performance. The final model yielded an area under the receiver operating characteristic curve of 0.870, an F1-score of 0.789, and calibration errors between 0.014 and 0.038. Sampling techniques did not outperform unbalanced models followed by calibration. To enhance interpretability, a graphical representation quantifies the contribution of each variable to the patient's risk score. Combining strong predictive accuracy, calibration, and interpretability, the model may help clinicians make individualized decisions, intensify care for high-risk patients, and optimize healthcare resource allocation.
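To make the notion of a calibration error concrete, the following is a minimal sketch of one common metric, the expected calibration error (ECE) with equal-width bins, for a binary outcome. The abstract does not state which calibration metrics were used, so the choice of ECE, the function name `expected_calibration_error`, and the default of 10 bins are assumptions for illustration only; this is not the scaling-binning procedure itself.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Equal-width-bin ECE for binary predictions (illustrative sketch).

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : observed outcomes, each 0 or 1
    n_bins : number of equal-width probability bins (assumed default)
    """
    if len(probs) != len(labels):
        raise ValueError("probs and labels must have the same length")
    if not probs:
        raise ValueError("at least one prediction is required")

    # Assign each prediction to a bin; a probability of exactly 1.0
    # falls into the last bin.
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    # Weighted average of |mean predicted probability - observed frequency|.
    total = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for p, _ in members) / len(members)
        observed = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_prob - observed)
    return ece
```

A value near zero means that, within each bin, the predicted risk closely matches the observed event frequency.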
Pupil diameter is a non-invasive biomarker of brain state, correlating with arousal, attention, cognitive processing, and consciousness. However, existing pupillometry software often lacks scalability and robustness across diverse experimental conditions and species. We introduce Pupil-DLC, an open-source, offline, DeepLabCut-based pipeline for scalable, marker-less pupil tracking, primarily designed for mice. The model was trained on 21,750 manually annotated frames from over 140 videos of head-fixed mice spanning wakefulness and drug-induced states, including psychedelics and anesthesia; the dataset was deliberately selected to maximize pupil size variability and model generalization. Pupil-DLC implements a dual-model architecture: a General Model (GM) for high-throughput analysis and an Individual Model (IM) for session-specific optimization. Pupil-DLC captures pupil dynamics across awake, psychedelic, and anesthetized conditions with high agreement with ground truth and equal tracking fidelity during active locomotion and quiet rest. Confidence metrics aligned with human frame quality assessments, enabling principled tuning of accuracy-retention trade-offs. As a secondary demonstration, Pupil-DLC extends to unseen human videos across diverse conditions and frame rates, including daylight and smartphone recordings, without retraining. Pupil-DLC outperforms existing automated methods in accuracy and frame retention while maintaining computational efficiency comparable to real-time tools. These improvements stem from a learned keypoint-based representation robust to pupil shape variability, occlusions, reflections, and imaging artifacts. The GM/IM framework supports a tiered strategy balancing throughput and precision. Pupil-DLC provides a reproducible, adaptable platform for quantifying pupil-linked brain state dynamics across experimental paradigms and species, bridging basic mouse neuroscience and translational human applications.
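The accuracy-retention trade-off mentioned above can be illustrated with a minimal sketch. DeepLabCut reports an x, y, and likelihood value for each tracked keypoint; everything else here is an assumption for illustration: the function name `filter_and_measure`, the use of only two opposite pupil-edge keypoints, and the 0.9 threshold are hypothetical and do not describe the actual Pupil-DLC keypoint layout or defaults.

```python
import math


def filter_and_measure(frames, min_likelihood=0.9):
    """Estimate pupil diameter per frame and report frame retention.

    frames : list of ((x1, y1, p1), (x2, y2, p2)) tuples, one per video
             frame, giving two opposite pupil-edge keypoints (hypothetical
             layout) with their tracking likelihoods p1 and p2
    min_likelihood : frames whose weakest keypoint falls below this
             threshold are rejected (assumed value)

    Returns (diameters, retention): diameters holds a distance in pixels
    for retained frames and None for rejected ones; retention is the
    fraction of frames kept.
    """
    diameters = []
    kept = 0
    for (x1, y1, p1), (x2, y2, p2) in frames:
        if min(p1, p2) >= min_likelihood:
            # Diameter as the Euclidean distance between the two edges.
            diameters.append(math.hypot(x2 - x1, y2 - y1))
            kept += 1
        else:
            diameters.append(None)
    retention = kept / len(frames) if frames else 0.0
    return diameters, retention
```

Raising `min_likelihood` discards more low-confidence frames, which tends to improve accuracy at the cost of retention; lowering it does the reverse.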