Single-molecule assays such as NOMe-seq, dSMF, and Nanopore (SMAC-seq) are superior to DNase-seq and ATAC-seq because they do not destroy DNA. They therefore enable quantification of all three binding states: protein-free, transcription factor (TF)-bound, and histone-complex-bound. However, a user-friendly tool to visualize and quantify such states has been lacking. Here, we present SMTrackR, an R/Bioconductor package to visualize protein-DNA binding states on individual sequenced DNA molecules. SMTrackR queries a single-molecule footprint database that we built and host on a Galaxy server; the database comprises BigBed files generated from NOMe-seq, dSMF, and Nanopore (SMAC-seq) datasets. SMTrackR uses the UCSC REST API to query a BigBed file, plot footprint heatmaps categorized by binding state, and report the occupancy of each state. Additionally, the package generates a Gviz-enabled script to visualize these single molecules on gene tracks. SMTrackR is implemented in the statistical programming language R and is available as a Bioconductor package (https://bioconductor.org/packages/3.23/bioc/html/SMTrackR.html); the GitHub repository at https://github.com/satyanarayan-rao/SMTrackR hosts the latest updates. Installation takes less than five minutes if the dependencies are already installed. A web version is also available at https://smtrackrest.iitr.ac.in/. For users who wish to analyze unpublished data, a function is provided to use a local BigBed file. Fully automated pipelines to generate such BigBed files are available at https://github.com/satyanarayan-rao/SMF_for_SMThub and https://github.com/satyanarayan-rao/dSMF_for_SMThub.
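The query-then-quantify step can be illustrated with a minimal Python sketch. This is not SMTrackR's R API. It builds a UCSC REST API `getData/track` URL for one region of a track-hub track, then computes binding-state occupancies from per-molecule footprint lengths at a site. The hub URL and track name are hypothetical placeholders. The footprint-length cut-offs for calling TF-bound versus nucleosome-bound molecules are illustrative assumptions, not SMTrackR defaults.

```python
from collections import Counter
from urllib.parse import quote

UCSC_API = "https://api.genome.ucsc.edu/getData/track"


def build_track_query(hub_url, genome, track, chrom, start, end):
    """Build a UCSC REST API URL for one region of a track-hub track."""
    params = {
        "hubUrl": hub_url,
        "genome": genome,
        "track": track,
        "chrom": chrom,
        "start": start,
        "end": end,
    }
    # The UCSC API accepts ';'-separated parameters; values are percent-encoded.
    query = ";".join(f"{k}={quote(str(v), safe='')}" for k, v in params.items())
    return f"{UCSC_API}?{query}"


def classify_footprint(length_bp, tf_max=50, nucleosome_min=120):
    """Assign a binding state from the footprint length covering a site.

    Assumed, illustrative thresholds (bp), not values taken from SMTrackR.
    """
    if length_bp == 0:
        return "accessible"
    if length_bp <= tf_max:
        return "TF-bound"
    if length_bp >= nucleosome_min:
        return "nucleosome-bound"
    return "unclassified"


def occupancies(footprint_lengths):
    """Fraction of molecules falling into each binding state."""
    counts = Counter(classify_footprint(n) for n in footprint_lengths)
    total = sum(counts.values())
    return {state: counts[state] / total for state in counts}


# Hypothetical hub and track names, for illustration only.
print(build_track_query("https://example.org/hub.txt", "hg38", "smf_track", "chr1", 1000, 2000))
print(occupancies([0, 0, 30, 147]))
# → {'accessible': 0.5, 'TF-bound': 0.25, 'nucleosome-bound': 0.25}
```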
Conflicting phylogenetic signals are common in plant phylogenomics and often reflect evolutionary histories shaped by processes like hybridization, incomplete lineage sorting, and whole-genome duplication (WGD). We aimed to identify and assess these complex processes in the hyper-diverse family Asteraceae to offer insight into the underlying causes of phylogenetic discordance. We used new and existing Hyb-Seq and transcriptome data to explore phylogenetic discordance by testing for nuclear/plastid incongruences, WGD, and reticulation. We present a tutorial detailing the execution of complex bioinformatic analyses to increase transparency, facilitate reproducibility, and support advancements in the field of plant evolution (https://github.com/erika-r-moore/Ellestad_etal_2025_APPS_Hybridizations). We uncovered extensive discordance among nuclear gene trees and deep reticulation events, particularly among South American lineages. Signals of WGD were found across the family but were often difficult to interpret, likely due to variation in data completeness, the complexity of the events, and their ancient origins. Our study and tutorial, along with a growing body of phylogenomic research, emphasize the role of reticulation and WGD in the evolution of large, diverse clades, while also underscoring the associated challenges. We anticipate continued advancements in theoretical approaches that will further enhance empirical studies in reticulate evolution.
Protein dynamics are central to protein function, but experiments and molecular dynamics (MD) simulations remain costly, low-throughput, and difficult to compare across protocols. Scalable structure-based methods are needed to infer dynamics from static protein structures. We present a deep learning framework that predicts protein dynamics from 30-dimensional Gaussian integral (GI) descriptors of Cα backbone topology. Using 1,374 ATLAS protein chains with MD-derived root-mean-square fluctuation (RMSF), GI descriptors stratified proteins into fold-relevant clusters enriched for secondary structure, sequence homology, and ECOD families. An attention-based 1D-CNN classified flexible versus non-flexible proteins with a test AUC of 0.772 and separated slow-mode-dominated from fast-mode-dominated dynamics with an AUC of 0.91. Regression models predicted mean RMSF (Pearson r = 0.72; R² = 0.46) and, more accurately, slow-mode RMSF (Pearson r = 0.83; R² = 0.62), supporting rapid inference of flexibility and collective-motion bias. Code and data are available on GitHub at: https://github.com/fvilicich/gaussian_integral/blob/main/gaussian_integral_classification.ipynb. Supplementary data are available at Bioinformatics online.
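RMSF is the regression target above. The minimal NumPy sketch below shows how per-residue Cα RMSF is computed from an MD trajectory, assuming the frames have already been superimposed on a common reference. The `threshold` in `flexibility_labels` is a hypothetical parameter, not the cut-off used in the study to define flexible proteins.

```python
import numpy as np


def rmsf(trajectory):
    """Per-residue RMSF from a (n_frames, n_residues, 3) array of Calpha coordinates.

    Assumes all frames are already aligned to a common reference structure.
    """
    traj = np.asarray(trajectory, dtype=float)
    mean_pos = traj.mean(axis=0)                   # average position per residue
    sq_dev = ((traj - mean_pos) ** 2).sum(axis=2)  # squared displacement per frame
    return np.sqrt(sq_dev.mean(axis=0))            # root of the time-averaged value


def flexibility_labels(rmsf_per_protein, threshold):
    """Binary flexible (True) / non-flexible (False) label from mean RMSF."""
    return np.array([np.mean(r) > threshold for r in rmsf_per_protein])


# Two frames: residue 0 stays at the origin, residue 1 moves 2 Å along x.
traj = np.array([[[0, 0, 0], [0, 0, 0]],
                 [[0, 0, 0], [2, 0, 0]]])
print(rmsf(traj))  # → [0. 1.]
```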
Rare diseases (RDs) are a highly heterogeneous and underserved group of conditions. Most RDs have a strong genetic basis, but their causal pathophysiological mechanisms remain poorly understood, limiting the development of targeted therapies. We systematically characterised the cell type-specific mechanisms underlying all genetically defined RD phenotypes by integrating the Human Phenotype Ontology (HPO) with whole-body single-cell transcriptomic atlases from embryonic, foetal, and adult samples. Associations were validated against orthogonal biomedical knowledge graphs and then prioritised by strength of supporting evidence, clinical severity, and gene-therapy compatibility. We identified significant associations between 201 cell types and 9,575/11,028 (86.8%) phenotypes across 8,628 RDs, substantially expanding knowledge of phenotype-cell type links. Prioritisation by severity (e.g. lethality, motor or mental impairment) and gene-therapy compatibility (e.g. cell type specificity, postnatal treatability) identified candidate phenotypes and cell types for therapeutic targeting. We present a scalable, reproducible framework for phenome-wide, cell type-specific mechanism prediction in rare diseases, representing a major step toward systematic therapeutic development for patients across a broad spectrum of serious RDs. Interactive web portal: https://neurogenomics-ukdri.dsi.ic.ac.uk/. R packages introduced in this study: KGExplorer (https://github.com/neurogenomics/KGExplorer), HPOExplorer (https://github.com/neurogenomics/HPOExplorer), and MSTExplorer (https://github.com/neurogenomics/MSTExplorer). Manuscript analyses and reproducibility code: https://github.com/neurogenomics/rare_disease_celltyping.
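One common way to link a phenotype to a cell type is to ask whether the phenotype's genes are more specific to that cell type than random gene sets of the same size. The bootstrap-enrichment sketch below illustrates that idea. It is an assumed, simplified illustration, not the exact statistical procedure used in the study or in MSTExplorer.

```python
import numpy as np


def celltype_enrichment_p(specificity, phenotype_genes, n_boot=1000, seed=0):
    """Bootstrap p-value that a gene set has higher mean specificity than random sets.

    specificity: 1-D array with each gene's specificity for one cell type (0..1).
    phenotype_genes: indices of the genes annotated to the phenotype.
    """
    rng = np.random.default_rng(seed)
    spec = np.asarray(specificity, dtype=float)
    genes = np.asarray(phenotype_genes)
    observed = spec[genes].mean()
    n = len(genes)
    # Null distribution: mean specificity of random gene sets of the same size.
    null = np.array([
        spec[rng.choice(spec.size, size=n, replace=False)].mean()
        for _ in range(n_boot)
    ])
    # Add-one correction so the p-value is never exactly zero.
    return (np.sum(null >= observed) + 1) / (n_boot + 1)


specificity = np.linspace(0.0, 1.0, 100)
print(celltype_enrichment_p(specificity, [95, 96, 97, 98, 99]))  # small p-value
```

In a phenome-wide analysis, a test like this would be repeated for every phenotype-cell type pair, followed by multiple-testing correction.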
Accurate segmentation of multiple organs is essential for the diagnosis and treatment of head and neck cancer. However, the intricate anatomical structure and dense organ distribution in the head and neck region pose significant challenges for existing automated segmentation models, which predominantly target single organs and rely on single-modality imaging. Achieving comprehensive, one-step segmentation of organs-at-risk (OARs) therefore remains challenging. To this end, we propose a Point-cloud Matrix Fusion-based Segmentation Model (PMFM) that leverages an improved multi-modal data fusion strategy for the automated full segmentation of OARs in head and neck cancer. PMFM comprises three core modules: (1) a camera model-based 3D feature mapping and point-cloud extraction module (PEM) that enables vertical decoupling of modalities and objects; (2) a Point Cloud Matrix Module (PMM) utilizing PointNet and a virtual point cloud-based attention mechanism to facilitate horizontal association and global feature learning across modalities; and (3) a Cross Fusion Module (CFM) based on virtual point clouds to achieve deep intermodal object fusion and enhance inter-organ correlation. PMFM effectively integrates multi-modal image information, transforming it into a unified virtual point cloud matrix, and enables precise, comprehensive segmentation of OARs in head and neck cancer. Extensive validation and comparative experiments on the HaNSeg dataset demonstrate that PMFM significantly outperforms state-of-the-art methods, achieving an average Dice coefficient of 79.8% and an average Hausdorff distance of 2.47 mm. The source code for this study will be made publicly available on GitHub at https://github.com/zhouxinyu1028/PMFM.
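The two metrics reported above can be sketched in NumPy as follows. Dice is computed on binary masks. The Hausdorff distance is computed brute-force on point sets, which in practice would be boundary voxel coordinates scaled by the voxel spacing to obtain millimetres. Per-organ averaging and any percentile variant (e.g. HD95) are left out. Whether PMFM's evaluation matches these exact conventions is an assumption.

```python
import numpy as np


def dice(pred, target):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(pred, target).sum() / denom


def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two point sets (brute force)."""
    a = np.asarray(points_a, dtype=float)
    b = np.asarray(points_b, dtype=float)
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)  # pairwise distances
    # Worst-case nearest-neighbour distance, taken in both directions.
    return max(d.min(axis=1).max(), d.min(axis=0).max())


print(dice([1, 1, 0, 0], [1, 0, 0, 0]))  # 2 * 1 / (2 + 1)
print(hausdorff([[0, 0]], [[3, 4]]))     # → 5.0
```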
Weighted Quantile Sum (WQS) regression is a statistical method for quantifying the association between multiple, possibly correlated predictors and a health outcome, estimating both the joint effect of the predictors and their individual contributions to the total effect. WQS has become one of the most popular and widely used approaches for investigating complex mixtures in environmental epidemiology, yet its implementation has been largely restricted to R users. In this paper, we present wqsreg, the first Stata command for WQS regression, implemented for continuous, binary, and count outcomes. We describe the command's architecture and present an application of the command to exposome data, exploring the association between 38 exposures and a continuous outcome. The wqsreg command provides user-friendly WQS regression that integrates several flexible components of the framework, such as bootstrapping, training/validation splitting, and repeated holdout procedures. It returns regression estimates as well as graphical displays of the individual weights. It requires Stata version 11 or higher and is freely available on GitHub [ https://github.com/PonzanoMarta/wqsreg ]. Given the increasing importance of appropriately exploring complex multidimensional exposures, this contribution should further promote the use of appropriate statistical methods in epidemiological settings with multiple correlated predictors.
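The core of the WQS model can be sketched in Python. Each exposure is scored into quantiles, and the WQS index is a weighted sum of those scores with non-negative weights summing to 1. The outcome is then regressed on that index. This sketch takes the weights as given. In WQS regression they are estimated, typically across bootstrap samples of a training split, and that step is omitted here. It does not reproduce wqsreg's Stata syntax.

```python
import numpy as np


def quantile_scores(x, q=4):
    """Score each value of one exposure into quantile categories 0..q-1."""
    x = np.asarray(x, dtype=float)
    breaks = np.quantile(x, np.arange(1, q) / q)
    return np.digitize(x, breaks)


def wqs_index(exposures, weights, q=4):
    """WQS index: sum_j w_j * q_ij, with w_j >= 0 and sum_j w_j = 1."""
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0) or not np.isclose(w.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    cols = np.asarray(exposures, dtype=float).T
    scores = np.column_stack([quantile_scores(col, q) for col in cols])
    return scores @ w


def fit_wqs_effect(y, index):
    """OLS of a continuous outcome on the WQS index; returns (intercept, beta1)."""
    X = np.column_stack([np.ones(len(index)), index])
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef[0], coef[1]


exposures = np.column_stack([np.arange(1, 9), np.arange(8, 0, -1)])
index = wqs_index(exposures, [0.75, 0.25])
print(fit_wqs_effect(1.0 + 2.0 * index, index))  # recovers intercept 1, beta1 2
```

Here `beta1` plays the role of the joint mixture effect, and the weights express each exposure's contribution to it.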
Virtual coronary intervention planning (VCIP) aims to optimize the hemodynamic outcomes of percutaneous coronary intervention (PCI) in patients with coronary stenosis. However, its clinical adoption remains constrained by the computational burden of evaluating numerous combinatorial intervention strategies, which leads to time-consuming workflows and potentially suboptimal decisions in the catheterization laboratory. While conventional deep reinforcement learning (DRL) offers a path to automated VCIP, it often explores the state-action-reward space inefficiently. In this study, we propose an Informed-Exploration Reinforcement Learning (IERL) framework that concentrates the search on clinically meaningful interventions by integrating historical intervention experience with patient-specific anatomical and physiological information to guide the generation of functionally informed stent strategies. Extensive experiments on 172 vessels from 146 patients show that IERL achieves high agreement (r = 0.815) with real interventions and excellent computational efficiency, with an average run time of 2.1 seconds. By aligning exploration with both prior experience and patient context, IERL provides objective, reproducible, and near-real-time VCIP decision support, enabling timely and interpretable recommendations compatible with catheterization workflows. The code and models are available at: https://github.com/HIC-SYSU/IERL/tree/main.
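The difference between uninformed and informed exploration can be illustrated with an epsilon-greedy policy. In this variant, exploratory actions are drawn from a prior built from historical interventions instead of uniformly. This Python sketch shows only that general idea. The discrete action encoding, the Laplace-smoothed prior, and `epsilon` are hypothetical, and this is not the IERL algorithm itself.

```python
import numpy as np


def prior_from_history(past_actions, n_actions, smoothing=1.0):
    """Laplace-smoothed frequencies of actions chosen in historical cases."""
    counts = np.bincount(past_actions, minlength=n_actions).astype(float)
    return (counts + smoothing) / (counts.sum() + smoothing * n_actions)


def informed_epsilon_greedy(q_values, prior, epsilon, rng):
    """Pick an action index.

    With probability epsilon, explore by sampling from `prior` rather than
    uniformly; otherwise exploit the current Q-value estimates.
    """
    q_values = np.asarray(q_values, dtype=float)
    prior = np.asarray(prior, dtype=float)
    prior = prior / prior.sum()
    if rng.random() < epsilon:
        return int(rng.choice(len(prior), p=prior))
    return int(np.argmax(q_values))


rng = np.random.default_rng(0)
prior = prior_from_history([2, 2, 0], n_actions=3)  # hypothetical history
action = informed_epsilon_greedy([0.1, 0.9, 0.3], prior, epsilon=0.2, rng=rng)
```

Biasing exploration this way concentrates the agent's trials on strategies that clinicians have actually used, which is the intuition the abstract describes.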
The increasing volume of time series signals and the scarcity of labels make time series anomaly detection (TSAD) a natural fit for self-supervised deep learning. However, existing normality-based approaches face two key limitations: (1) relying on a single normality assumption often fails to capture normality fully, leading to biased representations; and (2) they typically presume clean training data, which is unrealistic in practice and undermines model robustness. In this article, we propose RoCA, a unified and robust anomaly detection (AD) framework that simultaneously addresses assumption incompleteness and data contamination. The key insight is that normal samples tend to satisfy multiple normality assumptions, whereas anomalous or contaminated samples tend to violate at least one. RoCA employs a composite loss function consisting of a multinormality alignment term, a dynamic anomaly-aware term, and a variance term that maintains training stability. This design enables RoCA to dynamically discover and push away latent anomalies during training to refine the decision boundary, eliminating the dependence on precisely labeled, high-purity training data. Extensive experiments on both univariate and multivariate time series datasets demonstrate that RoCA consistently outperforms state-of-the-art methods, achieving up to 7.3% improvement under real-world contamination. Our theoretical analysis further reveals the intrinsic synergy between contrastive learning (CL) and one-class classification (OC) under the RoCA framework. The source code is available at the anonymous repository https://github.com/ruiking04/RoCA.
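The key insight can be sketched by scoring each sample under two normality assumptions and flagging samples that violate either one. The first assumption is agreement between two augmented views, as in CL. The second is closeness to a hypersphere center, as in OC. This NumPy sketch is an assumed simplification with hypothetical names, a fixed contamination `ratio`, and a hinge-style push term. It is not RoCA's actual loss.

```python
import numpy as np


def normality_violations(z1, z2, center):
    """Per-sample violation scores for two normality assumptions."""
    z1n = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2n = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    misalignment = 1.0 - np.sum(z1n * z2n, axis=1)            # CL: views should agree
    distance = np.sum(((z1 + z2) / 2 - center) ** 2, axis=1)  # OC: stay near center
    return misalignment, distance


def _rank01(scores):
    """Rank-normalize scores to [0, 1] (0 = most normal)."""
    ranks = np.argsort(np.argsort(scores, kind="stable"), kind="stable")
    return ranks / (len(scores) - 1)


def flag_latent_anomalies(misalignment, distance, ratio):
    """Flag the samples whose worst rank across the two assumptions is highest."""
    worst = np.maximum(_rank01(misalignment), _rank01(distance))
    k = int(np.ceil(ratio * len(worst)))
    flagged = np.zeros(len(worst), dtype=bool)
    flagged[np.argsort(-worst, kind="stable")[:k]] = True
    return flagged


def composite_loss(misalignment, distance, flagged, z, margin=1.0):
    """Pull unflagged samples in, push flagged ones out, keep feature variance up."""
    normal = ~flagged
    pull = np.mean(misalignment[normal] + distance[normal])
    push = np.mean(np.maximum(0.0, margin - distance[flagged])) if flagged.any() else 0.0
    variance = np.mean(np.maximum(0.0, 1.0 - z.std(axis=0)))  # discourage collapse
    return pull + push + variance


# Four normal samples near the center with agreeing views, plus one outlier.
z1 = np.array([[1.0, 0.1], [1.0, -0.1], [0.9, 0.0], [1.1, 0.0], [-5.0, 5.0]])
z2 = np.array([[1.0, 0.1], [1.0, -0.1], [0.9, 0.0], [1.1, 0.0], [5.0, 5.0]])
mis, dist = normality_violations(z1, z2, center=np.array([1.0, 0.0]))
flagged = flag_latent_anomalies(mis, dist, ratio=0.2)
print(flagged, composite_loss(mis, dist, flagged, z1))
```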
Detecting small objects in aerial images is highly challenging because of their nonuniform distribution and the severe scale variations caused by changing view angles. Because autonomous aerial vehicles have limited computational power, balancing detection accuracy and efficiency remains a challenging problem. Existing methods, e.g., feature pyramid network (FPN)-based algorithms, concentrate on fusing deep low-resolution features with shallow high-resolution features and primarily rely on simple stacking and channel fusion. However, features of small objects are easily corrupted by unpredictable background noise, which raises the computational cost of feature fusion. This work tackles the issue by designing a novel one-step generative small object detection (SOD) framework. It leverages the self-consistency property of a consistency model, which enables the proposed model to convert random Gaussian noise into a single-scale output and thereby enables "one-step" inference. We formulate SOD as a noise-to-box procedure. We then apply a consistency model to initialize the diffusion process with Gaussian-noised bounding boxes derived from their corresponding ground-truth (GT) annotations. Next, we introduce a denoising sampling strategy that classifies and locates small objects by iteratively refining their Gaussian distributions. Finally, we comprehensively evaluate the proposed framework on several SOD benchmarks for autonomous aerial vehicles, including DOTA, VisDrone, and AAVDT. Experimental results show that it outperforms the state-of-the-art method (DiffusionDet) by up to 5.1% in terms of $AP_{S}$ (average precision on small objects) on DOTA. Code is available at https://github.com/BrainPotter/CEOSOD.
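The noise-to-box formulation and one-step inference can be sketched with the standard consistency-model parameterization f(x, σ) = c_skip(σ)·x + c_out(σ)·F(x, σ). The coefficients are chosen so that f(x, σ_min) = x, which enforces the boundary condition. In this NumPy sketch, the box scaling to [-2, 2], σ_min = 0.002, σ_data = 0.5, and σ_max = 80 are assumptions borrowed from common diffusion and consistency-model settings, not values confirmed for CEOSOD. The `network` argument stands in for the learned detector head.

```python
import numpy as np


def noisy_boxes(gt_boxes, sigma, rng, scale=2.0):
    """Corrupt normalized GT boxes (cx, cy, w, h in [0, 1]) with Gaussian noise.

    Boxes are first mapped to [-scale, scale] (assumed scaling).
    """
    x0 = (np.asarray(gt_boxes, dtype=float) * 2.0 - 1.0) * scale
    return x0 + sigma * rng.standard_normal(x0.shape)


def consistency_coeffs(sigma, sigma_min=0.002, sigma_data=0.5):
    """Skip/output coefficients; c_skip = 1 and c_out = 0 at sigma_min."""
    c_skip = sigma_data**2 / ((sigma - sigma_min) ** 2 + sigma_data**2)
    c_out = sigma_data * (sigma - sigma_min) / np.sqrt(sigma_data**2 + sigma**2)
    return c_skip, c_out


def consistency_fn(x_sigma, sigma, network):
    """f(x, sigma) = c_skip * x + c_out * F(x, sigma)."""
    c_skip, c_out = consistency_coeffs(sigma)
    return c_skip * x_sigma + c_out * network(x_sigma, sigma)


def one_step_sample(n_boxes, network, rng, sigma_max=80.0, scale=2.0):
    """One-step inference: pure noise -> boxes with a single network call."""
    x = sigma_max * rng.standard_normal((n_boxes, 4))
    boxes = consistency_fn(x, sigma_max, network)
    return np.clip((boxes / scale + 1.0) / 2.0, 0.0, 1.0)  # back to [0, 1]


dummy_net = lambda x, s: np.zeros_like(x)  # placeholder for a trained model
print(one_step_sample(3, dummy_net, np.random.default_rng(0)).shape)  # → (3, 4)
```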
Spatial sequencing technologies enable the study of molecular organization in tissues at single-cell resolution. Revealing such spatial patterns relies on accurate cell segmentation. In complex tissues with dense cell packing, segmentation based solely on nuclear staining is insufficient for accurate cell boundary detection, because accurate segmentation requires delineating cell morphology, which is driven by molecular activities such as cytoskeletal dynamics, cell-cell adhesion, and intercellular signaling. Integrating molecular information, including gene or protein expression, therefore has the potential to improve segmentation but remains computationally challenging. To address this, we developed SegJointGene, a deep learning framework that jointly performs cell segmentation and spatial gene prioritization by integrating nuclei-based images with spatial gene or protein expression data. SegJointGene combines an information-entropy-guided convolutional neural network with a computational information discarding score to identify genes that are important for cell-type-specific segmentation. The model iteratively refines gene prioritization and cell boundaries, producing convergent segmentation results along with prioritized spatial genes or proteins across cell types. We applied and benchmarked SegJointGene on both simulated and real spatial datasets, including spatial transcriptomics data from the mouse hippocampus and distinct regions of the whole mouse brain, as well as spatial proteomics data from human tonsil. Across datasets, SegJointGene outperformed existing methods by 5-20% in accurately assigning molecular signals to cell boundaries. Robustness analyses further demonstrated stable performance across varying gene numbers and imaging resolutions. In addition, the genes prioritized by SegJointGene were enriched for structural, developmental, and synaptic signaling pathways, supporting their relevance to spatial tissue organization. The source code and data are available at https://github.com/daifengwanglab/segjointgene. Supplementary figures, notes and data descriptions are available in Supplementalmaterials.pdf.
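As a rough illustration of how information entropy can rank genes by cell-type specificity, the sketch below computes the Shannon entropy of each gene's mean expression across cell types. Low entropy means expression concentrated in few cell types. This is a hypothetical reading of "information-entropy-guided" prioritization, not SegJointGene's network or its information discarding score. Rows with zero total expression are not handled.

```python
import numpy as np


def celltype_entropy(expr_by_celltype):
    """Shannon entropy (bits) of each gene's expression across cell types.

    expr_by_celltype: (n_genes, n_celltypes) non-negative mean expression;
    every gene must have a positive total.
    """
    expr = np.asarray(expr_by_celltype, dtype=float)
    p = expr / expr.sum(axis=1, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, -p * np.log2(p), 0.0)  # 0 * log 0 treated as 0
    return terms.sum(axis=1)


def prioritize_genes(expr_by_celltype, gene_names, top_k):
    """Return the top_k most cell-type-specific genes (lowest entropy first)."""
    h = celltype_entropy(expr_by_celltype)
    order = np.argsort(h, kind="stable")
    return [gene_names[i] for i in order[:top_k]]


expr = [[5, 0, 0, 0],   # one cell type only -> 0 bits
        [1, 1, 1, 1],   # uniform over 4     -> 2 bits
        [2, 2, 0, 0]]   # two cell types     -> 1 bit
print(prioritize_genes(expr, ["A", "B", "C"], top_k=2))  # → ['A', 'C']
```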
Major advances in Plasmodium sequencing approaches, bioinformatic pipelines, and data analysis tools have provided valuable insights into malaria epidemiology from parasite genomic data. However, translating genetic data into actionable information for decision-makers remains a challenge. Significant barriers limit the integration of these advances into a functional data analysis ecosystem that produces standardized, interpretable results for use by national malaria control programs. The Plasmodium Genomic Epidemiology network convened 18 subject matter experts from 15 institutions at the Reproducibility, Accessibility, Documentation, and Interoperability Standards Hackathon in 2023 to identify available analysis tools, evaluate software standards, improve documentation, and outline workflows. Eight use cases for genomic data were identified, and a subset was developed into analysis workflows comprising a series of connected functionalities. Software tools were then mapped against functionalities to outline a modular approach to data analysis for these use cases. In addition to outlining workflows, a set of objective criteria was developed for evaluating software standards. A total of 40 Plasmodium genomic analysis tools were identified, 22 of which were prioritized for software standards evaluation. Tutorials, in the form of reproducible code applied to shared datasets, were also developed for 10 tools. These resources are available on PGEforge (mrc-ide.github.io/PGEforge), a new community resource that serves as a central, open repository for current and future resources for malaria genomic data analysis.
Background/Objectives: In gas chromatography-mass spectrometry (GC-MS) library-based compound identification, spectrum preprocessing and its tuning parameters critically influence identification performance. These parameters are conventionally optimized using grid search, which requires a predefined parameter space, becomes computationally inefficient as dimensionality increases, and often fails to identify optimal values because of discretization. Differential evolution (DE), a population-based metaheuristic optimization algorithm, provides a flexible alternative through efficient global exploration of the parameter space. This study compared the performance of DE and grid search for optimizing compound identification. Methods: Compounds were identified by cosine similarity matching against the NIST GC-MS library. DE was used to maximize either cross-validated accuracy or mean reciprocal rank (MRR). Results were compared with those from a grid search over five equally spaced parameter values. Identification performance was evaluated using accuracy, MRR, and the area under the receiver operating characteristic curve (AUC). Results: When all four parameters were optimized simultaneously, DE achieved slightly higher cross-validated accuracy and MRR than grid search, although the absolute differences were modest. More pronounced differences were observed in specific unidimensional tuning scenarios, particularly for the intensity weight factor. Simultaneous multidimensional parameter optimization yielded better performance than isolated parameter tuning. Conclusions: Grid search may be computationally advantageous when the parameter space is known and limited, whereas DE provides a more flexible approach for unknown or high-dimensional search spaces. Overall, DE achieved identification performance comparable to that of grid search, with modest improvements in some optimization settings. A Julia-based command-line tool, MSTune, was developed for spectrum preprocessing parameter optimization and is publicly available on GitHub.
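MSTune itself is written in Julia; to keep all examples in one language, this sketch uses Python. It shows two pieces. The first is a weighted cosine similarity in which the intensity weight factor and an m/z weight factor are tunable exponents, a common form of GC-MS spectrum weighting assumed here rather than taken from MSTune. The second is a minimal DE/rand/1/bin loop of the kind that could tune such parameters. The objective in the usage line is a toy function, not cross-validated identification accuracy.

```python
import numpy as np


def weighted_cosine(mz, query, reference, intensity_weight=0.5, mz_weight=1.0):
    """Cosine similarity of spectra weighted as intensity**a * (m/z)**b."""
    mz = np.asarray(mz, dtype=float)
    wq = np.asarray(query, dtype=float) ** intensity_weight * mz**mz_weight
    wr = np.asarray(reference, dtype=float) ** intensity_weight * mz**mz_weight
    denom = np.linalg.norm(wq) * np.linalg.norm(wr)
    return float(wq @ wr / denom) if denom > 0 else 0.0


def differential_evolution(f, bounds, pop_size=20, F=0.8, CR=0.9, generations=200, seed=0):
    """Minimize f over box bounds with the classic DE/rand/1/bin scheme."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(bounds)
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fit = np.array([f(p) for p in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct members other than i.
            others = [j for j in range(pop_size) if j != i]
            a, b, c = pop[rng.choice(others, 3, replace=False)]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # Binomial crossover; force at least one gene from the mutant.
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            # Greedy selection.
            ft = f(trial)
            if ft <= fit[i]:
                pop[i], fit[i] = trial, ft
    best = int(np.argmin(fit))
    return pop[best], fit[best]


# Toy objective standing in for (negative) identification accuracy.
best, value = differential_evolution(lambda p: (p[0] - 0.6) ** 2 + (p[1] - 3.0) ** 2,
                                     bounds=[(0.0, 1.0), (0.0, 5.0)])
```

Unlike a grid with five values per parameter, DE can settle anywhere inside the bounds, which is the discretization advantage described above.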
Population graph-based Graph Neural Networks (GNNs) have demonstrated superior performance in brain disease diagnosis by modeling inter-subject relationships. However, most existing approaches rely on a transductive setting that achieves high performance on known subjects but suffers significant performance degradation on unseen subjects. While a few inductive population graph models have been proposed, they struggle with single-subject inference, either because they rely on test batches for graph construction or because they generalize poorly to individual unseen nodes. In this paper, we propose a fully inductive inference protocol in population graphs designed for single-subject diagnosis. Our approach constructs a population graph exclusively from training nodes and, during inference, dynamically connects a single unseen test subject to the training graph based on imaging and phenotypic similarities. We conducted extensive experiments on multiple neuroimaging datasets (ABIDE I, ABIDE II, and ADHD-200) to evaluate the proposed pipeline. The results demonstrate that our method outperforms both state-of-the-art transductive models and existing inductive baselines under a fully inductive evaluation protocol. Furthermore, our analysis reveals that, within our experimental settings, single-subject inference tends to maximize diagnostic performance by reducing potential interference between test subjects. Importantly, our approach obviates the prohibitive retraining bottleneck typically required by transductive models, thereby providing an operational advantage for deployment and facilitating efficient, real-time single-subject inference workflows. The source code is available at https://github.com/98jaemin/single_subject_popgnn.
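The inference-time step of attaching one unseen subject to a fixed training graph can be sketched as a top-k neighbour search. The edge weight here multiplies the cosine similarity of imaging features by the fraction of matching phenotypic attributes (e.g. sex, acquisition site), a common population-graph scheme. The exact weighting, the attributes, and `k` are assumptions, not the paper's settings. The training graph and model weights stay untouched, so no retraining is needed.

```python
import numpy as np


def connect_test_subject(train_feats, train_pheno, test_feat, test_pheno, k=5):
    """Indices and edge weights of the k training nodes linked to one unseen subject.

    Edge weight = cosine similarity of imaging features
                  x fraction of matching phenotypic attributes (hypothetical scheme).
    """
    X = np.asarray(train_feats, dtype=float)
    x = np.asarray(test_feat, dtype=float)
    cos = X @ x / (np.linalg.norm(X, axis=1) * np.linalg.norm(x))
    pheno_match = (np.asarray(train_pheno) == np.asarray(test_pheno)).mean(axis=1)
    w = cos * pheno_match
    nbrs = np.argsort(-w, kind="stable")[:k]
    return nbrs, w[nbrs]


train_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
train_pheno = [["M", "siteA"], ["F", "siteB"], ["M", "siteA"]]
nbrs, weights = connect_test_subject(train_feats, train_pheno, [0.0, 1.0], ["F", "siteB"], k=1)
print(nbrs, weights)  # → [1] [1.]
```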
Out-of-distribution (OOD) generalization has emerged as a critical challenge in graph learning, as real-world graph data often exhibit diverse and shifting environments that traditional models fail to generalize across. A promising solution to address this issue is graph invariant learning (GIL), which aims to learn invariant representations by disentangling label-correlated invariant subgraphs from environment-specific subgraphs. However, existing GIL methods face two major challenges: (1) the difficulty of capturing and modeling diverse environments in graph data, and (2) the semantic cliff, where invariant subgraphs from different classes are difficult to distinguish, leading to poor class separability and increased misclassifications. To tackle these challenges, we propose a novel method termed Multi-Prototype Hyperspherical Invariant Learning (MPHIL), which introduces two key innovations: (1) hyperspherical invariant representation extraction, enabling robust and highly discriminative hyperspherical invariant feature extraction, and (2) multi-prototype hyperspherical classification, which employs class prototypes as intermediate variables to eliminate the need for explicit environment modeling in GIL and mitigate the semantic cliff issue. Derived from the theoretical framework of GIL, we introduce two novel objective functions: the invariant prototype matching loss to ensure samples are matched to the correct class prototypes, and the prototype separation loss to increase the distinction between prototypes of different classes in the hyperspherical space. Extensive experiments on 13 OOD generalization benchmark datasets demonstrate that MPHIL achieves state-of-the-art performance, significantly outperforming existing methods across graph data from various domains and with different distribution shifts. The source code of MPHIL is available at https://github.com/se7esx/MPHIL.
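The two objectives named above can be illustrated on the unit hypersphere. A matching loss scores each class by its best-matching prototype and applies cross-entropy. A separation loss penalizes positive cosine similarity between prototypes of different classes. This NumPy sketch is one plausible formulation, with an assumed temperature `tau` and an equal number of prototypes per class. It is not MPHIL's exact loss definitions.

```python
import numpy as np


def _normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def prototype_matching_loss(z, labels, prototypes, tau=0.1):
    """Cross-entropy over classes, scoring each class by its closest prototype.

    z: (n, d) embeddings; labels: (n,) ints; prototypes: (C, K, d).
    """
    z, p = _normalize(np.asarray(z, float)), _normalize(np.asarray(prototypes, float))
    sims = np.einsum("nd,ckd->nck", z, p)       # cosine similarity to every prototype
    logits = sims.max(axis=2) / tau             # best-matching prototype per class
    logits -= logits.max(axis=1, keepdims=True) # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def prototype_separation_loss(prototypes):
    """Mean positive cosine similarity between prototypes of different classes."""
    p = _normalize(np.asarray(prototypes, float))
    pair_means = [np.maximum(p[a] @ p[b].T, 0.0).mean()
                  for a in range(len(p)) for b in range(a + 1, len(p))]
    return float(np.mean(pair_means))


protos = np.array([[[1.0, 0.0]], [[0.0, 1.0]]])  # 2 classes, 1 prototype each
z = np.array([[1.0, 0.0], [0.0, 1.0]])
print(prototype_matching_loss(z, np.array([0, 1]), protos), prototype_separation_loss(protos))
```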
Reconstructing gene regulatory networks (GRNs) with directionality and regulatory types is an important challenge in computational biology. Existing methods often struggle to capture complex topological structures in highly skewed GRNs because of the imbalance between local and global information and the collapse of representation dimensionality. To address these challenges, we propose BMGRN, a unified framework that reconstructs directed GRNs with regulation types by integrating bidirectional state space modeling with dual contrastive representation learning. Drawing inspiration from sequence modeling, BMGRN employs an enhanced bidirectional Mamba2 architecture to efficiently capture long-range dependencies and asymmetric regulatory interactions between genes. This design enables global information propagation while maintaining directional specificity. Furthermore, a dual contrastive learning mechanism is introduced to alleviate oversmoothing and dimensional collapse, enforcing representation uniformity and discriminability in low-connectivity scenarios. By coupling these representations with a KAN-based convolutional predictor, BMGRN adaptively learns nonlinear dependencies and regulatory modes, thereby improving its modeling capacity for GRN inference. Experiments on multiple benchmark data sets show that BMGRN attains superior performance, demonstrating its potential for large-scale GRN inference. The code is available at https://github.com/KanZh/BMGRN.
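The bidirectional state space idea can be sketched with a plain linear, time-invariant SSM. The sequence is scanned forward with one set of parameters and backward with another, and the outputs are summed. Separate parameters per direction let the two passes model asymmetric dependencies. This is a simplification: Mamba2 uses selective, input-dependent parameters. How BMGRN orders genes into a sequence is not described above, so the input here is an arbitrary sequence.

```python
import numpy as np


def ssm_scan(x, A, B, C):
    """Linear recurrence h_t = A h_{t-1} + B x_t, output y_t = C h_t.

    x: (T, d_in); A: (d_state, d_state); B: (d_state, d_in); C: (d_out, d_state).
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A @ h + B @ x_t
        ys.append(C @ h)
    return np.array(ys)


def bidirectional_scan(x, fwd, bwd):
    """Sum a forward scan and a backward scan (run on the reversed sequence)."""
    y_f = ssm_scan(x, *fwd)
    y_b = ssm_scan(x[::-1], *bwd)[::-1]  # re-reverse so outputs align in time
    return y_f + y_b


params = (np.array([[0.5]]), np.array([[1.0]]), np.array([[1.0]]))
x = np.array([[1.0], [0.0], [0.0]])
print(bidirectional_scan(x, params, params).ravel())  # forward [1, .5, .25] + backward [1, 0, 0]
```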
A critical part of omics analysis is the transition from early data exploration to final interpretation, which often involves different analytical platforms and a proliferation of figures, tables, and files. To minimize the errors and delays that can occur during this process, we developed an R package called "Hotgenes" that provides a wide range of flexible utilities in a single modular Shiny application. With Hotgenes, differential expression results generated from bulk omics platforms can be imported and shared among collaborators with minimal coding. Furthermore, advanced users can customize the modular Hotgenes user interface to fit the needs of their evolving pipelines. Hotgenes is implemented in R and is freely available at https://github.com/pfizer-opensource/Open-Hotgenes. A permanent archived version is available at https://doi.org/10.5281/zenodo.20129460. Supplementary data are available at Bioinformatics online.
Complex tubular structures are essential in medical imaging and computer-assisted diagnosis, where their integrity enhances anatomical visualization and lesion detection. However, existing segmentation algorithms struggle with structural discontinuities, particularly in severe clinical cases such as coronary artery stenosis and vessel occlusion, which lead to undesired discontinuities and compromise downstream diagnostic accuracy. It is therefore imperative to reconnect discontinuous structures to ensure their completeness. In this study, we explore point cloud-based tubular structure completion for the first time and establish a Point Cloud-based Coronary Artery Completion (PC-CAC) dataset derived from real clinical data. This dataset provides a novel benchmark for tubular structure completion. Additionally, we propose TSRNet, a Tubular Structure Reconnection Network that integrates a detail-preserving feature extractor, a multiple dense refinement strategy, and a global-to-local loss function to ensure accurate reconnection while maintaining structural integrity. Comprehensive experiments on our PC-CAC and two additional public datasets (PC-ImageCAS and PC-PTR) demonstrate that our method consistently outperforms state-of-the-art approaches across multiple evaluation metrics, setting a new benchmark for point cloud-based tubular structure reconstruction. Our benchmark is available at https://github.com/YaoleiQi/PCCAC.
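Point cloud completion methods are commonly evaluated with the Chamfer distance, which averages nearest-neighbour distances from each cloud to the other. This NumPy sketch uses the squared-distance variant summed over both directions. Several conventions exist, and whether TSRNet's "multiple evaluation metrics" use this exact one is an assumption. The brute-force pairwise matrix is fine for small clouds. A k-d tree would be preferable for large ones.

```python
import numpy as np


def chamfer_distance(p, q):
    """Symmetric Chamfer distance between point clouds p (n, 3) and q (m, 3).

    Mean squared nearest-neighbour distance, summed over both directions.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    d2 = ((p[:, None, :] - q[None, :, :]) ** 2).sum(axis=2)  # (n, m) squared distances
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


complete = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
broken = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])  # middle point missing
print(chamfer_distance(broken, complete))  # gap is penalized in the complete->broken direction
```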
Hybrid capture sequencing (Hyb-Seq) is a widely used approach in phylogenomics, providing efficient access to targeted genomic regions. However, deriving high-quality phylogenetic trees from raw sequencing reads requires extensive bioinformatics processing, which increases complexity, the risk of errors, and the difficulty of file management, especially for users unfamiliar with bioinformatics workflows. We developed HybSuite, a streamlined Bash-based bioinformatics pipeline built upon mainstream tools such as HybPiper 2 and designed to simplify Hyb-Seq phylogenomic analysis from raw reads to species trees. Compared with existing tools (e.g., HybPiper 2, CAPTUS), it offers a modular yet integrated workflow covering all key steps, from downloading data from the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), adapter removal, data assembly, and paralog handling to species tree inference and extensive in-depth analysis. We validated HybSuite by reconstructing a robust phylogeny for the family Elaeagnaceae using the Angiosperms353 probe set and a dataset of 100 single-copy nuclear loci from Arabidopsis. HybSuite provides a flexible and user-friendly pipeline for Hyb-Seq phylogenomic analyses, and its high accuracy and efficiency were demonstrated through benchmarking with two empirical datasets. HybSuite is freely available at https://github.com/Yuxuanliu-HZAU/HybSuite. The pipeline is compatible with both Linux and macOS.
Despite the increased availability of electronic health records, open-source standardized data collection that captures high-resolution data during extracorporeal life support (ECLS) is lacking. This project aimed to assess whether the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) can store interoperable data generated in the context of ECLS sufficiently, and to develop a custom data model expansion in case the OMOP CDM proved insufficient. The OMOP CDM was analyzed qualitatively by expert consensus for its capability to capture ECLS-related data and for the presence of suitable ECLS-related concepts. The number of database entries necessary to store information about primary ECLS components was compared between the OMOP CDM and the custom data model expansion. Analysis of the data elements required to capture ECLS data within the OMOP CDM revealed a paucity of suitable concepts within the OHDSI Standardized Vocabularies, limiting the capture of ECLS circuit-derived data. Custom ECLS-specific database tables and novel concepts were therefore introduced as part of a custom expansion, the ECLS Common Data Model (ECLS CDM). The number of database entries necessary to store ECLS use cases was reduced by up to 90%. The ECLS CDM was released as an open-source project on GitHub and placed in the public domain. With the first iteration of the ECLS CDM, we introduce a data model that improves interoperability for data describing ECLS and raises data quality, enabling multi-center research and quality initiatives.
Real-time Detection Transformer (RT-DETR) exhibits notable advantages in real-time object detection, as evidenced by its enhanced speed and accuracy. However, the performance of RT-DETR is still constrained by (a) the limited channel-wise correlation of the attention mechanism, which fails to capture cross-channel feature interactions, (b) information loss caused by dropout, which may affect small and overlapping targets, and (c) the insufficient robustness and limited diversity of small-scale training sets. To solve these problems, this study proposes a real-time Detection Transformer with Encover and Soft-Dropout (ES-DETR). ES-DETR consists of three parts: (a) a spatial-attention-oriented module called Encover, which replaces the traditional flattened attention and learns global spatial features by capturing cross-view knowledge from images; (b) soft dropout (SD), which replaces traditional dropout by suppressing a predetermined number of features according to a Gaussian distribution, performing stable feature-wise dropout to increase the robustness of the detector; and (c) Grid Noise Augmentation (GNA), which divides the image into grid-like patterns and piles multiple Gaussian masks on the image to mitigate real-world disturbances. A series of experiments conducted on several datasets shows that ES-DETR achieves significant improvements and excels in most object detection tasks. Source code and pretrained models are available at https://github.com/he13689/ES-DETR.
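One possible reading of Grid Noise Augmentation ("divides the image into grid-like patterns and piles multiple Gaussian masks on the image") is sketched below. A Gaussian blob is placed at a jittered centre inside each grid cell, the blobs are summed and capped at 1, and the image is attenuated where the mask is high. The grid size, blob width (`sigma_frac`), attenuation `strength`, and the multiplicative blend are all assumptions, not ES-DETR's implementation.

```python
import numpy as np


def grid_noise_augment(image, grid=4, sigma_frac=0.15, strength=0.5, rng=None):
    """Attenuate an image with Gaussian masks centred in a grid of cells.

    image: (H, W) or (H, W, C) float array with values in [0, 1].
    """
    rng = np.random.default_rng() if rng is None else rng
    img = np.asarray(image, dtype=float)
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cell_h, cell_w = h / grid, w / grid
    sigma = sigma_frac * min(cell_h, cell_w)
    mask = np.zeros((h, w))
    for i in range(grid):
        for j in range(grid):
            # Jitter each blob centre inside its grid cell.
            cy = (i + rng.uniform(0.25, 0.75)) * cell_h
            cx = (j + rng.uniform(0.25, 0.75)) * cell_w
            mask += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * sigma**2))
    mask = np.clip(mask, 0.0, 1.0)  # overlapping (piled) blobs are capped at 1
    if img.ndim == 3:
        mask = mask[..., None]      # broadcast over colour channels
    return img * (1.0 - strength * mask)


augmented = grid_noise_augment(np.ones((32, 32, 3)), rng=np.random.default_rng(0))
print(augmented.shape, augmented.min(), augmented.max())
```

Because each cell receives its own blob, the perturbation is spread across the whole image rather than concentrated in one region, which matches the stated goal of simulating scattered real-world disturbances.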