Early and accurate detection of brain tumors is clinically valuable for improving prognosis and guiding treatment. Existing deep-learning methods for magnetic resonance imaging (MRI) brain tumor detection face three difficulties: weak texture at lesion boundaries impairs localization; heterogeneous lesion scales degrade multi-scale detection; and non-maximum suppression (NMS) post-processing limits end-to-end inference. We propose a topology-decoupled end-to-end detection framework based on boundary-preserving feature flow and inter-channel correlation (ICC) distillation. A high-capacity teacher combines a multi-gradient-flow backbone with a gather-and-distribute global fusion mechanism, capturing both pathological boundary textures and anatomical context; a lightweight student is then derived by removing the global-fusion neck while retaining the isomorphic backbone. After comparing five feature distillation methods, we adopt ICC distillation, which aligns Gram matrices of intermediate features and mitigates the background-dominated bias common in medical imaging. Across three random seeds, the ICC-distilled student reaches mAP@0.5 = [Formula: see text], surpassing the plain student ([Formula: see text]) and matching or exceeding the teacher ([Formula: see text]). On the BraTS small-lesion stratum it attains 98.7% lesion recall with a false-positive-per-image rate of 0.014. The student achieves this at low cost (6.09M parameters, 11.7 GFLOPs, 168 FPS), suiting resource-constrained clinical deployment.
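The Gram-matrix alignment behind ICC distillation can be illustrated with a minimal sketch. It assumes the student and teacher feature maps already share the same channel count and that each channel has been flattened to a vector; the function names `channel_gram` and `icc_loss` are hypothetical and this is not the authors' implementation.

```python
from typing import List

Matrix = List[List[float]]  # C rows, each one flattened H*W channel


def channel_gram(features: Matrix) -> Matrix:
    """Return the C x C matrix of cosine similarities between channels."""
    normed = []
    for channel in features:
        norm = sum(v * v for v in channel) ** 0.5
        # An all-zero channel is left as zeros to avoid division by zero.
        normed.append([v / norm if norm > 0 else 0.0 for v in channel])
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in normed] for ci in normed]


def icc_loss(student: Matrix, teacher: Matrix) -> float:
    """Mean squared difference between student and teacher channel Gram matrices."""
    gs, gt = channel_gram(student), channel_gram(teacher)
    c = len(gs)
    return sum((gs[i][j] - gt[i][j]) ** 2 for i in range(c) for j in range(c)) / (c * c)
```

Because each channel is normalized before the Gram matrix is formed, the loss in this sketch depends on how channels co-vary rather than on activation magnitude, which is one plausible reading of why such a loss is less sensitive to large background regions.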
Early and reliable crack localization in jet turbine blades is important for structural health monitoring in aerospace systems. This study presents an uncertainty-aware Deep Kernel Learning (DKL) framework that integrates a deep residual feature extractor with a Sparse Variational Gaussian Process (SVGP) model for crack location regression from modal frequency data. The model is trained end-to-end, enabling the learned latent representation to be optimized jointly with the Gaussian Process objective while providing predictive uncertainty estimates. A finite element simulation dataset consisting of 901 samples was generated by parametrically varying crack positions along the turbine blade and extracting modal frequency features. To ensure a comprehensive evaluation, robustness analyses were conducted under both Gaussian noise (σ = 0.01, 0.05, 0.10) and synthetic dataset shifts (± 5% and ± 10%). In addition, a more realistic evaluation was performed using non-uniform stochastic perturbations that simulate feature-wise variability, sensor drift, and environmental noise. The proposed DKL model was compared with classical machine learning methods (Random Forest, SVR, XGBoost) as well as modern uncertainty-aware deep learning approaches, including MC Dropout and Deep Ensemble. Results show that all models achieve high predictive accuracy under clean conditions due to the smooth and deterministic nature of the simulation dataset. Under stochastic perturbations, Deep Ensemble achieves the lowest prediction error, while the proposed DKL model maintains competitive accuracy and provides more conservative and stable uncertainty estimates. These findings indicate that the proposed DKL framework offers a balanced trade-off between predictive performance and uncertainty awareness. However, further validation on real-world sensor data is required to assess its applicability in practical structural health monitoring scenarios.
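The two robustness perturbations named above can be sketched as follows. The abstract does not say whether the noise is additive or how the dataset shift is applied, so this sketch assumes additive zero-mean Gaussian noise on standardized features and a uniform multiplicative shift; both function names are hypothetical.

```python
import random
from typing import List


def gaussian_noise(features: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Add independent zero-mean Gaussian noise with standard deviation sigma."""
    return [f + rng.gauss(0.0, sigma) for f in features]


def dataset_shift(features: List[float], fraction: float) -> List[float]:
    """Scale every feature by (1 + fraction), e.g. fraction=0.05 for a +5% shift."""
    return [f * (1.0 + fraction) for f in features]


if __name__ == "__main__":
    rng = random.Random(0)
    clean = [0.2, -1.1, 0.7]  # made-up standardized modal-frequency features
    for sigma in (0.01, 0.05, 0.10):
        print(sigma, gaussian_noise(clean, sigma, rng))
    for fraction in (-0.10, -0.05, 0.05, 0.10):
        print(fraction, dataset_shift(clean, fraction))
```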
Phylogenetic reconstruction is a multi-step process that typically involves sequence retrieval, alignment, trimming, and tree inference, often requiring the integration of multiple independent tools. This fragmented workflow increases technical complexity and limits reproducibility, particularly in large-scale analyses. Here, we present phyloPipeR, an R package that provides an integrated and automated framework for end-to-end phylogenetic analysis and tree comparison within a unified environment. phyloPipeR enables complete workflows from ortholog retrieval to tree inference and quantitative comparison, while also supporting modular execution of individual steps. The package implements multiple phylogenetic inference methods and supports both concatenation and coalescent strategies for multi-gene analyses. By integrating tree reconstruction and quantitative comparison within a single framework, phyloPipeR improves reproducibility, reduces technical barriers, and provides a scalable solution for systematic and integrative evolutionary studies.
Background: Metabolic dysfunction-associated steatotic liver disease (MASLD) is a significant and escalating global health concern, with an estimated prevalence of 30%. Current assessments of hepatic steatosis, a hallmark of MASLD, rely on semi-quantitative grading by pathologists, which is inherently limited by inter-observer variability. Objective: To address this limitation, we developed a novel deep learning pipeline, named SteatoStat, to standardize and enhance the quantification of hepatic steatosis in patients with MASLD. Method: The SteatoStat pipeline integrates file format standardization, rule-based cell filtering, and multiple segmentation models across various liver structures; it outputs a continuous quantitative measure of steatosis percentage, which is then translated into steatosis grades. Results: SteatoStat achieved high accuracy and reliability (DICE score = 0.8955, AUROC = 0.9928, F1 score = 0.8990). When benchmarked against expert pathologists, the weighted Kappa coefficient was 0.837. In comparison with an existing, well-established model, the weighted Kappa coefficient was 0.765. Conclusions: These findings underscore SteatoStat's potential clinical utility in providing a standardized, objective quantification of hepatic steatosis. Future directions include enhancing the model's generalizability and its clinical integration through validation on independent, multi-institutional datasets.
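The weighted Kappa agreement reported above can be computed as in the following sketch. The abstract does not state the weighting scheme, so quadratic weights (`power=2`) are an assumption here, as is the encoding of steatosis grades as integers starting at 0; the function name is hypothetical.

```python
from typing import Sequence


def weighted_kappa(a: Sequence[int], b: Sequence[int], n_classes: int, power: int = 2) -> float:
    """Cohen's weighted kappa for two raters with ordinal labels 0..n_classes-1.

    power=1 gives linear weights, power=2 quadratic weights.
    Undefined (division by zero) if both raters use one and the same single class.
    """
    n = len(a)
    observed = [[0.0] * n_classes for _ in range(n_classes)]
    for i, j in zip(a, b):
        observed[i][j] += 1.0 / n
    row = [sum(observed[i]) for i in range(n_classes)]
    col = [sum(observed[i][j] for i in range(n_classes)) for j in range(n_classes)]
    disagreement = expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            weight = (abs(i - j) / (n_classes - 1)) ** power
            disagreement += weight * observed[i][j]
            expected += weight * row[i] * col[j]
    return 1.0 - disagreement / expected
```

For example, `weighted_kappa(model_grades, pathologist_grades, n_classes=4)` would compare two lists of grades on a four-level scale, where both list names are placeholders.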
Cancer is a complex and heterogeneous disease that is characterized by multi-level biological variability. Advances in high-throughput technologies have led to large-scale, high-dimensional data sets in cancer research, creating a pressing need for powerful computational techniques to analyze them. Current techniques may be inadequate for this purpose, underscoring the potential of artificial intelligence (AI) and machine learning (ML). This review provides a comprehensive pipeline for AI/ML in cancer research, including preclinical research, clinical decision support, and real-world implementation. It emphasizes several important technologies, data integration, and implementation challenges. The review critically examines multi-omics fusion architectures, regularization-based machine learning, batch-effect harmonization, explainable AI, and federated learning, while addressing translational barriers including algorithmic bias, covariate drift, and regulatory asynchrony across Indian, US, and EU frameworks. Anchored by Decision Curve Analysis as a clinical utility benchmark, this narrative framework establishes that meaningful progress in precision oncology, early detection, and patient outcomes demands not only predictive accuracy but also externally validated, population-representative, and governance-compliant AI systems capable of sustained real-world oncology impact.
Background/Objectives: Spontaneous intracranial hypotension (SIH) is caused by spinal cerebrospinal fluid (CSF) leakage and is typically diagnosed by clinical presentation and characteristic MRI signs; however, objective tools for monitoring physiological changes and treatment response remain limited. Cine phase-contrast MRI (PC-MRI) enables noninvasive quantification of aqueductal CSF dynamics, yet reliable analysis is challenging since the cerebral aqueduct is extremely small and susceptible to low contrast, partial volume effects, and ROI-dependent measurement variability, particularly in SIH where CSF pulsatility is often reduced. Methods: We propose an end-to-end automated framework that integrates (1) a cascade localization-segmentation strategy, consisting of Tiny YOLOv4 detection followed by MultiResUNet segmentation on a YOLOv4-derived cropped ROI; (2) physiology-informed pulsatility-based segmentation (PUBS) to refine anatomical masks into functional flow ROIs; and (3) one-dimensional convolutional neural networks (1D-CNNs) to extract exploratory waveform morphology features from 32-phase cardiac-cycle velocity waveforms. The study includes 39 participants, yielding 59 cine PC-MRI examinations: 11 controls, 28 Pre-treatment SIH scans and 20 Post-treatment Recovery scans. Results: The cascade model significantly improves segmentation robustness compared with a full-image baseline, achieving higher Dice scores and markedly lower boundary errors across cohorts (e.g., Pre-treatment SIH HD95: 1.66 ± 0.74 px vs. 15.37 ± 44.98 px). PUBS refinement reduces quantification deviation from expert manual references in SIH (mean relative error: 7.4% to 5.6%) and improves diagnostic performance for multiple hemodynamic parameters (e.g., downward mean flow AUC: 0.747 to 0.792). For waveform morphology analysis, the end-to-end 1D-CNN classifier was evaluated using repeated-seed participant-level grouped LOOCV.
The repeated-seed ensemble prediction showed modest out-of-sample discrimination between Normal controls and Pre-treatment SIH scans, with an AUC of 0.646, a bootstrap 95% confidence interval of 0.455-0.826, and a permutation-test p-value of 0.072. Separately, exploratory analysis of the final baseline-trained 1D-CNN latent space showed marked, apparent Normal-versus-SIH separability and an intermediate recovery distribution in PCA space, suggesting that aqueductal waveform morphology may encode SIH-related physiological information. Conclusions: These findings suggest that SIH-related information may be reflected not only in flow magnitude but also in aqueductal CSF waveform morphology. However, the modest and statistically non-significant out-of-sample performance of the end-to-end 1D-CNN classifier indicates that morphology-based AI features should currently be regarded as exploratory biomarker candidates rather than validated stand-alone diagnostic tools. Larger independent cohorts are required to confirm their reproducibility, physiological meaning, and clinical utility.
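How an AUC and a bootstrap percentile confidence interval of the kind reported above might be computed is sketched below. This is a simplified illustration: it resamples individual scores within each class, whereas the study evaluates at participant level, and the names `auc` and `bootstrap_ci` as well as the default of 2000 resamples are assumptions.

```python
import random
from typing import List, Sequence, Tuple


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(scores_pos) * len(scores_neg))


def bootstrap_ci(scores_pos: Sequence[float], scores_neg: Sequence[float],
                 n_boot: int = 2000, alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for the AUC, resampling within each class."""
    rng = random.Random(seed)
    stats: List[float] = []
    for _ in range(n_boot):
        pos = rng.choices(scores_pos, k=len(scores_pos))
        neg = rng.choices(scores_neg, k=len(scores_neg))
        stats.append(auc(pos, neg))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

With small cohorts the resulting interval is typically wide, which is consistent with the cautious interpretation given in the abstract.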
Crack detection in mobile inspection scenarios is constrained by both the extremely slender geometry of crack targets and the real-time inference requirements on edge devices, which expose systematic limitations of general-purpose object detectors. This paper proposes YOLO-Crack, a closed-loop solution that couples geometry-statistics-driven module design with end-to-end edge deployment validation. On the algorithmic side, we first quantify crack geometric properties and then introduce (i) a crack-aware cross-dimensional fusion attention (CFCA) module to strengthen feature representations, (ii) a dual-path feature enhancement module (DFEM) to preserve fine details during upsampling, and (iii) an empirical smooth quality window adjustment with shape consistency regularization to stabilize bounding-box regression for slender cracks. Experiments on the Crack500 dataset show that YOLO-Crack achieves 78.8% precision, 51.4% recall, and 65.7% mAP@0.5, improving over the YOLOv11n baseline by 4.2, 1.7, and 2.9 percentage points, respectively. On the engineering side, we deploy YOLO-Crack on a Jetson Orin NX mobile robot platform and evaluate it in a real ROS pipeline; the measured end-to-end throughput reaches 25.5 FPS, meeting real-time video processing requirements. The proposed framework provides a practical reference workflow for edge vision tasks, from geometry analysis to engineering verification.
Extracting key information from vast amounts of documents and data plays a crucial role in knowledge graph construction, intelligence analysis, decision support, and multimodal information retrieval (such as speech sentiment analysis and invoice error detection). While end-to-end OCR-free methods avoid the error propagation issues of traditional two-stage models, they often struggle to balance the extraction of fine-grained character details with the modeling of complex global layouts. To address this, this paper proposes a novel hybrid encoder architecture that synergizes the inductive bias of Convolutional Neural Networks (CNNs) with the global context modeling of Swin Transformers. Unlike standard symmetric architectures, we introduce a geometry-aware asymmetric downsampling strategy: a ConvNext (CN) module first compresses the height to retain horizontal resolution for character distinction, followed by a Swin-T module that reduces width to capture long-range row-column dependencies. Experimental results on the CORD and IIT-CDIP datasets demonstrate that the proposed method outperforms other OCR-free end-to-end information extraction methods in terms of information extraction accuracy and shows potential in advancing intelligent operations and maintenance.
Many authors have investigated whether a new functional end-to-end anastomosis, the Kono-S, or resection of the mesentery could improve outcomes following primary ileocolic resection for Crohn's disease. The Resection of the mesentery vs Kono-S anastomosis in preventing surgical recurrence (Remeasure) trial investigates whether Kono-S anastomosis or resection of the mesentery with functional end-to-end anastomosis affects postoperative complications and surgical, endoscopic, and clinical recurrence. Randomized prospective trial at a tertiary referral institution. Primary endpoint: endoscopic recurrence at 6 months (Rutgeerts score i2 or greater). Secondary endpoints: postoperative complications, clinical recurrence after 12 months, endoscopic recurrence after 18 months, surgical recurrence after 24 months. Seventy-three patients were randomly assigned: 36 to Kono-S and 37 to mesenteric resection. The two groups had a similar peri-operative course. Surgical recurrence occurred in two patients (1 following Kono-S, 1 following mesenteric resection). Six-month endoscopic recurrence occurred in 12/36 (33%) of Kono-S patients and in 12/37 (32%) of the mesenteric-resection group. The 6-, 12-, and 18-month time-to-event estimates showed no significant differences in endoscopic or clinical recurrence. In the Cox proportional hazards model, perforating behavior was a risk factor for late endoscopic recurrence (HR 1.70, p = 0.05), whereas adjuvant biologic therapy was protective (HR 1.32, p = 0.034). The Remeasure trial does not show significant advantages in terms of postoperative complications or surgical, endoscopic, and clinical recurrence after Kono-S anastomosis or mesenteric resection following primary ileocolic resection in Crohn's disease.
Background: Confident chemical annotation in nontarget small-molecule mass spectrometry critically depends on the availability of high-quality tandem mass spectral (MS2) reference libraries. While community efforts have driven significant expansion of open-access repositories, technical challenges in assembling standardized, metadata-rich records continue to limit broader participation, underscoring the need for improved computational tools to assist contributors. Methods: To promote the creation and sharing of standardized reference MS2 spectral records, we have developed Librarian, a free, open-access web application designed for rapid and scalable assembly of high-resolution MS2 libraries. Librarian integrates automated retrieval and harmonization of chemical identifiers and metadata from PubChem, compound mixture design for high-resolution mass spectrometry (HRMS) acquisition, and assembly of curated MS2 spectra into repository-ready records compatible with public spectral databases. Results: Through a simple in-browser interface, Librarian offers a flexible end-to-end workflow compatible with popular open-source pre-processing tools to lower technical barriers and facilitate broader community participation in library development. As a demonstration, we used Librarian to create and deposit a spectral library comprising over 1500 new MS2 records into MassBank, which was further applied in retrospective analysis of environmental datasets. Conclusions: Librarian streamlines the creation of standardized, metadata-rich and repository-ready MS2 reference records. Addressing a key bottleneck in community spectral library development and sharing, Librarian supports the continued growth of open-access resources for metabolomics, exposomics, and environmental mass spectrometry. The Librarian web application is publicly accessible via the SciLifeLab Serve platform.
Cameras and IMUs on heavy mining trucks supply the visual signal that Advanced Driver Assistance Systems (ADASs) use in open-pit operations. Haul roads in a surface mine are unstructured and unmarked, so a perception model must be both accurate and fast. We address this with a video-based multitask pipeline for a mining Driver Support System (DSS): a single BiSeNetV1 network produces drivable-area segmentation and steering-direction classification in one forward pass. Training used only 100 frames sampled non-sequentially from in-cab recordings of a real open-pit mine; evaluation used two full onboard sequences. To exploit temporal redundancy without annotating video, we propose an Adaptive Clockwork (A-CW) inference scheme: the spatial path runs on every frame, while the context path is refreshed only on keyframes whose cadence is set by the classification output, the same signal shown to the driver as a steering hint. This classification-guided policy increases context updates on curved segments, where the scene changes more rapidly, and reduces them on straight sections, where semantic redundancy is higher. The selected A-CW configuration was evaluated on full temporal test sequences, including one route kept entirely outside the training source. On this unseen route, A-CW achieved 94.70% road-class IoU and 73.68% Top-1 Accuracy. GPU-only throughput increased from about 55 FPS with frame-by-frame inference to 168.01 FPS, and display-excluded end-to-end processing in the simulated ADAS pipeline remained at approximately 37.5 FPS.
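The classification-guided keyframe policy described above can be sketched as a scheduling loop. The class names and interval values in `KEYFRAME_INTERVAL` are hypothetical, and `spatial_path`, `context_path`, and `head` are placeholder callables standing in for the network branches; this is an illustrative sketch of the idea, not the published A-CW configuration.

```python
from typing import Any, Callable, Iterable, List, Tuple

# Hypothetical cadence: refresh context more often on curves than on straights.
KEYFRAME_INTERVAL = {"straight": 8, "left": 2, "right": 2}


def run_adaptive_clockwork(
    frames: Iterable[Any],
    spatial_path: Callable[[Any], Any],
    context_path: Callable[[Any], Any],
    head: Callable[[Any, Any], Tuple[Any, str]],
) -> List[Tuple[Any, str, bool]]:
    """Run the spatial path on every frame and the context path only on keyframes."""
    outputs = []
    cached_context = None
    frames_since_key = 0
    interval = 1
    for frame in frames:
        spatial = spatial_path(frame)  # cheap branch, computed for every frame
        is_key = cached_context is None or frames_since_key >= interval
        if is_key:
            cached_context = context_path(frame)  # expensive branch, keyframes only
            frames_since_key = 0
        frames_since_key += 1
        mask, direction = head(spatial, cached_context)
        # The predicted steering class sets the cadence for upcoming frames.
        interval = KEYFRAME_INTERVAL[direction]
        outputs.append((mask, direction, is_key))
    return outputs
```

With these hypothetical intervals, a straight section refreshes the context once every eight frames and a curved section once every two, which is where the throughput gain over frame-by-frame inference would come from.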
Objectives: Alzheimer's disease (AD) remains one of the most prevalent neurodegenerative conditions among older adults, underscoring the urgent need for accurate and ethically grounded early detection methods. Artificial intelligence (AI) techniques, particularly machine learning and deep learning models, show promise in leveraging neuroimaging biomarkers to support early diagnosis. However, significant challenges persist regarding model explainability, accountability, and responsible implementation in real-world healthcare settings. This study presents a generalized Responsible AI (RAI) framework composed of four core components (explainability, fairness, predictive performance, and uncertainty quantification) to address these challenges. Method: Using the TADPOLE neuroimaging dataset, we implemented a Feedforward Neural Network (FNN) within this unified RAI framework. Although Random Forest achieved slightly higher predictive accuracy (95%), the FNN was selected as the primary model because it better supports end-to-end uncertainty estimation through Monte Carlo Dropout, enabling more reliable clinical decision support. Results: The proposed framework demonstrated strong predictive performance (92% accuracy), improved fairness reflected by an equalized odds difference of 0.124, and progressively lower predictive entropy across training iterations, indicating enhanced confidence in predictions. The framework further enabled model transparency through explainability analyses and supported the identification of low-confidence predictions for potential clinical review. Conclusions: Our findings highlight not only the feasibility of integrating RAI principles into AD prediction pipelines but also the persistent challenges of applying such frameworks to real-world clinical data.
This work contributes practical insights toward operationalizing Responsible AI in healthcare contexts.
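Two of the quantities reported above, predictive entropy from Monte Carlo Dropout and the equalized odds difference, can be sketched as follows. The abstract does not give the exact definitions used, so this sketch assumes binary labels, entropy of the mean class probabilities over stochastic passes, and the equalized odds difference taken as the larger of the between-group gaps in true-positive and false-positive rate; function names are hypothetical.

```python
import math
from typing import List, Sequence


def predictive_entropy(mc_probs: Sequence[Sequence[float]]) -> float:
    """Entropy (in nats) of class probabilities averaged over T stochastic passes."""
    t = len(mc_probs)
    mean = [sum(p[k] for p in mc_probs) / t for k in range(len(mc_probs[0]))]
    return -sum(p * math.log(p) for p in mean if p > 0)


def equalized_odds_difference(y_true: Sequence[int], y_pred: Sequence[int],
                              group: Sequence[str]) -> float:
    """Largest between-group gap in TPR or FPR; assumes each group has both classes."""
    tprs: List[float] = []
    fprs: List[float] = []
    for g in sorted(set(group)):
        rows = [(yt, yp) for yt, yp, gg in zip(y_true, y_pred, group) if gg == g]
        tp = sum(1 for yt, yp in rows if yt == 1 and yp == 1)
        fn = sum(1 for yt, yp in rows if yt == 1 and yp == 0)
        fp = sum(1 for yt, yp in rows if yt == 0 and yp == 1)
        tn = sum(1 for yt, yp in rows if yt == 0 and yp == 0)
        tprs.append(tp / (tp + fn))
        fprs.append(fp / (fp + tn))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

A prediction whose entropy exceeds a chosen threshold could then be routed to clinical review; the threshold itself would have to be set on validation data.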
Exact interpretable learning is attractive in regulated decision settings, but solver runtime can vary substantially across datasets and solver families. We introduce structural meta-features derived from Feature Interaction Graphs (FIGs) as interpretable signals for solver selection. We construct FIGs from binarized tabular data using pairwise mutual information and extract topology-aware signatures such as density and estimated treewidth. Using a transparent shallow decision-tree selector, we demonstrate that FIG features establish an interpretable structural view of solver behavior, complementing basic, statistical, and landmarking meta-features. Experiments on OpenML classification tasks show that topology-aware profiling exposes meaningful structural variation across datasets, although benchmark saturation prevents clear end-to-end routing gains over strong simple baselines. Our results validate FIG as a principled, interpretable diagnostic tool for algorithm selection in exact learning; its diagnostic relevance becomes apparent on harder instances where solver runtime separation is substantial.
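The graph construction described above can be sketched in a few lines. This sketch assumes an edge is drawn between two binary features whenever their mutual information exceeds a fixed threshold; the threshold value and the function names are hypothetical, and the treewidth estimate is omitted.

```python
import math
from itertools import combinations
from typing import List, Sequence, Set, Tuple


def mutual_information(x: Sequence[int], y: Sequence[int]) -> float:
    """Mutual information in bits between two binary (0/1) feature columns."""
    n = len(x)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pxy = sum(1 for u, v in zip(x, y) if u == a and v == b) / n
            px = sum(1 for u in x if u == a) / n
            py = sum(1 for v in y if v == b) / n
            if pxy > 0:
                mi += pxy * math.log2(pxy / (px * py))
    return mi


def build_fig(columns: List[Sequence[int]], threshold: float) -> Set[Tuple[int, int]]:
    """Edges (i, j), i < j, between features whose mutual information exceeds threshold."""
    return {(i, j) for i, j in combinations(range(len(columns)), 2)
            if mutual_information(columns[i], columns[j]) > threshold}


def density(n_nodes: int, edges: Set[Tuple[int, int]]) -> float:
    """Fraction of possible undirected edges that are present."""
    return 2 * len(edges) / (n_nodes * (n_nodes - 1))
```

The resulting density (and, in the paper, estimated treewidth) would then be passed as a meta-feature to the solver selector.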
Genetic code expansion (GCE) enables the site-specific incorporation of noncanonical amino acids (ncAAs) into proteins but is constrained by reliance on exogenously supplied chiral ncAAs. Achieving intracellular ncAA biosynthesis would enable more scalable and cost-effective GCE. Here, we report the continuous hypermutation and evolution of amino acid synthases that produce high levels of ncAAs inside yeast, thus supporting GCE from simple ncAA precursors. We encoded an engineered "tyrosine synthase" (TmTyrS) on an error-prone orthogonal DNA replication system (OrthoRep) and selected variants based on ncAA biosynthesis from readily available phenol analogs and intracellular l-serine. Our selection employed orthogonal ncAA-specific aminoacyl-tRNA synthetases (aaRSs) as biosensors whereby target ncAA production leads to aminoacylation of an amber suppressor tRNA and the translation of a selectable reporter containing an amber stop codon. Our evolution successfully yielded TmTyrS variants that efficiently produced 3-iodo-, 3-bromo-, 3-chloro-, and 3-methyl-l-tyrosine, enabling amber codon-specified ncAA-dependent translation, in some cases at levels comparable to sense codon-specified natural amino acid translation. This work reduces barriers for expressing proteins containing substituted tyrosines. Moreover, because aaRSs can themselves be evolved (including with OrthoRep) for a flexible range of ncAA specificities, these results establish an end-to-end framework for evolving ncAA biosynthetic enzymes in vivo.
Single-lead electrocardiogram (ECG) is widely used in wearable devices for atrial fibrillation (AF) screening. Nevertheless, subtle pathological characteristics like P-waves and f-waves in practical signals are vulnerable to noise contamination. Meanwhile, the scarcity of high-quality annotated abnormal data instances leads to severe class imbalance. To mitigate these issues, we present an end-to-end framework designed for arrhythmia diagnosis using single-lead ECG signals, which integrates quality-aware data augmentation with a Peak-Enhanced attention mechanism. First, to mitigate the problem of data imbalance, a Quality-Aware Generative Adversarial Network (QA-GAN) is designed. This network integrates a signal quality evaluation module based on signal kurtosis, together with a dynamic soft-label training scheme, guiding the generator to prioritize learning high-quality morphological features, thereby synthesizing high-fidelity minority class samples. Second, to accurately capture subtle pathological features in electrocardiograms, a Peak-Enhanced Attention Convolutional Network (PEAC-Net) classification model is proposed. This model incorporates a Peak-Enhanced Attention (PE-Att) module, which employs learnable derivative convolutional kernels to precisely identify the transition points in the ECG signal. Furthermore, by integrating one-dimensional multi-scale dilated convolution (DSGC1D) with bidirectional LSTM, the model captures both fine-grained local morphological features and long-range global rhythm patterns. Experimental results on the PhysioNet 2017 dataset indicate that the presented model attains an accuracy of 0.902 and a macro-F1 score of 0.880, outperforming other state-of-the-art models and also exhibiting robust data adaptability on the MIT-BIH dataset.
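A kurtosis-based quality score of the kind mentioned above could look like the following sketch. The abstract does not describe how kurtosis is turned into a soft label, so the linear mapping and the reference value of 5.0 in `soft_quality_label` are purely hypothetical, as are the function names.

```python
from typing import Sequence


def kurtosis(signal: Sequence[float]) -> float:
    """Pearson (non-excess) kurtosis: fourth central moment over squared variance."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((v - mean) ** 2 for v in signal) / n
    m4 = sum((v - mean) ** 4 for v in signal) / n
    if m2 == 0:
        return 0.0  # assumption: a flat segment is treated as lowest quality
    return m4 / (m2 * m2)


def soft_quality_label(signal: Sequence[float], reference: float = 5.0) -> float:
    """Map kurtosis to [0, 1]; `reference` is a hypothetical saturation point."""
    return max(0.0, min(1.0, kurtosis(signal) / reference))
```

The intuition is that a clean ECG segment is dominated by sharp QRS peaks and therefore has a heavy-tailed amplitude distribution, while noise-dominated segments tend toward lower kurtosis.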
High-precision indoor trajectory estimation using pure Inertial Measurement Units (IMUs) remains challenging due to severe cumulative drift and the complexity of modeling nonlinear dynamics. This paper proposes LKAN, a novel end-to-end framework that integrates the Kolmogorov-Arnold Network (KAN) with Long-History Statistical Regularization (LHSR). We design the KANmer encoder, which fuses Multi-Head Self-Attention with KAN to explicitly capture long-range temporal dependencies and intricate nonlinear features from IMU data. To enhance model robustness, a training-only Long-History Statistical Regularization mechanism is introduced; it effectively suppresses feature distribution drift by enforcing historical statistical consistency. Extensive evaluations on three public datasets demonstrate that LKAN significantly outperforms state-of-the-art methods in IMU-only pedestrian localization. Specifically, on the iIMU-TD dataset, LKAN achieves an Absolute Trajectory Error (ATE) of 2.04 m and a Relative Trajectory Error (RTE) of 2.72 m, representing a reduction of 33.8% and 31.1%, respectively, compared to the second-best ResT-IMU. Results on the RoNIN dataset further validate the superiority of LKAN. These findings confirm that LKAN effectively mitigates error accumulation, providing a reliable, high-precision solution for real-time IMU-based positioning in complex indoor environments.
Smart buildings require intelligent and scalable solutions to monitor environmental conditions and manage increasingly complex data streams generated by distributed sensing infrastructures. In this context, the paper presents an edge-enabled Digital Twin framework for smart office environments, integrating real-time data acquisition, distributed intelligence, and machine learning-based analytics. The framework adopts a multi-layer architecture composed of a sensor layer, a cloud-edge intelligence layer, and an interaction layer, aligned with Digital Twin reference models. By enabling low-latency processing at the edge and supporting continuous model lifecycle management through Machine Learning Operations (MLOps) practices, the proposed approach overcomes key limitations of traditional cloud-centric solutions. Autoencoder-based models are deployed across the cloud-edge continuum to perform real-time anomaly detection on time-series sensor data. A prototype has been implemented in a real smart office environment, where heterogeneous environmental data are continuously collected and processed. Experimental results demonstrate effective end-to-end data flow, stable long-term operation, and reliable anomaly detection with low-latency response. The system enables real-time monitoring and data-driven analysis of environmental conditions, improving situational awareness and supporting operational decision-making. These findings confirm the effectiveness of integrating Digital Twin technologies with edge AI and MLOps principles for scalable and efficient smart building monitoring systems.
Underground mine emergencies compromise fixed communication infrastructure exactly when situational awareness is most critical for effective rescue operations. Existing LoRa mesh protocols fail in underground mines because they ignore the structured topology of tunnel networks, specifically the waveguide effect along straight galleries, severe signal discontinuity at junctions, and the dead-end geometry of working faces. This paper presents the Topology-Aware Concurrent LoRa (TACL) mesh protocol, in which each node autonomously infers its structural role from local RF observations and packet header information, without GPS, pre-loaded mine maps, or central coordination. Role classification resolves the contender estimation problem (Nh) left open in the prior concurrent transmission literature, enabling provably bounded timing offsets before transmission. TACL assigns a spreading factor (SF) of 12 (SF12) to dead-end source nodes for maximum link robustness and SF7-SF10 to relay nodes to create the inter-SF orthogonality margin required for concurrent decoding at junction nodes. In a Monte Carlo simulation of over 2000 trials, TACL achieves a packet delivery ratio (PDR) of 80.5% versus near-zero for all three baselines, confirming that topology-aware SF diversity is the necessary and sufficient mechanism to prevent junction collision collapse. Hardware deployment at the Missouri S&T Experimental Mine yields a 4.0× PDR improvement over the topology-agnostic concurrent transmission (CT)-fixed baseline, a median end-to-end latency of 1815 ms with 84× tighter latency spread than ALOHA-based protocols, and 2.5× lower energy per delivered packet. These results establish that explicit exploitation of underground mine topology is essential for reliable, predictable, and energy-efficient emergency mesh communications in post-disaster underground mine scenarios.
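The role-to-spreading-factor mapping above, and the airtime cost it implies, can be illustrated with a small sketch. The role labels, the round-robin choice among SF7 to SF10 for relays, and the 125 kHz bandwidth default are assumptions for illustration; only the SF12 and SF7-SF10 ranges come from the abstract. The symbol duration uses the standard LoRa relation T_sym = 2^SF / BW.

```python
RELAY_SFS = (7, 8, 9, 10)  # range from the abstract; selection rule below is hypothetical


def assign_sf(role: str, relay_index: int = 0) -> int:
    """Return the spreading factor for a node given its inferred structural role."""
    if role == "dead_end_source":
        return 12  # most robust link, longest airtime
    if role == "relay":
        # Hypothetical rule: cycle through SF7..SF10 so neighbouring relays differ.
        return RELAY_SFS[relay_index % len(RELAY_SFS)]
    raise ValueError(f"unknown role: {role}")


def symbol_time_ms(sf: int, bandwidth_hz: float = 125_000.0) -> float:
    """LoRa symbol duration in milliseconds: 2**SF / BW."""
    return (2 ** sf) / bandwidth_hz * 1000.0
```

At 125 kHz, an SF12 symbol lasts 32 times longer than an SF7 symbol, which shows the robustness-versus-airtime trade-off behind reserving SF12 for dead-end sources.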
Visual impairment affects approximately 2.2 billion people worldwide, yet existing assistive technologies remain fragmented and prohibitively expensive. This paper presents Munir, an integrated multimodal assistive system designed to enhance human-computer interaction through a combination of a mobile application and Bluetooth-enabled smart glasses. Munir leverages a hybrid machine learning architecture to provide inclusive, real-time support for daily living activities. The system integrates ten core capabilities, including face recognition, optical character recognition, and scene description, all accessible through a unified bilingual (Arabic/English) voice interface. By employing on-device processing for biometric tasks, Munir ensures user privacy and trust while maintaining high responsiveness. End-to-end system evaluation on the SCface dataset achieves a 96.69% recognition rate with 0% False Accept Rate. At an estimated first-year total cost of $806, Munir demonstrates a 4-5× cost advantage over commercial alternatives, providing a scalable and affordable multimodal solution for global digital inclusion.
Multi-object tracking and segmentation (MOTS) aims to jointly perform pixel-level instance segmentation and temporal identity association for multiple objects in video sequences. Existing online decoupled MOTS methods face several challenges in complex scenarios, including limited front-end mask quality, corruption of memory representations under prolonged occlusion, and unstable data association and trajectory recovery. To address these limitations, we propose TrackRefine, a plug-and-play decoupled enhancement framework. TrackRefine enhances overall performance through back-end refinement without modifying the architecture of the front-end instance segmenter or relying on additional end-to-end joint training. Specifically, we introduce a lightweight Fast GrabCut-based mask refinement module to optimize mask boundaries, a multimodal long-short-term memory bank that integrates appearance, semantic, and shape cues for identity modeling, and a progressive three-stage association strategy for stable matching and long-term trajectory recovery. Experimental results on MOTS20 show that TrackRefine achieves 69.4 sMOTSA, 82.7 MOTSA, and 478 Frag. Experimental results on KITTI MOTS show that it achieves 62.4/73.7 sMOTSA and 78.0/85.4 MOTSA for pedestrians and cars, respectively. Extensive experiments with different front-end instance segmenters verify its plug-and-play flexibility and decoupled design, while ablation studies confirm the effectiveness of each core module. These results show that TrackRefine provides an efficient and practical solution for online MOTS in complex scenarios.