Search - ResearchTracker

In this work, we focus on boosting the feature extraction to improve the performance of Structure-from-Motion (SfM) in endoscopy videos. We present SuperPoint-E, a new local feature extraction method that, using our proposed Tracking Adaptation supervision strategy, significantly improves the quality of feature detection and description in endoscopy. Extensive experimentation on real endoscopy recordings studies our approach's most suitable configuration and evaluates SuperPoint-E feature quality. The comparison with other baselines also shows that our 3D reconstructions are denser and cover more and longer video segments because our detector fires more densely and our features are more likely to survive (i.e. higher detection precision). In addition, our descriptor is more discriminative, making the guided matching step almost redundant. The presented approach brings significant improvements in the 3D reconstructions obtained, via SfM on endoscopy videos, compared to the original SuperPoint and the gold standard SfM COLMAP pipeline.

Development and multi-center evaluation of domain-adapted speech recognition for human-AI teaming in real-world gastrointestinal endoscopy

arXiv, 2026-04-02. Authors: Ruijie Yang, Yan Zhu, Peiyao Fu

Automatic speech recognition (ASR) is a critical interface for human-AI interaction in gastrointestinal endoscopy, yet its reliability in real-world clinical settings is limited by domain-specific terminology and complex acoustic conditions. Here, we present EndoASR, a domain-adapted ASR system designed for real-time deployment in endoscopic workflows. We develop a two-stage adaptation strategy based on synthetic endoscopy reports, targeting domain-specific language modeling and noise robustness. In retrospective evaluation across six endoscopists, EndoASR substantially improves both transcription accuracy and clinical usability, reducing character error rate (CER) from 20.52% to 14.14% and increasing medical term accuracy (Med ACC) from 54.30% to 87.59%. In a prospective multi-center study spanning five independent endoscopy centers, EndoASR demonstrates consistent generalization under heterogeneous real-world conditions. Compared with the baseline Paraformer model, CER is reduced from 16.20% to 14.97%, while Med ACC is improved from 61.63% to 84.16%, confirming its robustness in practical deployment scenarios. Notably, EndoASR achieves a real-time factor (RTF) of 0.005, significa

Search results: endoscopy

SuperPoint-E: local features for 3D reconstruction via tracking adaptation in endoscopy

Development and multi-center evaluation of domain-adapted speech recognition for human-AI teaming in real-world gastrointestinal endoscopy

NeRFscopy: Neural Radiance Fields for in-vivo Time-Varying Tissues from Endoscopy

CapCLIP: A Vision-Language Representation Alignment Approach for Wireless Capsule Endoscopy Analysis

MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy

Endora: Video Generation Models as Endoscopy Simulators

3D Densification for Multi-Map Monocular VSLAM in Endoscopy

EndoDINO: A Foundation Model for GI Endoscopy

Prediction of Rectal Cancer Regrowth from Longitudinal Endoscopy

Enhanced Anomaly Detection for Capsule Endoscopy Using Ensemble Learning Strategies

Capsule Endoscopy Multi-classification via Gated Attention and Wavelet Transformations

EndoOOD: Uncertainty-aware Out-of-distribution Detection in Capsule Endoscopy Diagnosis

V$^2$-SfMLearner: Learning Monocular Depth and Ego-motion for Multimodal Wireless Capsule Endoscopy

Learning to Adapt Foundation Model DINOv2 for Capsule Endoscopy Diagnosis

Influence of color correction on pathology detection in Capsule Endoscopy

A Highlight Removal Method for Capsule Endoscopy Images

Whether and When does Endoscopy Domain Pretraining Make Sense?

Self-supervised Learning for Gastrointestinal Pathologies Endoscopy Image Classification with Triplet Loss

Beyond Endoscopy via Poisson Summation for GL(2,K)

Multi-Class Abnormality Classification Task in Video Capsule Endoscopy