Search - ResearchTracker

Data scarcity remains a fundamental barrier to achieving fully autonomous surgical robots. While large-scale vision-language-action (VLA) models have shown impressive generalization in household and industrial manipulation by leveraging paired video-action data from diverse domains, surgical robotics suffers from a paucity of datasets that include both visual observations and accurate robot kinematics. In contrast, vast corpora of surgical videos exist, but they lack corresponding action labels, preventing direct application of imitation learning or VLA training. In this work, we aim to alleviate this problem by learning policy models from Cosmos-H-Surgical, a world model designed for surgical physical AI. We curated the Surgical Action Text Alignment (SATA) dataset with detailed action descriptions written specifically for surgical robots. We then built Cosmos-H-Surgical on the most advanced physical AI world model and SATA. It can generate diverse, generalizable, and realistic surgical videos. We are also the first to use an inverse dynamics model to infer pseudokinematics from synthetic surgical videos, producing synthetic paired video-action data. We demonstrate that a surgi

SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF)

arXiv, 2023-08-22. Authors: Ange Lou, Yamin Li, Xing Yao

The accurate reconstruction of surgical scenes from surgical videos is critical for various applications, including intraoperative navigation and image-guided robotic surgery automation. However, previous approaches, which rely mainly on depth estimation, have limited effectiveness in reconstructing surgical scenes with moving surgical tools. To address this limitation and provide accurate 3D position prediction for surgical tools in all frames, we propose a novel approach called SAMSNeRF that combines the Segment Anything Model (SAM) and Neural Radiance Field (NeRF) techniques. Our approach generates accurate segmentation masks of surgical tools using SAM, which guide the refinement of the dynamic surgical scene reconstruction by NeRF. Our experimental results on public endoscopy surgical videos demonstrate that our approach successfully reconstructs high-fidelity dynamic surgical scenes and accurately reflects the spatial information of surgical tools. Our proposed approach can significantly enhance surgical navigation and automation by providing surgeons with accurate 3D position information of surgical tools during surgery. The source code will be released soon.

Search results: Surgical endoscopy

Cosmos-H-Surgical: Learning Surgical Robot Policies from Videos via World Modeling

SAMSNeRF: Segment Anything Model (SAM) Guides Dynamic Surgical Scene Reconstruction by Neural Radiance Field (NeRF)

ORBIT-Surgical: An Open-Simulation Framework for Learning Surgical Augmented Dexterity

Towards Holistic Surgical Scene Graph

SurgPub-Video: A Comprehensive Surgical Video Dataset for Enhanced Surgical Intelligence in Vision-Language Model

Advancing Surgical VQA with Scene Graph Knowledge

SuperPoint-E: local features for 3D reconstruction via tracking adaptation in endoscopy

Conformal forecasting for surgical instrument trajectory

Scaling Video Pretraining for Surgical Foundation Models

SURGIVID: Annotation-Efficient Surgical Video Object Discovery

Surgical Vision World Model

SurgiPose: Estimating Surgical Tool Kinematics from Monocular Video for Surgical Robot Learning

SurgicalMamba: Dual-Path SSD with State Regramming for Online Surgical Phase Recognition

Surgical-LLaVA: Toward Surgical Scenario Understanding via Large Language and Vision Models

MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy

Zero-Shot Surgical Tool Segmentation in Monocular Video Using Segment Anything Model 2

Endora: Video Generation Models as Endoscopy Simulators

Surgical-VQLA: Transformer with Gated Vision-Language Embedding for Visual Question Localized-Answering in Robotic Surgery

SANGRIA: Surgical Video Scene Graph Optimization for Surgical Workflow Prediction

Identifying Surgical Instruments in Laparoscopy Using Deep Learning Instance Segmentation