The recent Segment Anything Model (SAM) 2 has demonstrated remarkable foundational competence in semantic segmentation, with its memory mechanism and mask decoder further addressing challenges in video tracking and object occlusion, thereby achieving superior results in interactive segmentation for both images and videos. Building upon our previous empirical studies, we further explore the zero-shot, prompt-based segmentation performance of SAM 2 in robot-assisted surgery, alongside its robustness against real-world corruption. For static images, we employ two forms of prompts: 1-point and bounding box, while for video sequences, the 1-point prompt is applied to the initial frame. Through extensive experimentation on the MICCAI EndoVis 2017 and EndoVis 2018 benchmarks, SAM 2, when utilizing bounding box prompts, outperforms state-of-the-art (SOTA) methods in comparative evaluations. The results with point prompts also exhibit a substantial enhancement over SAM's capabilities, nearing or even surpassing existing unprompted SOTA methodologies. In addition, SAM 2 demonstrates improved inference speed and less performance degradation against various image corruptions. Although slightly u
We present Safe Surgery by Identifying Pushouts (SSIP), an open-source lightweight Python package for automating surgery between qubit CSS codes. SSIP is flexible: it is capable of performing both external surgery, that is, surgery between two codeblocks, and internal surgery, that is, surgery within the same codeblock. Under the hood, it performs linear algebra over $\mathbb{F}_2$ governed by universal constructions in the category of chain complexes. We demonstrate it on quantum Low-Density Parity Check (qLDPC) codes, which are not topological codes in general, and are of interest for near-term fault-tolerant quantum computing. Such qLDPC codes include lift-connected surface codes, generalised bicycle codes and bivariate bicycle codes. We show that various logical measurements can be performed cheaply by surgery without sacrificing the high code distance. For example, half of the single-qubit logical measurements in the $Z$ or $X$ basis on the $[[144, 12, 12]]$ gross code require only 30 total additional qubits each, assuming the upper bound on distance given by QDistRnd is tight. This is two orders of magnitude lower than the additional qubit count of 1380 initially predicted by Br
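As an illustration of the kind of linear algebra over $\mathbb{F}_2$ mentioned in the SSIP abstract, the following minimal sketch checks the CSS commutation condition $H_X H_Z^T = 0$ and counts logical qubits via $k = n - \mathrm{rank}(H_X) - \mathrm{rank}(H_Z)$. It does not use SSIP's API: the function names are hypothetical, and the $[[7, 1, 3]]$ Steane code is an example chosen here, not one taken from the abstract.

```python
def gf2_rank(rows):
    """Rank over F_2 of a matrix given as a list of 0/1 rows."""
    # Pack each row into an integer so that adding two rows is a XOR.
    packed = [sum(bit << i for i, bit in enumerate(row)) for row in rows]
    rank = 0
    while packed:
        pivot = packed.pop()
        if pivot:
            rank += 1
            low_bit = pivot & -pivot  # lowest set bit of the pivot row
            # Clear that bit from every remaining row.
            packed = [r ^ pivot if r & low_bit else r for r in packed]
    return rank


def css_commutes(hx, hz):
    """True if every X check overlaps every Z check on an even number of qubits."""
    return all(
        sum(a * b for a, b in zip(row_x, row_z)) % 2 == 0
        for row_x in hx
        for row_z in hz
    )


def logical_qubits(hx, hz):
    """k = n - rank(H_X) - rank(H_Z) for a valid CSS code on n qubits."""
    n = len(hx[0])
    return n - gf2_rank(hx) - gf2_rank(hz)


# Parity-check matrix of the [7,4] Hamming code, used for both H_X and H_Z
# (this gives the Steane code; an illustrative choice, not from the abstract).
HAMMING = [
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

if __name__ == "__main__":
    print(css_commutes(HAMMING, HAMMING))    # commutation check
    print(logical_qubits(HAMMING, HAMMING))  # number of logical qubits
```

Storing rows as integers keeps the elimination step to a single XOR per row, which is a common way to do small $\mathbb{F}_2$ computations without external dependencies.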
Artificial intelligence is poised to augment dermatological care by enabling scalable image-based diagnostics. Yet, the development of robust and equitable models remains hindered by datasets that fail to capture the clinical and demographic complexity of real-world practice. This complexity stems from region-specific disease distributions, wide variation in skin tones, and the underrepresentation of outpatient scenarios from non-Western populations. We introduce DermaCon-IN, a prospectively curated dermatology dataset comprising 5,450 clinical images from 3,002 patients across outpatient clinics in South India. Each image is annotated by board-certified dermatologists with 245 distinct diagnoses, structured under a hierarchical, aetiology-based taxonomy adapted from Rook's classification. The dataset captures a wide spectrum of dermatologic conditions and tonal variation commonly seen in Indian outpatient care. We benchmark a range of architectures, including convolutional models (ResNet, DenseNet, EfficientNet), transformer-based models (ViT, MaxViT, Swin), and Concept Bottleneck Models to establish baseline performance and explore how anatomical and concept-level cues may be int
Dermatologic diseases impose a large and growing global burden, affecting billions and substantially reducing quality of life. While modern therapies can rapidly control acute symptoms, long-term outcomes are often limited by single-target paradigms, recurrent courses, and insufficient attention to systemic comorbidities. Traditional Chinese medicine (TCM) provides a complementary holistic approach via syndrome differentiation and individualized treatment, but practice is hindered by non-standardized knowledge, incomplete multimodal records, and poor scalability of expert reasoning. We propose DERM-3R, a resource-efficient multimodal agent framework to model TCM dermatologic diagnosis and treatment under limited data and compute. Based on real-world workflows, we reformulate decision-making into three core issues: fine-grained lesion recognition, multi-view lesion representation with specialist-level pathogenesis modeling, and holistic reasoning for syndrome differentiation and treatment planning. DERM-3R comprises three collaborative agents: DERM-Rec, DERM-Rep, and DERM-Reason, each targeting one component of this pipeline. Built on a lightweight multimodal LLM and partially fine-
Purpose: The facial recess is a delicate structure that must be protected in minimally invasive cochlear implant surgery. Current research estimates the drill trajectory by using endoscopy of the unique mastoid patterns. However, missing depth information limits available features for a registration to preoperative CT data. Therefore, this paper evaluates OCT for enhanced imaging of drill holes in mastoid bone and compares OCT data to original endoscopic images. Methods: A catheter-based OCT probe is inserted into a drill trajectory of a mastoid phantom in a translation-rotation manner to acquire the inner surface state. The images are undistorted and stitched to create volumetric data of the drill hole. The mastoid cell pattern is segmented automatically and compared to ground truth. Results: The mastoid pattern segmented on images acquired with OCT shows a similarity of J = 73.6 % to ground truth based on endoscopic images, as measured with the Jaccard metric. Leveraged by additional depth information, automated segmentation tends to be more robust and fail-safe compared to endoscopic images. Conclusion: The feasibility of using a clinically approved OCT probe for imaging the dri
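For reference, the Jaccard metric quoted in the OCT abstract is the ratio of intersection to union of two binary masks, $J = |A \cap B| / |A \cup B|$. The sketch below computes it for masks given as nested lists; the function name and the toy masks are assumptions for illustration, not data from the study.

```python
def jaccard(mask_a, mask_b):
    """Jaccard index of two binary masks given as equally sized nested lists."""
    # Collect the coordinates of foreground pixels in each mask.
    a = {(i, j) for i, row in enumerate(mask_a) for j, v in enumerate(row) if v}
    b = {(i, j) for i, row in enumerate(mask_b) for j, v in enumerate(row) if v}
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty masks count as identical
    return len(a & b) / len(union)


# Toy masks (made up): segmentation result vs. ground truth.
segmented = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]

if __name__ == "__main__":
    print(jaccard(segmented, ground_truth))
```

Here two of the four foreground pixels in the union are shared, so the index is 0.5.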
Purpose: Interventions at the otobasis operate in the narrow region of the temporal bone where several highly sensitive organs define obstacles with minimal clearance for surgical instruments. Nonlinear trajectories for potential minimally-invasive interventions can provide larger distances to risk structures and optimized orientations of surgical instruments, thus improving clinical outcomes when compared to existing linear approaches. In this paper, we present fast and accurate planning methods for such nonlinear access paths. Methods: We define a specific motion planning problem in $SE(3) = \mathbb{R}^3 \times SO(3)$ with notable constraints in computation time and goal pose that reflect the requirements of temporal bone surgery. We then present k-RRT-Connect: two suitable motion planners based on bidirectional Rapidly-exploring Random Trees (RRT) to solve this problem efficiently. Results: The benefits of k-RRT-Connect are demonstrated on real CT data of patients. Their general performance is shown on a large set of realistic synthetic anatomies. We also show that these new algorithms outperform state-of-the-art methods based on circular arcs or Bezier-Splines when applied to this specific probl
Purpose: Common dense stereo Simultaneous Localization and Mapping (SLAM) approaches in Minimally Invasive Surgery (MIS) require high-end parallel computational resources for real-time implementation. Yet this is not always feasible, since the computational resources should be allocated to other tasks like segmentation, detection, and tracking. To solve the problem of limited parallel computational power, this research aims at a lightweight dense stereo SLAM system that works on a single-core CPU and achieves real-time performance (more than 30 Hz in typical scenarios). Methods: A new dense stereo mapping module is integrated with the ORB-SLAM2 system and named BDIS-SLAM. Our new dense stereo mapping module includes stereo matching and 3D dense depth mosaic methods. Stereo matching is achieved with the recently proposed CPU-level real-time matching algorithm Bayesian Dense Inverse Searching (BDIS). A BDIS-based shape recovery and a depth mosaic strategy are integrated as a new thread and coupled with the backbone ORB-SLAM2 system for real-time stereo shape recovery. Results: Experiments on in-vivo data sets show that BDIS-SLAM runs at over 30 Hz on a modern single-core CPU in typ
Concept erasure in text-to-image diffusion models is crucial for mitigating harmful content, yet existing methods often compromise generative quality. We introduce Semantic Surgery, a novel training-free, zero-shot framework for concept erasure that operates directly on text embeddings before the diffusion process. It dynamically estimates the presence of target concepts in a prompt and performs a calibrated vector subtraction to neutralize their influence at the source, enhancing both erasure completeness and locality. The framework includes a Co-Occurrence Encoding module for robust multi-concept erasure and a visual feedback loop to address latent concept persistence. As a training-free method, Semantic Surgery adapts dynamically to each prompt, ensuring precise interventions. Extensive experiments on object, explicit content, artistic style, and multi-celebrity erasure tasks show our method significantly outperforms state-of-the-art approaches. We achieve superior completeness and robustness while preserving locality and image quality (e.g., 93.58 H-score in object erasure, reducing explicit content to just 1 instance, and 8.09 H_a in style erasure with no quality degradation).
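The abstract above does not state the formula behind the "calibrated vector subtraction", so the following is only a generic sketch of similarity-gated projection removal on an embedding vector, not the Semantic Surgery method itself. The function name `erase_concept`, the cosine-similarity presence estimate, and the `threshold` parameter are all assumptions introduced for this example.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def erase_concept(embedding, concept, threshold=0.2):
    """Remove the component of `embedding` along `concept` when the concept seems present.

    Presence is estimated here by cosine similarity (an assumption of this
    sketch); below `threshold` the embedding is returned unchanged.
    """
    c_norm = math.sqrt(_dot(concept, concept))
    e_norm = math.sqrt(_dot(embedding, embedding))
    if c_norm == 0 or e_norm == 0:
        return list(embedding)
    unit = [x / c_norm for x in concept]  # unit vector of the concept direction
    coeff = _dot(embedding, unit)         # length of the projection onto it
    presence = coeff / e_norm             # cosine similarity
    if presence <= threshold:
        return list(embedding)            # concept judged absent: leave untouched
    return [e - coeff * u for e, u in zip(embedding, unit)]


if __name__ == "__main__":
    print(erase_concept([3.0, 4.0], [1.0, 0.0]))  # concept present: projection removed
    print(erase_concept([0.0, 2.0], [1.0, 0.0]))  # concept absent: unchanged
```

Gating on the presence estimate is what keeps unrelated prompts untouched in this sketch, while the subtraction handles prompts where the concept direction is clearly present.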
Multimodal large language models (MLLMs) have demonstrated promise on publicly available dermatology benchmarks. However, benchmark performance may not generalize to real-world dermatologic decision-making. To quantify this benchmark-to-bedside gap, we evaluated four open-weight MLLMs (InternVL-Chat v1.5, LLaVA-Med v1.5, SkinGPT4 and MedGemma-4B-Instruct) and one commercial MLLM (GPT-4.1) across three publicly available dermatology datasets and a retrospective multi-site hospital-based dermatology consultation cohort comprising 5,811 cases and 46,405 clinical images. Models were evaluated on two clinically relevant tasks: differential diagnosis generation and severity-based triage. Diagnostic performance was modest on public datasets and declined substantially in the real-world cohort. On public benchmarks, top-3 diagnostic accuracy reached 26.55% for the best open-weight model and 42.25% for GPT-4.1. On real-world consultation cases using images alone, top-3 diagnostic accuracy fell to 1.50%-13.35% among open-weight models and 24.65% for GPT-4.1. Incorporating clinical context improved performance across all models, increasing top-3 diagnostic accuracy up to 28.75% among open-weig
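The top-3 diagnostic accuracy reported in the abstract above counts a case as correct when the reference diagnosis appears among a model's three highest-ranked candidates. A minimal sketch of that metric follows; the function name and the example labels are hypothetical and are not drawn from the study's data.

```python
def top_k_accuracy(ranked_predictions, truths, k=3):
    """Fraction of cases whose true label is among the first k ranked predictions."""
    if not truths:
        raise ValueError("no cases to evaluate")
    hits = sum(
        1 for ranked, truth in zip(ranked_predictions, truths) if truth in ranked[:k]
    )
    return hits / len(truths)


# Made-up differential diagnoses, most likely first, and made-up reference labels.
predictions = [
    ["eczema", "psoriasis", "tinea"],
    ["acne", "rosacea", "eczema"],
    ["tinea", "psoriasis", "acne"],
    ["vitiligo", "eczema", "acne"],
]
references = ["psoriasis", "eczema", "vitiligo", "vitiligo"]

if __name__ == "__main__":
    print(top_k_accuracy(predictions, references, k=3))
    print(top_k_accuracy(predictions, references, k=1))
```

In this toy set three of four cases are hits at k = 3 and one of four at k = 1.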
We study the effect of surgery on transverse knots in contact 3-manifolds. In particular, we investigate the effect of such surgery on open books, the Heegaard Floer contact invariant, and tightness. The overarching theme of this paper is to show that in many contexts, surgery on transverse knots is more natural than surgery on Legendrian knots. Besides reinterpreting surgery on Legendrian knots in terms of transverse knots, our main results are in two complementary directions: conditions under which inadmissible transverse surgery (\textit{cf.\@} positive contact surgery on Legendrian knots) preserves tightness, and conditions under which it creates overtwistedness. In the first direction, we give the first result on the tightness of inadmissible transverse surgery for contact manifolds with vanishing Heegaard Floer contact invariant. In particular, inadmissible transverse surgery on the connected binding of a genus $g$ open book that supports a tight contact structure preserves tightness if the surgery coefficient is greater than $2g-1$. In the second direction, along with more general statements, we deduce a partial generalisation of a result of Lisca and Stipsicz: when $L$ i
It is known that any contact 3-manifold can be obtained by rational contact Dehn surgery along a Legendrian link L in the standard tight contact 3-sphere. We define and study various versions of contact surgery numbers, the minimal number of components of a surgery link L describing a given contact 3-manifold under consideration. In the first part of the paper, we relate contact surgery numbers to other invariants in terms of various inequalities. In particular, we show that the contact surgery number of a contact manifold is bounded from above by the topological surgery number of the underlying topological manifold plus three. In the second part, we compute contact surgery numbers of all contact structures on the 3-sphere. Moreover, we completely classify the contact structures with contact surgery number one on $S^1\times S^2$, the Poincaré homology sphere, and the Brieskorn sphere $Σ(2,3,7)$. We conclude that there exist infinitely many non-isotopic contact structures on each of the above manifolds which cannot be obtained by a single rational contact surgery from the standard tight contact $3$-sphere. We further obtain results for the 3-torus and lens spaces. As one ingredient
With the advent of robot-assisted surgery, the role of data-driven approaches that integrate statistics and machine learning is growing rapidly, with prominent interest in objective surgical skill assessment. However, most existing work requires translating robot motion kinematics into intermediate features or gesture segments that are expensive to extract, lack efficiency, and require significant domain-specific knowledge. We propose an analytical deep learning framework for skill assessment in surgical training. A deep convolutional neural network is implemented to map multivariate time series data of the motion kinematics to individual skill levels. We perform experiments on the public minimally invasive surgical robotic dataset, JHU-ISI Gesture and Skill Assessment Working Set (JIGSAWS). Our proposed learning model achieved competitive accuracies of 92.5%, 95.4%, and 91.3% in the standard training tasks Suturing, Needle-passing, and Knot-tying, respectively. Without the need for engineered features or carefully-tuned gesture segmentation, our model can successfully decode skill information from raw motion profiles via end-to-end learning. Meanwhile, the proposed model is able to
Objectives: This paper presents a new simulator for maxillo-facial surgery that gathers the dental and the maxillo-facial analyses together into a single computer-assisted procedure. The idea is first to propose a repositioning of the maxilla, via the introduction of a 3D cephalometry, applied to a 3D virtual model of the patient's skull. Then, orthodontic data are integrated into this model, thanks to optical measurements of teeth plaster casts. Materials and Methods: The feasibility of the maxillo-facial demonstrator was first evaluated on a dry skull. To simulate a "real" patient, the skull was modified and manually cut by the surgeon in order to generate a given maxillo-facial malformation (with asymmetries in the sagittal, frontal and axial planes). Results: The validation of our simulator consisted in evaluating its ability to propose a bone repositioning diagnosis that restores the skull to its original configuration. A first qualitative validation is provided in this paper, with a 1.5-mm error in the repositioning diagnosis. Conclusions: These results mainly validate the concept of a maxillo-facial numerical simulator tha
BACKGROUND: Clinical factors influence surgery duration. This study also investigated non-clinical effects. METHODS: 22 months of data about thoracic operations in a large hospital in China were reviewed. Linear and nonlinear regression models were used to predict the duration of the operations. Interactions among predictors were also considered. RESULTS: Surgery duration decreased with the number of operations a surgeon performed in a day (P<0.001). Surgery duration also decreased with the number of operations allocated to an OR as long as there were no more than four surgeries per day in the OR (P<0.001), but increased with the number of operations if there were more than four (P<0.01). The duration of surgery was affected by its position in a sequence of surgeries performed by a surgeon. In addition, surgeons exhibited different patterns of the effects of surgery type for surgeries in different positions in the day. CONCLUSIONS: Surgery duration was affected not only by clinical effects but also by some non-clinical effects. Scheduling and allocation decisions significantly influenced surgery duration.
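One common way to express an effect that changes direction at a threshold, such as the reported turn at four operations per OR per day, is a piecewise-linear (hinge) term with a knot at that threshold. The abstract does not say which nonlinear model the authors used, so the sketch below is an assumption, and the coefficients in the usage example are made-up placeholders rather than estimates from the study.

```python
def hinge_features(n_ops, knot=4):
    """Split a daily operation count into the part up to the knot and the excess beyond it."""
    return min(n_ops, knot), max(0, n_ops - knot)


def predicted_duration(n_ops, intercept, slope_below, slope_above, knot=4):
    """Piecewise-linear prediction with separate slopes below and above the knot."""
    below, above = hinge_features(n_ops, knot)
    return intercept + slope_below * below + slope_above * above


if __name__ == "__main__":
    # Placeholder coefficients in minutes (hypothetical, for illustration only):
    # duration falls by 10 per operation up to four, then rises by 5 per extra one.
    for n in range(1, 8):
        print(n, predicted_duration(n, intercept=200.0, slope_below=-10.0, slope_above=5.0))
```

With a negative slope below the knot and a positive slope above it, the predicted duration is lowest at exactly four operations, which matches the qualitative pattern described in the RESULTS.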
This paper presents simulations of the impact of tongue surgery on tongue movements and on speech articulation. For this, a 3D biomechanical Finite Element (FE) model of the tongue is used. Muscles are represented within the FE structure by specific subsets of elements. The tongue model is inserted in the upper airways, including jaw, palate and pharyngeal walls. Two examples of tongue surgery, which are quite common in the treatment of cancers of the oral cavity, are modelled: hemiglossectomy and large resection of the mouth floor. Three kinds of reconstruction are also modelled, assuming flaps with low, medium or high stiffness. The impact of the surgery without any reconstruction and with the three different reconstructions is quantitatively measured and compared during simulated speech production sequences. More precisely, differences in global 3D tongue shape and in velocity patterns during tongue displacements are evaluated.