A reliable evaluation of surgical difficulty can improve the success of the treatment for rectal cancer and the current evaluation method is based on clinical data. However, more data about rectal cancer can be collected with the development of technology. Meanwhile, with the development of artificial intelligence, its application in rectal cancer treatment is becoming possible. In this paper, a multi-view rectal cancer dataset is first constructed to give a more comprehensive view of patients, including the high-resolution MRI image view, pressed-fat MRI image view, and clinical data view. Then, an interpretable incomplete multi-view surgical evaluation model is proposed, considering that it is hard to obtain extensive and complete patient data in real application scenarios. Specifically, a dual representation incomplete multi-view learning model is first proposed to extract the common information between views and specific information in each view. In this model, the missing view imputation is integrated into representation learning, and second-order similarity constraint is also introduced to improve the cooperative learning between these two parts. Then, based on the imputed mu
Clinical trial studies indicate benefit of watch-and-wait (WW) surveillance for patients with rectal cancer showing a complete or near clinical response (CR) directly after treatment (restaging). However, there are no objectively accurate methods to early detect local tumor regrowth (LR) in patients undergoing WW from follow-up exams. Hence, we developed Temporal Rectal Endoscopy Cross-attention (TREX), a longitudinal deep learning approach that combines pairs of images acquired at restaging and follow-up to distinguish CR from LR. TREX uses pretrained Swin Transformers in a siamese setting to extract features from longitudinal images and dual cross-attention to combine the features without spatial co-registration between image pairs. TREX and Swin-based baselines were trained under two settings: (a) detecting LR or CR at the last available follow-up and (b) early detection of LR at 3--6, 6--12, and 12--24 months before clinical confirmation. TREX achieved the highest accuracy in detecting LR with a high sensitivity of 97% $\pm$ 6% and a balanced accuracy of 90% $\pm$ 3%, and outperformed all baselines in early detection at both 3--6 (74% $\pm$ 1%) and 6--12 months (62% $\pm$ 4%) p
Accurate segmentation of metastatic lymph nodes in rectal cancer is crucial for the staging and treatment of rectal cancer. However, existing segmentation approaches face challenges due to the absence of pixel-level annotated datasets tailored for lymph nodes around the rectum. Additionally, metastatic lymph nodes are characterized by their relatively small size, irregular shapes, and lower contrast compared to the background, further complicating the segmentation task. To address these challenges, we present the first large-scale perirectal metastatic lymph node CT image dataset called Meply, which encompasses pixel-level annotations of 269 patients diagnosed with rectal cancer. Furthermore, we introduce a novel lymph-node segmentation model named CoSAM. The CoSAM utilizes sequence-based detection to guide the segmentation of metastatic lymph nodes in rectal cancer, contributing to improved localization performance for the segmentation model. It comprises three key components: sequence-based detection module, segmentation module, and collaborative convergence unit. To evaluate the effectiveness of CoSAM, we systematically compare its performance with several popular segmentation m
Accurate segmentation of rectal lymph nodes is crucial for the staging and treatment planning of rectal cancer. However, the complexity of the surrounding anatomical structures and the scarcity of annotated data pose significant challenges. This study introduces a novel lymph node synthesis technique aimed at generating diverse and realistic synthetic rectal lymph node samples to mitigate the reliance on manual annotation. Unlike direct diffusion methods, which often produce masks that are discontinuous and of suboptimal quality, our approach leverages an implicit SDF-based method for mask generation, ensuring the production of continuous, stable, and morphologically diverse masks. Experimental results demonstrate that our synthetic data significantly improves segmentation performance. Our work highlights the potential of diffusion model for accurately synthesizing structurally complex lesions, such as lymph nodes in rectal cancer, alleviating the challenge of limited annotated data in this field and aiding in advancements in rectal cancer diagnosis and treatment.
Endorectal ultrasound (ERUS) is an important imaging modality that provides high reliability for diagnosing the depth and boundary of invasion in colorectal cancer. However, the lack of a large-scale ERUS dataset with high-quality annotations hinders the development of automatic ultrasound diagnostics. In this paper, we collected and annotated the first benchmark dataset that covers diverse ERUS scenarios, i.e. colorectal cancer segmentation, detection, and infiltration depth staging. Our ERUS-10K dataset comprises 77 videos and 10,000 high-resolution annotated frames. Based on this dataset, we further introduce a benchmark model for colorectal cancer segmentation, named the Adaptive Sparse-context TRansformer (ASTR). ASTR is designed based on three considerations: scanning mode discrepancy, temporal information, and low computational complexity. For generalizing to different scanning modes, the adaptive scanning-mode augmentation is proposed to convert between raw sector images and linear scan ones. For mining temporal information, the sparse-context transformer is incorporated to integrate inter-frame local and global features. For reducing computational complexity, the sparse-co
Fecal incontinence, arising from a myriad of pathogenic mechanisms, has attracted considerable global attention. Despite its significance, the replication of the defecatory system for studying fecal incontinence mechanisms remains limited largely due to social stigma and taboos. Inspired by the rectum's functionalities, we have developed a soft robotic system, encompassing a power supply, pressure sensing, data acquisition systems, a flushing mechanism, a stage, and a rectal module. The innovative soft rectal module includes actuators inspired by sphincter muscles, both soft and rigid covers, and soft rectum mold. The rectal mold, fabricated from materials that closely mimic human rectal tissue, is produced using the mold replication fabrication method. Both the soft and rigid components of the mold are realized through the application of 3D-printing technology. The sphincter muscles-inspired actuators featuring double-layer pouch structures are modeled and optimized based on multilayer perceptron methods aiming to obtain high contractions ratios (100%), high generated pressure (9.8 kPa), and small recovery time (3 s). Upon assembly, this defecation robot is capable of smoothly exp
Pretraining on large-scale datasets has been shown to improve transformer generalizability, even for out-of-domain (OOD) modalities and tasks. However, two common assumptions often fail under OOD transfer: that downstream datasets can be adapted to the fixed input geometry of pretrained models and that pretrained representations transfer effectively across imaging modalities. We show that these assumptions break down through two interacting failure modes in CT-to-MRI transfer: inefficient token usage caused by zero-padding to match pretrained input dimensions and ineffective feature adaptation. These failures led to accuracy degradation despite extensive fine-tuning. We investigated these failure modes using two CT-pretrained hierarchical shifted-window transformer backbones, SMIT and Swin UNETR, pretrained with different objectives and datasets. Mechanistic analysis introduced an attention dilution index (ADI), an entropy-based metric quantifying attention diverted toward uninformative padding tokens, and centered kernel alignment (CKA) to measure feature reuse in MRI tasks. ADI increased with zero-padding, while high feature reuse did not necessarily correspond to improved accura
Accurate preoperative assessment of lymph node (LN) metastasis in rectal cancer guides treatment decisions, yet conventional MRI evaluation based on morphological criteria shows limited diagnostic performance. While some artificial intelligence models have been developed, they often operate as black boxes, lacking the interpretability needed for clinical trust. Moreover, these models typically evaluate nodes in isolation, overlooking the patient-level context. To address these limitations, we introduce LRMR, an LLM-Driven Relational Multi-node Ranking framework. This approach reframes the diagnostic task from a direct classification problem into a structured reasoning and ranking process. The LRMR framework operates in two stages. First, a multimodal large language model (LLM) analyzes a composite montage image of all LNs from a patient, generating a structured report that details ten distinct radiological features. Second, a text-based LLM performs pairwise comparisons of these reports between different patients, establishing a relative risk ranking based on the severity and number of adverse features. We evaluated our method on a retrospective cohort of 117 rectal cancer patients
Accurate lymph node metastasis (LNM) assessment in rectal cancer is essential for treatment planning, yet current MRI-based evaluation shows unsatisfactory accuracy, leading to suboptimal clinical decisions. Developing automated systems also faces significant obstacles, primarily the lack of node-level annotations. Previous methods treat lymph nodes as isolated entities rather than as an interconnected system, overlooking valuable spatial and contextual information. To solve this problem, we present WeGA, a novel weakly-supervised global-local affinity learning framework that addresses these challenges through three key innovations: 1) a dual-branch architecture with DINOv2 backbone for global context and residual encoder for local node details; 2) a global-local affinity extractor that aligns features across scales through cross-attention fusion; and 3) a regional affinity loss that enforces structural coherence between classification maps and anatomical regions. Experiments across one internal and two external test centers demonstrate that WeGA outperforms existing methods, achieving AUCs of 0.750, 0.822, and 0.802 respectively. By effectively modeling the relationships between i
Increasing evidence supports watch-and-wait (WW) surveillance for patients with rectal cancer who show clinical complete response (cCR) at restaging following total neoadjuvant treatment (TNT). However, objectively accurate methods to early detect local regrowth (LR) from follow-up endoscopy images during WW are essential to manage care and prevent distant metastases. Hence, we developed a Siamese Swin Transformer with Dual Cross-Attention (SSDCA) to combine longitudinal endoscopic images at restaging and follow-up and distinguish cCR from LR. SSDCA leverages pretrained Swin transformers to extract domain agnostic features and enhance robustness to imaging variations. Dual cross attention is implemented to emphasize features from the two scans without requiring any spatial alignment of images to predict response. SSDCA as well as Swin-based baselines were trained using image pairs from 135 patients and evaluated on a held-out set of image pairs from 62 patients. SSDCA produced the best balanced accuracy (81.76\% $\pm$ 0.04), sensitivity (90.07\% $\pm$ 0.08), and specificity (72.86\% $\pm$ 0.05). Robustness analysis showed stable performance irrespective of artifacts including blood
Objectives Accurate MRI-based identification of extramural vascular invasion (EVI) and mesorectal fascia invasion (MFI) is crucial for risk-stratified rectal cancer treatment. However, subjective visual assessment and inter-institutional variability limit diagnostic consistency. This study developed and externally evaluated a multi-centre, foundation model-driven framework that automatically classifies EVI and MFI on axial and sagittal MRI. Methods A total of 331 pre-treatment rectal cancer T2-weighted MRI scans from three European hospitals were retrospectively recruited. A self-supervised frequency domain harmonization strategy was applied to reduce scanner variability. Three classifiers, SeResNet, the universal biomedical pretrained model (UMedPT) with a multilayer perceptron head, and a logistic-regression variant using frozen UMedPT features (UMedPT_LR), were trained (n=265) and tested (n=66). Gradient-weighted class activation mapping (Grad-CAM) visualized model predictions. Results UMedPT_LR achieved the best EVI performance with multiplanar fusion (AUC=0.82, test set). For MFI, UMedPT trained on axial harmonized images yielded the highest performance (AUC = 0.77). Both task
Effective treatment for rectal cancer relies on accurate lymph node metastasis (LNM) staging. However, radiological criteria based on lymph node (LN) size, shape and texture morphology have limited diagnostic accuracy. In this work, we investigate applying a Variational Autoencoder (VAE) as a feature encoder model to replace the large pre-trained Convolutional Neural Network (CNN) used in existing approaches. The motivation for using a VAE is that the generative model aims to reconstruct the images, so it directly encodes visual features and meaningful patterns across the data. This leads to a disentangled and structured latent space which can be more interpretable than a CNN. Models are deployed on an in-house MRI dataset with 168 patients who did not undergo neo-adjuvant treatment. The post-operative pathological N stage was used as the ground truth to evaluate model predictions. Our proposed model 'VAE-MLP' achieved state-of-the-art performance on the MRI dataset, with cross-validated metrics of AUC 0.86 +/- 0.05, Sensitivity 0.79 +/- 0.06, and Specificity 0.85 +/- 0.05. Code is available at: https://github.com/benkeel/Lymph_Node_Classification_MIUA.
Rectal cancer is one of the most common diseases and a major cause of mortality. For deciding rectal cancer treatment plans, T-staging is important. However, evaluating the index from preoperative MRI images requires high radiologists' skill and experience. Therefore, the aim of this study is to segment the mesorectum, rectum, and rectal cancer region so that the system can predict T-stage from segmentation results. Generally, shortage of large and diverse dataset and high quality annotation are known to be the bottlenecks in computer aided diagnostics development. Regarding rectal cancer, advanced cancer images are very rare, and per-pixel annotation requires high radiologists' skill and time. Therefore, it is not feasible to collect comprehensive disease patterns in a training dataset. To tackle this, we propose two kinds of approaches of image synthesis-based late stage cancer augmentation and semi-supervised learning which is designed for T-stage prediction. In the image synthesis data augmentation approach, we generated advanced cancer images from labels. The real cancer labels were deformed to resemble advanced cancer labels by artificial cancer progress simulation. Next, we
Endoscopic images are used at various stages of rectal cancer treatment starting from cancer screening, diagnosis, during treatment to assess response and toxicity from treatments such as colitis, and at follow up to detect new tumor or local regrowth (LR). However, subjective assessment is highly variable and can underestimate the degree of response in some patients, subjecting them to unnecessary surgery, or overestimate response that places patients at risk of disease spread. Advances in deep learning has shown the ability to produce consistent and objective response assessment for endoscopic images. However, methods for detecting cancers, regrowth, and monitoring response during the entire course of patient treatment and follow-up are lacking. This is because, automated diagnosis and rectal cancer response assessment requires methods that are robust to inherent imaging illumination variations and confounding conditions (blood, scope, blurring) present in endoscopy images as well as changes to the normal lumen and tumor during treatment. Hence, a hierarchical shifted window (Swin) transformer was trained to distinguish rectal cancer from normal lumen using endoscopy images. Swin
Volumetric images from Magnetic Resonance Imaging (MRI) provide invaluable information in preoperative staging of rectal cancer. Above all, accurate preoperative discrimination between T2 and T3 stages is arguably both the most challenging and clinically significant task for rectal cancer treatment, as chemo-radiotherapy is usually recommended to patients with T3 (or greater) stage cancer. In this study, we present a volumetric convolutional neural network to accurately discriminate T2 from T3 stage rectal cancer with rectal MR volumes. Specifically, we propose 1) a custom ResNet-based volume encoder that models the inter-slice relationship with late fusion (i.e., 3D convolution at the last layer), 2) a bilinear computation that aggregates the resulting features from the encoder to create a volume-wise feature, and 3) a joint minimization of triplet loss and focal loss. With MR volumes of pathologically confirmed T2/T3 rectal cancer, we perform extensive experiments to compare various designs within the framework of residual learning. As a result, our network achieves an AUC of 0.831, which is higher than the reported accuracy of the professional radiologist groups. We believe this
Rectal cancer segmentation of CT image plays a crucial role in timely clinical diagnosis, radiotherapy treatment, and follow-up. Although current segmentation methods have shown promise in delineating cancerous tissues, they still encounter challenges in achieving high segmentation precision. These obstacles arise from the intricate anatomical structures of the rectum and the difficulties in performing differential diagnosis of rectal cancer. Additionally, a major obstacle is the lack of a large-scale, finely annotated CT image dataset for rectal cancer segmentation. To address these issues, this work introduces a novel large scale rectal cancer CT image dataset CARE with pixel-level annotations for both normal and cancerous rectum, which serves as a valuable resource for algorithm research and clinical application development. Moreover, we propose a novel medical cancer lesion segmentation benchmark model named U-SAM. The model is specifically designed to tackle the challenges posed by the intricate anatomical structures of abdominal organs by incorporating prompt information. U-SAM contains three key components: promptable information (e.g., points) to aid in target area localiza
The integration of medical imaging, computational analysis, and robotic technology has brought about a significant transformation in minimally invasive surgical procedures, particularly in the realm of laparoscopic rectal surgery (LRS). This specialized surgical technique, aimed at addressing rectal cancer, requires an in-depth comprehension of the spatial dynamics within the narrow space of the pelvis. Leveraging Magnetic Resonance Imaging (MRI) scans as a foundational dataset, this study incorporates them into Computer-Aided Design (CAD) software to generate precise three-dimensional (3D) reconstructions of the patient's anatomy. At the core of this research is the analysis of the surgical workspace, a critical aspect in the optimization of robotic interventions. Sophisticated computational algorithms process MRI data within the CAD environment, meticulously calculating the dimensions and contours of the pelvic internal regions. The outcome is a nuanced understanding of both viable and restricted zones during LRS, taking into account factors such as curvature, diameter variations, and potential obstacles. This paper delves deeply into the complexities of workspace analysis for ro
Fecal incontinence (FI) is a significant health issue with various underlying causes. Research in this field is limited by social stigma and the lack of effective replication models. To address these challenges, we developed a sophisticated rectal simulator that integrates power, control, and data acquisition systems with soft pouch actuators. The system comprises four key subsystems: mechanical, electrical, pneumatic, and control and data acquisition. The mechanical subsystem utilizes common materials such as aluminum frames, wooden boards, and compact structural components to facilitate the installation and adjustment of electrical and control components. The electrical subsystem supplies power to regulators and sensors. The pneumatic system provides compressed air to actuators, enabling the simulation of FI. The control and data acquisition subsystem collects pressure data and regulates actuator movement. This comprehensive approach allows the robot to accurately replicate human defecation, managing various feces types including liquid, solid, and extremely solid. This innovation enhances our understanding of defecation and holds potential for advancing quality-of-life devices r
It is a challenge to segment the location and size of rectal cancer tumours through deep learning. In this paper, in order to improve the ability of extracting suffi-cient feature information in rectal tumour segmentation, attention enlarged ConvNeXt UNet (AACN-UNet), is proposed. The network mainly includes two improvements: 1) the encoder stage of UNet is changed to ConvNeXt structure for encoding operation, which can not only integrate multi-scale semantic information on a large scale, but al-so reduce information loss and extract more feature information from CT images; 2) CBAM attention mechanism is added to improve the connection of each feature in channel and space, which is conducive to extracting the effective feature of the target and improving the segmentation accuracy.The experiment with UNet and its variant network shows that AACN-UNet is 0.9% ,1.1% and 1.4% higher than the current best results in P, F1 and Miou.Compared with the training time, the number of parameters in UNet network is less. This shows that our proposed AACN-UNet has achieved ex-cellent results in CT image segmentation of rectal cancer.
Over 150,000 new people in the United States are diagnosed with colorectal cancer each year. Nearly a third die from it (American Cancer Society). The only approved noninvasive diagnosis tools currently involve fecal blood count tests (FOBTs) or stool DNA tests. Fecal blood count tests take only five minutes and are available over the counter for as low as \$15. They are highly specific, yet not nearly as sensitive, yielding a high percentage (25%) of false negatives (Colon Cancer Alliance). Moreover, FOBT results are far too generalized, meaning that a positive result could mean much more than just colorectal cancer, and could just as easily mean hemorrhoids, anal fissure, proctitis, Crohn's disease, diverticulosis, ulcerative colitis, rectal ulcer, rectal prolapse, ischemic colitis, angiodysplasia, rectal trauma, proctitis from radiation therapy, and others. Stool DNA tests, the modern benchmark for CRC screening, have a much higher sensitivity and specificity, but also cost \$600, take two weeks to process, and are not for high-risk individuals or people with a history of polyps. To yield a cheap and effective CRC screening alternative, a unique ensemble-based classification alg