The objective of this study was to develop a nomogram and assess the predictive value of imaging features of primary breast lesions derived from ultrasound, mammography, and contrast-enhanced magnetic resonance imaging (MRI), in combination with clinicopathological factors and serological tumor markers, for predicting pathological complete response (pCR) after neoadjuvant chemotherapy (NAC) in patients with breast cancer (BC). We retrospectively analyzed 294 breast cancer patients who received NAC and subsequently underwent surgery at the Harbin Medical University Cancer Hospital from 2017 to 2023. Patients were randomly assigned to a training cohort (n = 206) or a validation cohort (n = 88) in a 7:3 ratio. Data collected included preoperative imaging features of the primary breast lesion from conventional ultrasound, mammography, and contrast-enhanced MRI, as well as clinicopathological factors and serological tumor markers. After comparing the baseline characteristics between the two cohorts, univariate analysis was performed on the training cohort. Variables with significant results in the univariate analysis were incorporated into a multivariate logistic regression model. Backward stepwise selection was employed to identify independent risk factors for non-pathological complete response (non-pCR). A nomogram was constructed based on the final multivariate model. The model's discriminatory power was evaluated using the receiver operating characteristic (ROC) curve, and its calibration was assessed with a calibration plot and the Hosmer-Lemeshow goodness-of-fit test. Of 294 enrolled patients, 87 (29.6%) achieved pCR. Univariate analysis in the training cohort identified multiple factors potentially associated with non-pCR. These factors included clinicopathological markers such as ER, PR, HER2, and Ki-67 status; ultrasound features including tumor location, distance from the nipple, hyperechoic halo, posterior echo, and calcification; mammographic characteristics encompassing mass margin, microcalcification, distribution and morphology of microcalcification, asymmetry, density of asymmetry, and other signs; and contrast-enhanced MRI parameters such as background parenchymal enhancement (BPE) and mass margin. Multivariate logistic regression analysis subsequently demonstrated that ER, HER2, Ki-67, tumor location, distance from the nipple, morphology of microcalcification, and mass margin on contrast-enhanced MRI independently predicted non-pCR (p < 0.05). A nomogram incorporating these independent predictors showed excellent discrimination, with an AUC of 0.833 (95% CI 0.772-0.893) in the training cohort and 0.749 (95% CI 0.640-0.857) in the validation cohort. This robust predictive model represents a significant step toward individualized treatment strategies by accurately forecasting the likelihood of pCR following neoadjuvant chemotherapy in breast cancer patients.
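As a rough illustration of the modeling step described above, the sketch below runs backward-stepwise logistic regression and evaluates discrimination with ROC AUC on a held-out split; the synthetic data, column names, and significance threshold are placeholders, not the study's variables or cohort.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from sklearn.metrics import roc_auc_score

def backward_stepwise_logit(df, outcome, features, alpha=0.05):
    """Backward elimination: drop the least significant predictor until all p < alpha."""
    kept = list(features)
    while kept:
        fit = sm.Logit(df[outcome], sm.add_constant(df[kept])).fit(disp=0)
        pvals = fit.pvalues.drop("const")
        worst = pvals.idxmax()
        if pvals[worst] < alpha:
            return fit, kept
        kept.remove(worst)
    raise ValueError("No predictor survived backward elimination")

# Synthetic stand-in data; column names are placeholders for the predictors above.
rng = np.random.default_rng(0)
cols = ["ER", "HER2", "Ki67", "tumor_location", "nipple_distance",
        "microcalc_morphology", "mri_mass_margin"]
df = pd.DataFrame(rng.normal(size=(294, len(cols))), columns=cols)
df["non_pCR"] = (df["ER"] + df["HER2"] + rng.normal(size=294) > 0).astype(int)
train, valid = df.iloc[:206], df.iloc[206:]

model, selected = backward_stepwise_logit(train, "non_pCR", cols)
probs = model.predict(sm.add_constant(valid[selected]))
print(selected, round(roc_auc_score(valid["non_pCR"], probs), 3))
```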
This scoping review aimed to answer the question: to what extent do artificial intelligence applications in dental and orthopedic skeletal imaging demonstrate true cross-disciplinary methodological convergence versus parallel development with shared translational barriers? This scoping review synthesizes AI applications across both fields to characterize methodological overlap, developmental asymmetries, and translational gaps, rather than assuming convergence. Following the PRISMA-ScR reporting standards, we searched PubMed, Scopus, Web of Science, IEEE Xplore, and EMBASE for peer-reviewed, English-language human studies published between January 2015 and May 2025. Eligible studies applied AI, machine learning, or deep learning to diagnostic, segmentation, or preoperative planning tasks in dental or orthopedic imaging. Three reviewers independently extracted data on imaging modality, task, model architecture, dataset characteristics, validation strategy, performance metrics, and translational considerations, with random auditing for consistency. Fifty-nine studies met inclusion criteria, comprising 48 dental (81.36%) and 11 orthopedic (18.64%) investigations, with no study spanning both domains. Most applications focused on foundational tasks such as segmentation and detection/classification using two-dimensional radiographs and cone-beam computed tomography. Computed tomography primarily supported bony anatomy and preoperative planning, while magnetic resonance imaging, the EOS system, and intraoral scanners were used in specialized workflows. Convolutional neural networks, particularly U-Net/nnU-Net variants and EfficientNet/ResNet backbones with YOLO-based detectors, dominated, alongside emerging transformer-based and hybrid physics-informed approaches. Internal validation performance was frequently high for segmentation (typical Dice 0.90-0.99), while more complex or anatomically challenging targets showed lower and more variable performance. External validation, prospective evaluation, and standardized reporting of calibration, expert comparators, and demographic performance were uncommon. The current AI skeletal imaging literature demonstrates strong technical feasibility but uneven clinical maturity, with dental imaging dominating in volume and automation of foundational tasks and orthopedic applications remaining fewer, more heterogeneous, and less mature. Rather than evidencing true cross-disciplinary convergence, the findings highlight asymmetrical development and shared translational barriers, particularly in validation rigor and real-world integration. By explicitly identifying these asymmetries, this review provides a realistic foundation for future cross-disciplinary collaboration focused on harmonized validation standards, clinically meaningful benchmarks, and equitable, workflow-native deployment.
Radiology reports for pancreatic cystic lesions frequently contain uncertainty expressions (rule-out (R/O), differential diagnosis (DDx)) alongside structured diagnostic codes. While cross-sectional studies have documented the prevalence of such hedging language, the temporal relationship between structured codes and narrative uncertainty during extended surveillance has not been systematically examined. The purpose of this study was to characterize the temporal evolution of diagnostic language in pancreatic imaging reports by integrating structured health screening codes with natural language processing (NLP) and to quantify code-narrative discrepancy patterns over long-term follow-up. This retrospective study analyzed 1791 pancreatic imaging reports from 399 patients (mean 4.49 examinations per patient) obtained between March 2020 and March 2024 at a Korean health examination center. Mean follow-up was 2.72 years (median 1103 days, IQR 700-1446). We developed a regular expression-based NLP algorithm to extract uncertainty expressions from narrative reports. Structured diagnostic codes were standardized into five categories: Cyst, IPMN, Other, Tumor, and R/O Malignancy. Patient-level trajectories were analyzed for code transitions and narrative uncertainty patterns across temporal phases. While 59.4% (237/399) of patients maintained stable diagnostic codes throughout follow-up, 32.7% of patients with uncertainty expressions exhibited persistent narrative uncertainty despite extended surveillance. Uncertainty expression rates showed no significant temporal decline from early (52.1%) to late (62.1%) phases (p = 0.425). Among patients with stable Cyst codes, 33.9% (95/280) continued to receive reports with IPMN-related hedging language. Diagnostic codes changed in 40.6% of patients, with bidirectional transitions observed between consecutive examinations (130 Cyst → IPMN, 121 IPMN → Cyst). First-to-last analysis revealed that 65% (26/40) of patients initially categorized as IPMN were reclassified as Cyst by their final examination. Structured diagnostic codes and narrative uncertainty expressions follow divergent trajectories in pancreatic imaging surveillance. The persistence of hedging language despite code stability suggests that uncertainty reflects reporting practices and risk communication strategies rather than evolving diagnostic confidence, highlighting the need for improved alignment between structured and narrative diagnostic communication.
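A minimal sketch of the kind of regular-expression extraction described above; the pattern list and example sentence are illustrative, not the study's actual lexicon.

```python
import re

# Illustrative hedging patterns; the study's full lexicon may differ.
UNCERTAINTY_PATTERNS = [
    r"\br[\/\.]?o\b",                 # "R/O", "r/o" (rule-out)
    r"\brule[- ]?out\b",
    r"\bddx\b",                       # differential diagnosis
    r"\bdifferential diagnosis\b",
    r"\bcannot (be )?exclude[d]?\b",
    r"\bsuggestive of\b",
    r"\bpossib(le|ly)\b",
]
UNCERTAINTY_RE = re.compile("|".join(UNCERTAINTY_PATTERNS), flags=re.IGNORECASE)

def extract_uncertainty(report_text: str) -> list[str]:
    """Return all hedging expressions found in a narrative report."""
    return [m.group(0) for m in UNCERTAINTY_RE.finditer(report_text)]

example = "1.2 cm cystic lesion in pancreatic head; R/O branch-duct IPMN, DDx simple cyst."
print(extract_uncertainty(example))   # ['R/O', 'DDx']
```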
Lumbar intervertebral disc degeneration (LIDD) is a leading cause of low back pain, with subtle and variable imaging features that challenge early diagnosis. This study aimed to develop and validate a rigorous MRI-based radiomics ensemble model for disc-level LIDD discrimination, using the patient as the primary sampling unit, with explicit statistical correction for the non-independence of multiple lumbar discs from the same patient. This retrospective single-center study enrolled 122 subjects (102 LIDD patients and 20 healthy controls), contributing a total of 610 lumbar discs. Regions of interest (ROIs) of intervertebral discs were manually segmented on fat-suppressed T2-weighted imaging (FS-T2WI) sequences, and 1409 Image Biomarker Standardization Initiative (IBSI)-compliant radiomic features were extracted. To account for within-patient clustering of discs, multi-step feature selection was performed in the patient-level split training set, including Generalized Estimating Equations (GEE), Benjamini-Hochberg FDR correction, Spearman correlation-based redundancy removal, and L1-regularized logistic regression. Three base classifiers (logistic regression (LR), random forest (RF), radial basis function SVM) and a soft-voting ensemble model were trained with patient-level fivefold group cross-validation to avoid data leakage. Model performance for disc-level LIDD diagnosis was evaluated via AUC, accuracy, sensitivity, and specificity in an independent patient-level test set, with SHapley Additive exPlanations (SHAP) for model interpretability. A compact, reproducible radiomic signature was derived from the final selected features. All models achieved excellent diagnostic performance in the independent test set: RF (AUC = 0.966, 95% CI: 0.937-0.988), SVM (AUC = 0.974, 95% CI: 0.949-0.992), and LR (AUC = 0.974, 95% CI: 0.949-0.992). The soft-voting ensemble model achieved the best discrimination with an AUC of 0.976 (95% CI: 0.954-0.992), along with balanced sensitivity (88%) and specificity (96%). SHAP analysis identified key intensity- and texture-based radiomic features driving model predictions. The MRI-based radiomics ensemble model, built with rigorous statistical correction for within-patient clustering of discs and patient-level validation, enables accurate and interpretable disc-level LIDD discrimination. This model shows strong promise for assisting the early detection and objective diagnosis of LIDD in clinical practice.
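For context, the snippet below shows how a soft-voting ensemble of the three named classifiers can be evaluated with patient-level grouped cross-validation in scikit-learn so that discs from one patient never straddle folds; the synthetic data and hyperparameters are placeholders, not the study's configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 100 patients x 5 discs, 30 selected radiomic features.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))
y = rng.integers(0, 2, size=500)          # disc-level LIDD label
groups = np.repeat(np.arange(100), 5)     # patient ID per disc

ensemble = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("svm", make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))),
    ],
    voting="soft",
)

# Patient-level fivefold group CV keeps all discs of a patient in the same fold.
cv = GroupKFold(n_splits=5)
aucs = cross_val_score(ensemble, X, y, groups=groups, cv=cv, scoring="roc_auc")
print(aucs.mean())
```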
Accurate differentiation between benign and malignant thyroid nodules remains challenging in clinical practice. Current deep learning approaches predominantly rely on single-modality analysis, failing to leverage complementary information from multiple clinical data sources. This study aims to develop and validate ThyroFusion, a multi-modal deep learning framework integrating ultrasound images, segmentation masks, and clinical text reports for improved thyroid nodule malignancy risk assessment. In this retrospective multi-center study, we developed ThyroFusion, a multi-modal fusion framework comprising: (1) a dual-stream ResNet-50 encoder with partially shared parameters for extracting features from ultrasound images and segmentation masks; (2) a Set Transformer module for aggregating variable numbers of image features; and (3) a bidirectional cross-modal attention mechanism for fusing visual and textual features extracted by frozen BioBERT. The framework was trained on 1472 cases from Xi'an International Medical Center Hospital and validated on four independent external test sets totaling 4530 cases from two clinical centers and two public datasets (DDTI and TN3K). Performance was compared against state-of-the-art deep learning models and radiologists with varying experience levels. ThyroFusion achieved an AUC of 0.937 (95% CI 0.914-0.960) on internal validation and 0.896 (95% CI 0.887-0.905) on combined external validation. Compared to single-modal approaches, ThyroFusion significantly outperformed ResNet-50 (AUC: 0.841), DenseNet-121 (AUC 0.848), EfficientNet-B4 (AUC 0.859), and Vision Transformer (AUC 0.835) on external validation (all p < 0.001). The model also outperformed senior radiologists (AUC 0.809) and demonstrated substantial improvement in junior radiologists' performance when used as an assistive tool (ΔAUC = 0.126). On public datasets, ThyroFusion achieved AUCs of 0.893 on DDTI and 0.881 on TN3K, demonstrating robust cross-domain generalization. ThyroFusion demonstrates robust performance in thyroid nodule malignancy risk assessment across multiple centers and public benchmarks, significantly outperforming state-of-the-art single-modal methods and experienced radiologists. The integration of visual and textual information through bidirectional cross-modal attention offers a promising tool for clinical decision support.
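The bidirectional cross-modal attention idea can be sketched as two multi-head attention passes with residual connections, as below; the dimensions and layer choices are assumptions rather than the ThyroFusion configuration.

```python
import torch
import torch.nn as nn

class BidirectionalCrossAttention(nn.Module):
    """Illustrative fusion block: image tokens attend to text tokens and vice versa.
    Sizes are assumptions, not the ThyroFusion settings."""
    def __init__(self, dim: int = 768, heads: int = 8):
        super().__init__()
        self.img_to_txt = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.txt_to_img = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_img = nn.LayerNorm(dim)
        self.norm_txt = nn.LayerNorm(dim)

    def forward(self, img_tokens, txt_tokens):
        # Image features query the (frozen) text-encoder features, and vice versa.
        img_ctx, _ = self.img_to_txt(img_tokens, txt_tokens, txt_tokens)
        txt_ctx, _ = self.txt_to_img(txt_tokens, img_tokens, img_tokens)
        img_fused = self.norm_img(img_tokens + img_ctx)
        txt_fused = self.norm_txt(txt_tokens + txt_ctx)
        return img_fused, txt_fused

fusion = BidirectionalCrossAttention()
img = torch.randn(2, 4, 768)   # e.g. aggregated image features
txt = torch.randn(2, 32, 768)  # e.g. BioBERT token embeddings
print([t.shape for t in fusion(img, txt)])
```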
Traditional radiology education is constrained by a restricted apprenticeship model and a scarcity of datasets structured for building artificial intelligence (AI)-based radiology education systems. To address this problem, we developed a novel end-to-end framework for transforming vast clinical archives into scalable radiology education resources. The proposed framework converts static radiographic data into an interactive learning system through three integrated components. First, a multi-stage curation pipeline establishes a foundation of trustworthy cases suitable for radiology education from noisy public archives. Second, a large language model pipeline automatically generates a rich library of questions engineered to build core radiology reasoning skills. Finally, this content is deployed on an interactive, gamified platform that uses an adaptive algorithm to deliver a personalized and engaging learning experience. The curation pipeline distilled an initial pool of 493,785 images into a final dataset of 881 high-fidelity chest radiographs, from which the automated content generation pipeline produced 2305 multiple-choice questions. The system was implemented as the League of Radiologists, a publicly accessible platform (https://radontology.org), demonstrating the feasibility of the proposed end-to-end architecture. A field demonstration resulted in 40 registered users and 68 unique examination sessions without technical failure, with 37.5% of active participants returning for multiple sessions. While currently focused on single-finding chest radiographs, this study provides a practical and reproducible blueprint for implementing an AI-enabled adaptive radiology education platform using heterogeneous clinical imaging data. The described framework offers an extensible foundation for future development and evaluation of AI-driven educational systems in medical imaging.
Generative adversarial networks (GANs) are increasingly used to generate synthetic medical images, addressing the critical shortage of annotated data for training artificial intelligence (AI) systems. This study introduces conditional random field (CRF)-GAN, a novel memory-efficient GAN architecture that enhances structural consistency in 3D medical image synthesis. Integrating conditional random fields within a two-step generation process allows CRF-GAN to improve spatial coherence while maintaining high-resolution image quality. The model is designed to be computationally efficient, avoiding the need for additional GANs or post-processing. We evaluated the performance of CRF-GAN against the state-of-the-art hierarchical (HA)-GAN model. The comparison between the two models was made through a quantitative evaluation, using Fréchet Inception distance (FID) and maximum mean discrepancy (MMD) metrics, and a qualitative evaluation, through a two-alternative forced choice (2AFC) test completed by a pool of 12 resident radiologists, in order to assess the realism of the generated images. CRF-GAN outperformed HA-GAN with lower FID (0.047 vs. 0.061) and MMD (0.084 vs. 0.086) scores, indicating better image fidelity. The 2AFC test showed a significant preference for images generated by CRF-GAN over those generated by HA-GAN (p = 1.93e-05). Additionally, CRF-GAN demonstrated 9.34% lower memory usage at 256³ resolution and achieved up to 14.6% faster training speeds, offering substantial computational savings. The CRF-GAN model successfully generates high-resolution 3D medical images with non-inferior quality to conventional models, while being more memory-efficient and faster. The key objective was not only to lower the computational cost but also to reallocate the freed-up resources toward the generation of higher-resolution 3D images, which remains a critical factor limiting the direct clinical applicability of such models. Moreover, unlike many previous studies, we combined qualitative and quantitative assessments to obtain more holistic feedback on the model's performance.
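For readers unfamiliar with the MMD metric used here, a simplified Gaussian-kernel estimator over feature vectors of real and generated volumes might look like the following; this is a generic illustration, not the evaluation code of the study.

```python
import numpy as np

def gaussian_mmd(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Biased estimator of squared MMD with a Gaussian kernel.
    x, y: arrays of shape (n_samples, n_features), e.g. encoder features of
    real vs. generated volumes."""
    def kernel(a, b):
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    kxx, kyy, kxy = kernel(x, x), kernel(y, y), kernel(x, y)
    return kxx.mean() + kyy.mean() - 2 * kxy.mean()

real = np.random.randn(64, 128)
fake = np.random.randn(64, 128) + 0.1
print(round(gaussian_mmd(real, fake), 4))
```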
Deep learning has emerged as a promising approach for skin lesion analysis. However, existing methods mostly rely on fully supervised learning, requiring extensive labeled data, which is challenging and costly to obtain. To alleviate this annotation burden, this study introduces a novel semi-supervised deep learning approach that integrates ensemble learning with online knowledge distillation for enhanced skin lesion classification. Our methodology involves training an ensemble of convolutional neural network models, using online knowledge distillation to transfer insights from the ensemble to its members. This process aims to enhance the performance of each model within the ensemble, thereby elevating the overall performance of the ensemble itself. Post-training, any individual model within the ensemble can be deployed at test time, as each member is trained to deliver comparable performance to the ensemble. This is particularly beneficial in resource-constrained environments. Experimental results demonstrate that the knowledge-distilled individual model performs better than independently trained models. Our approach outperforms the current state of the art on the ISIC (International Skin Imaging Collaboration) 2018, ISIC 2019, and ISIC 2020 benchmark datasets. For example, with only 10% labeled data on ISIC 2018, we observe a gain of around 3% in macro F1 over ReFixMatch-LS, a recently proposed state-of-the-art framework for skin lesion classification. Furthermore, the proposed method significantly reduces label requirements: while a fully supervised baseline reaches an F1 of 62.40 ± 0.71 with 20% labeled data, our approach attains comparable performance using just 10%. These results highlight its superior label efficiency and practical relevance for real-world skin lesion classification. The code is available online.
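A schematic version of an online-distillation objective of the kind described, in PyTorch: each member is pulled toward the ensemble's mean softened prediction, with a supervised term added on labeled batches. The temperature and weighting are placeholders, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def online_distillation_loss(student_logits, peer_logits_list, labels=None, T=3.0):
    """Distill the ensemble's mean softened prediction into one member."""
    with torch.no_grad():
        ensemble_prob = torch.stack(
            [F.softmax(p / T, dim=1) for p in peer_logits_list]
        ).mean(dim=0)
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  ensemble_prob, reduction="batchmean") * (T * T)
    if labels is not None:                      # labeled batch: add supervised CE
        kd = kd + F.cross_entropy(student_logits, labels)
    return kd

logits_a, logits_b, logits_c = (torch.randn(8, 7) for _ in range(3))
print(online_distillation_loss(logits_a, [logits_a, logits_b, logits_c]).item())
```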
Accurate histopathological classification of renal cell carcinoma (RCC), along with its distinction from benign mimickers, is essential for precision oncology and optimal patient management. The morphological complexity and subtle differences among RCC tumors pose significant diagnostic challenges, often resulting in notable inter-observer variability. This paper presents a novel texture-informed hybrid deep learning framework that addresses these challenges by integrating a Rotation-Invariant Multi-Threshold Local Binary Pattern (RIMT-LBP) descriptor with a cascaded CNN-Transformer architecture for robust multiclass classification of renal cell neoplasms. The proposed RIMT-LBP descriptor is designed to capture multiscale tissue heterogeneity while maintaining robustness to orientation variability inherent in histopathological slides. Integrating this descriptor with the original images enriches the representation of RCC tumor features. Classification is performed using a proposed hybrid model that combines MobileNetV3Large for efficient local feature extraction and Transformer encoders for global contextual modeling. This hybrid architecture enables complementary analysis of both fine-grained cellular morphology and broader tissue architecture. The proposed model demonstrated strong whole slide image-level performance, achieving average weighted precision of 95.84%, recall of 95.36%, F1-score of 95.60%, and accuracy of 98.10%. Further analysis showed that the combined RGB+LBP approach (F1-score: 89.18%) outperformed both the RGB-only (F1-score: 84.44%) and LBP-only (F1-score: 80.31%) configurations at the patch level, confirming the complementary value of texture-informed features. Evaluation on two independent public datasets (TCGA-RCC and DHMC) confirmed the framework's consistent performance, with accuracies of 93.13% and 96.12%, respectively. These results highlight the clinical potential of integrating AI models for complementary analysis to improve RCC diagnostic accuracy.
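As a rough approximation of the texture-descriptor idea (not the authors' exact RIMT-LBP, which additionally uses multiple thresholds), rotation-invariant uniform LBP histograms can be pooled over several radii with scikit-image:

```python
import numpy as np
from skimage.feature import local_binary_pattern

def multi_scale_rotation_invariant_lbp(gray_patch, radii=(1, 2, 3), points=8):
    """Concatenate rotation-invariant uniform LBP histograms over several radii."""
    feats = []
    for r in radii:
        codes = local_binary_pattern(gray_patch, P=points * r, R=r, method="uniform")
        hist, _ = np.histogram(codes, bins=points * r + 2,
                               range=(0, points * r + 2), density=True)
        feats.append(hist)
    return np.concatenate(feats)

# Stand-in for a grayscale histology patch.
patch = (np.random.rand(224, 224) * 255).astype(np.uint8)
print(multi_scale_rotation_invariant_lbp(patch).shape)
```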
The aim of this study is to develop and evaluate the performance of a two-stage deep learning-based artificial intelligence framework for the automatic segmentation and etiological classification of pleural effusion from noncontrast thoracic computed tomography (CT) images. In this retrospective study, patients with pleural effusion detected on noncontrast thorax CT and with available pathogenic and/or cytological examinations after diagnostic thoracentesis were included. In the first stage, pleural effusion regions were automatically segmented using a U-Net-based deep learning model. In the second stage, pleural effusions were classified into three groups (empyema, malignant, and transudative) using quantitative imaging features derived from the segmentation masks, including area-, density-, and texture-based features. Logistic regression, support vector machines, random forest, and gradient boosting algorithms were used for classification. The U-Net-based segmentation model demonstrated high agreement in delineating pleural effusion regions and achieved successful segmentation performance on the validation dataset. In the etiological classification performed using quantitative features extracted after segmentation, the highest performance was obtained with tree-based models. The gradient boosting and random forest algorithms achieved 96% accuracy and a macro F1-score of 0.95 in three-class etiological discrimination. Feature importance analysis showed that pleural effusion area, the standard deviation of intensity within the mask, and GLCM-based parameters reflecting texture heterogeneity were the most discriminative features for classification. The two-stage artificial intelligence approach developed in this study achieved high accuracy in the automatic segmentation and etiological classification of pleural effusion on noncontrast thorax CT images. The proposed system has the potential to serve as a strong decision support tool in clinical practice by enabling rapid, objective, and reproducible evaluation of pleural effusions.
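The second-stage feature extraction and classification step might be sketched as follows, with area, density statistics, and GLCM texture measures computed inside the predicted mask and fed to a gradient-boosting classifier; the feature set and data here are illustrative, not the study's exact pipeline.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.ensemble import GradientBoostingClassifier

def effusion_features(ct_slice, mask):
    """Area-, density-, and texture-based features from a predicted effusion mask."""
    pixels = ct_slice[mask > 0]
    # Quantize the masked HU values to 8-bit levels for the GLCM.
    q = np.zeros_like(ct_slice, dtype=np.uint8)
    q[mask > 0] = np.interp(pixels, (pixels.min(), pixels.max()), (1, 255)).astype(np.uint8)
    glcm = graycomatrix(q, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    return np.array([
        mask.sum(),                                   # effusion area (pixels)
        pixels.mean(), pixels.std(),                  # density statistics
        graycoprops(glcm, "contrast")[0, 0],          # texture heterogeneity
        graycoprops(glcm, "homogeneity")[0, 0],
    ])

# Example: classify empyema / malignant / transudative from such features.
X = np.random.rand(90, 5)
y = np.random.randint(0, 3, size=90)
clf = GradientBoostingClassifier().fit(X, y)
print(clf.predict(X[:3]))
```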
Cesarean scar pregnancy (CSP) is a severe form of ectopic pregnancy, where early screening and monitoring are critical to reducing the risk of prolonged uterine bleeding and other serious complications. Accurate segmentation of pregnancy tissue plays a vital role in clinical assessment and treatment planning. However, the segmentation of pregnancy tissue is particularly challenging due to the diverse morphology and small size of target regions, which causes limited accuracy in existing studies. In addition, large-scale annotated datasets are lacking, and manual annotation is costly and time-consuming. To address these issues, we propose a prototype-oriented local contrastive learning framework for semi-supervised pregnancy tissue segmentation, which addresses the informatics challenges of limited labeled data and fine-grained feature extraction in medical image segmentation. Specifically, representative prototypes are first extracted to characterize the distribution of features in different images. Then, a prototype-guided local contrastive strategy is introduced to incorporate supervised signals into the contrastive learning process. This guides unlabeled data to align with supervised prototype centers, thereby improving segmentation accuracy. Experiments conducted on a self-constructed pregnancy tissue dataset demonstrated that the proposed method achieved a Dice coefficient of 86.91% at a 50% labeling rate. To further evaluate the generalizability of the method, we also validated it on a public cardiac dataset, achieving a Dice coefficient of 87.34%. These results not only advance semi-supervised learning in medical imaging informatics but also provide a reliable tool for accurate CSP tissue segmentation, supporting clinical decision-making in early ectopic pregnancy management.
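A schematic form of a prototype-guided contrastive objective, in which local embeddings are attracted to the prototype of their (pseudo-)class and repelled from the others; this is an illustration of the idea only, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def prototype_contrastive_loss(features, labels, prototypes, tau=0.1):
    """features: (N, D) local embeddings, labels: (N,) class ids, prototypes: (C, D)."""
    features = F.normalize(features, dim=1)
    prototypes = F.normalize(prototypes, dim=1)
    logits = features @ prototypes.t() / tau          # (N, C) cosine similarities
    return F.cross_entropy(logits, labels)

feats = torch.randn(256, 64)            # embeddings of sampled local regions
labels = torch.randint(0, 2, (256,))    # tissue vs. background (pseudo-)labels
protos = torch.randn(2, 64)             # supervised prototype centers
print(prototype_contrastive_loss(feats, labels, protos).item())
```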
Diabetic foot ulcers, resulting from neuropathic and/or vascular complications in patients with diabetes mellitus, pose a major global health challenge. Early detection and consistent monitoring of wound progression are essential for timely intervention, effective treatment, and the prevention of severe complications such as amputation. In modern diabetic foot care, images captured using digital cameras and mobile phones are increasingly employed for remote wound assessment. In this context, automated segmentation of these wounds from such images plays a vital role by enabling objective and quantitative evaluation of wound areas, which is crucial for tracking the progression of healing over time. Recent years have witnessed growing interest in deep learning-based wound segmentation techniques, with a particular focus on models that are both computationally efficient and suitable for deployment on resource-constrained devices, including smartphones and point-of-care platforms. In this study, we propose a lightweight convolutional neural network (CNN) for diabetic foot wound segmentation that augments the U-Net architecture with ghost feature generation and Convolutional Block Attention Modules (CBAM) to improve computational efficiency and feature representation. The model was evaluated on a privately annotated dataset of 3450 diabetic foot wound images and compared against state-of-the-art architectures, including SegNet, U-Net, MobileNetV2, Mask R-CNN, and the domain-specific approach of Wang et al. We further investigated a fully automated two-step pipeline for wound segmentation incorporating a prior foot segmentation-based ROI detection. Using ROI detection, the proposed CNN achieved a precision of 85.13%, recall of 91.84%, Dice coefficient of 86.95%, and IoU of 77.23%. These results demonstrate competitive performance relative to high-capacity models while maintaining substantially reduced computational complexity, highlighting its suitability for real-time clinical deployment in low-resource environments.
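For reference, a standard CBAM block of the kind inserted into U-Net encoder/decoder stages can be written compactly in PyTorch; the channel counts and kernel sizes below are typical defaults, not the study's exact settings.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention then spatial attention."""
    def __init__(self, channels: int, reduction: int = 16, spatial_kernel: int = 7):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )
        self.spatial = nn.Conv2d(2, 1, spatial_kernel,
                                 padding=spatial_kernel // 2, bias=False)

    def forward(self, x):
        # Channel attention from global average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention from channel-wise average and max maps.
        s = torch.cat([x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

feat = torch.randn(1, 64, 128, 128)     # a U-Net feature map
print(CBAM(64)(feat).shape)
```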
Non-cancer mortality (NCM) accounts for a substantial proportion of deaths in cancer survivors, with cardiovascular disease (CVD) being the leading cause. Abdominal aortic calcification (AAC) is a strong predictor of cardiovascular and all-cause mortality but remains underused due to the burden of manual scoring. Automated pipelines could enable opportunistic CVD risk screening from routine oncology CTs. We benchmarked three widely available AAC tools against manual reference standards in a multi-institutional prostate cancer cohort. We retrospectively analysed staging CTs from 99 men in the control arm of the STAMPEDE trial. Manual AAC was quantified using adaptive thresholding and Agatston scoring. Automated AAC scoring was performed using OSCAR (institutional agreement), Comp2Comp (open-source), and DAFS (commercial licence). Agreement with the manual reference was assessed using Pearson correlation (r), intraclass correlation coefficients (ICC), Bland-Altman analyses, and categorical risk concordance (Cohen's κ). Manual scoring was highly reproducible (ICC = 0.99) but required > 12 min per scan. Automated pipelines reduced processing to < 5 min. OSCAR achieved the strongest agreement with manual AAC (r = 0.92, κ = 0.93), followed by DAFS (r = 0.88, κ = 0.91) and Comp2Comp (r = 0.75, κ = 0.74). Volumetric measures were reproducible across all tools (r ≥ 0.89). Failure occurred in < 10% of scans, mainly at slice thickness < 1 mm. OSCAR and DAFS were stable across patient and scan factors, whereas Comp2Comp was more sensitive to acquisition parameters. Automated AAC quantification is accurate, reproducible, and significantly faster than manual scoring. These findings support its role in screening for cancer- and cancer treatment-related cardiovascular risk in oncology using routine CT scans.
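The agreement analysis can be reproduced in outline with standard Python tooling, as sketched below; the risk-category cut-points are placeholders, and ICC (e.g. via a mixed-model package) is omitted for brevity.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import cohen_kappa_score

def agreement_summary(manual, automated, risk_bins=(0, 100, 400, np.inf)):
    """Pearson r, Bland-Altman limits, and categorical kappa between manual and
    automated AAC scores. Cut-points here are illustrative, not the study's."""
    manual, automated = np.asarray(manual, float), np.asarray(automated, float)
    r, _ = pearsonr(manual, automated)
    diff = automated - manual
    bias, half = diff.mean(), 1.96 * diff.std(ddof=1)        # Bland-Altman
    cats_m = np.digitize(manual, risk_bins[1:-1])
    cats_a = np.digitize(automated, risk_bins[1:-1])
    kappa = cohen_kappa_score(cats_m, cats_a, weights="quadratic")
    return {"pearson_r": r, "bias": bias,
            "limits_of_agreement": (bias - half, bias + half), "kappa": kappa}

rng = np.random.default_rng(1)
manual = rng.gamma(2.0, 200.0, size=99)
auto = manual * rng.normal(1.0, 0.1, size=99)
print(agreement_summary(manual, auto))
```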
The adoption of artificial intelligence in clinical practice is often limited by the technical complexity of model development, particularly for medical professionals without programming expertise. This study aims to evaluate and compare existing no-code and low-code AI platforms for medical image classification, while also demonstrating to clinicians how AI tools can be practically implemented without technical expertise. In this work, we conducted a systematic evaluation of such platforms applied to the classification of skin diseases from dermoscopic images. A repository of 34 no-code and low-code platforms available on the internet was gathered. By applying specific inclusion criteria (supporting image classification, being user-friendly, being useful in healthcare, and serving as a deployment solution), the AI research team narrowed the list from 22 to 17 and finally to five platforms for further exploration. A standardized dataset of around 8000 labeled dermoscopic images across eight disease categories was used to compare the selected platforms. Teachable Machine had the lowest accuracy (85.2%) but a short training time, Edge Impulse achieved the highest accuracy (89.9%) with the shortest training time, and Roboflow reached 86.8% accuracy with the longest training time. The key contributions of the study are a systematic survey of available no-code and low-code AI services for classifying skin disease images; an analysis of performance, training efficiency, and usability trade-offs; and practical recommendations on how clinicians and healthcare professionals can use AI tools in clinical and healthcare environments without having to possess advanced technical skills.
Vision-language models can connect the text description of an object to its specific location in an image through visual grounding. This has potential applications in enhanced radiology reporting. However, these models require large annotated image-text datasets, which are lacking for PET/CT. We developed an automated pipeline to generate weak image-text labels and used it to train a 3D visual grounding model. Our weak-labeling pipeline identified sentences describing positive findings in PET/CT reports by searching for mentions of standardized uptake values (SUVmax) and axial slice numbers. These were used to automatically generate lesion masks, which were paired with the corresponding text descriptions. From 25,578 PET/CT exams, we extracted 11,356 sentence-label pairs. Using this data, we trained ConTEXTual Net 3D, which takes as input a description of a lesion and generates a corresponding segmentation mask. The model's performance was evaluated on 251 radiologist-reviewed cases and compared against LLMSeg, a 2.5D version of ConTEXTual Net, and two radiologists. We evaluated detection performance using F1 score. The weak-labeling pipeline accurately identified lesion locations in 98% of cases (246/251). ConTEXTual Net 3D achieved an F1 score of 0.80, outperforming LLMSeg (F1 = 0.22) and the 2.5D model (F1 = 0.53), though it underperformed both radiologists (F1 = 0.94 and 0.91). The model achieved better performance on 18F-fluorodeoxyglucose (F1 = 0.78) and DCFPyL (F1 = 0.75) exams than on DOTATATE (F1 = 0.58) and 18F-fluciclovine (F1 = 0.66) exams. In conclusion, our novel weak labeling pipeline accurately produced an annotated dataset of PET/CT image-text pairs. ConTEXTual Net 3D significantly outperformed other models but fell short of the performance of nuclear medicine physicians. Our study suggests that even larger datasets may be needed to close this performance gap.
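The weak-labeling rule can be illustrated with simple regular expressions that pick up an SUVmax value and an axial slice number from a report sentence; the exact patterns used in the study may differ.

```python
import re

# Illustrative patterns; the study's actual rules may differ.
SUV_RE = re.compile(r"SUV\s*max[^0-9]{0,10}(\d+(?:\.\d+)?)", re.IGNORECASE)
SLICE_RE = re.compile(r"(?:axial\s+)?(?:slice|image)\s*#?\s*(\d+)", re.IGNORECASE)

def weak_label(sentence: str):
    """Return (SUVmax, axial slice) if the sentence describes a measurable
    positive finding, else None. Such pairs seed the lesion-mask generation."""
    suv, slc = SUV_RE.search(sentence), SLICE_RE.search(sentence)
    if suv and slc:
        return float(suv.group(1)), int(slc.group(1))
    return None

s = "Hypermetabolic right hilar node (SUVmax 7.4) best seen on axial slice 142."
print(weak_label(s))   # (7.4, 142)
```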
This study aimed to conduct a multidimensional evaluation of artificial intelligence (AI) chatbot-generated patient information regarding cone-beam computed tomography (CBCT) in dentistry, with specific focus on readability, informational quality, reliability, and patient-centered suitability. Twenty frequently asked, patient-oriented questions related to CBCT were systematically identified from a public online forum. Each question was submitted to four large language model-based chatbots (ChatGPT-4o, Gemini Advanced, Claude Sonnet 4, and Microsoft Copilot) under standardized conditions. Generated responses were evaluated using validated instruments, including the DISCERN tool and Global Quality Scale (GQS) for information quality and reliability, as well as Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index for readability. Patient-centeredness was further assessed using PEMAT-Understandability and PEMAT-Actionability scores. Comparative analyses were performed using linear mixed-effects models. Significant differences were observed among chatbots across all evaluated domains (p < 0.05). While advanced models demonstrated higher informational quality and reliability, their responses frequently exceeded recommended health literacy thresholds. Readability, transparency, and actionability varied substantially between platforms. No chatbot consistently met all criteria for optimal patient-directed communication. AI chatbots can provide generally accurate information on CBCT; however, variability in readability, reliability, and educational suitability limits their standalone use for patient education. Careful integration with professional oversight is essential to ensure safe and accessible AI-supported communication in dentomaxillofacial radiology. This study provides the first multidimensional, comparative evaluation of leading AI chatbots in delivering patient-oriented information about cone-beam computed tomography. It shows critical gaps between informational accuracy and health literacy suitability. This study reveals the need for professional oversight when using AI for patient education in dentomaxillofacial radiology.
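The readability indices referenced here follow standard formulas, which can be computed directly as sketched below; the syllable heuristic is crude, and dedicated readability libraries handle tokenization and syllable counting more carefully.

```python
import re

def _syllables(word: str) -> int:
    # Crude vowel-group heuristic for syllable counting.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def readability(text: str) -> dict:
    """Flesch Reading Ease, Flesch-Kincaid Grade Level, and Gunning Fog Index,
    computed from their standard formulas on simple tokenization."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syll = sum(_syllables(w) for w in words)
    complex_words = sum(1 for w in words if _syllables(w) >= 3)
    wps, spw = n_words / sentences, syll / n_words
    return {
        "flesch_reading_ease": 206.835 - 1.015 * wps - 84.6 * spw,
        "fk_grade_level": 0.39 * wps + 11.8 * spw - 15.59,
        "gunning_fog": 0.4 * (wps + 100 * complex_words / n_words),
    }

print(readability("CBCT uses a cone-shaped X-ray beam. It gives 3D images of your teeth and jaw."))
```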
Existing segmentation models trained on a single medical imaging dataset often lack robustness when encountering unseen organs or tumors. Developing a robust model capable of identifying rare or novel tumor categories not present during training is crucial for advancing medical imaging applications. We propose DSM, a novel framework that leverages diffusion and state space models to segment unseen tumor categories beyond the training data. DSM utilizes two sets of object queries trained within modified attention decoders to enhance classification accuracy. Initially, the model learns organ queries using an object-aware feature grouping strategy to capture organ-level visual features. It then refines tumor queries by focusing on diffusion-based visual prompts, enabling precise segmentation of previously unseen tumors. Furthermore, we incorporate diffusion-guided feature fusion to improve semantic segmentation performance. By integrating CLIP text embeddings, DSM captures category-sensitive classes to improve linguistic knowledge transfer, thereby enhancing the model's robustness across diverse scenarios and multi-label tasks. DSM consistently outperforms state-of-the-art out-of-distribution detection methods, achieving improvements of 0.1962 in mean AUROC, 0.2675 in mean FPR95, and 0.1736 in mean DSC. Extensive experiments demonstrate the superior performance of DSM in various tumor segmentation tasks.
The purpose of this study was to investigate the efficacy of a three-dimensional (3D) deep learning (DL) model in predicting recurrence risk of stage IA invasive lung adenocarcinoma (ILADC) after sub-lobar resection (SLR). A total of 287 stage IA ILADC patients were assigned to training and internal validation sets (4:1), with an external test cohort of 112 patients from two institutions. Three clinical models, five 3D DL models, and a combined clinical-radiological-DL model were developed. Model performance was compared to identify the best-performing one. Patients were stratified into high/low-risk groups using the optimal predictive probability threshold from the best model. Survival analysis was performed to compare prognosis between groups. Furthermore, the pathological-molecular characteristics of tumors were compared between high/low-risk groups. Among clinical models, SVM achieved the highest AUCs (training: 0.819, internal validation: 0.785, and external testing: 0.758). The 3D VGG-16 DL model outperformed others with AUCs of 0.921, 0.856, and 0.830, respectively. The combined model yielded AUCs of 0.932, 0.882, and 0.854, respectively. Both 3D VGG-16 and the combined model showed significantly higher sensitivity than the clinical model (all p < 0.05). High-risk patients classified by the 3D VGG-16 model had shorter recurrence-free survival/overall survival (all p < 0.05) and higher prevalence of micropapillary/solid-predominant growth pattern, STAS, and mutations or fusions in KRAS and ALK (all p < 0.05). 3D VGG-16 effectively predicts post-SLR recurrence risk for stage IA ILADC, serving as a potential tool to guide surgical treatment decisions.
Artificial intelligence-based computer-aided diagnosis (CADx) systems have seen growing adoption in mammography, yet the limited interpretability of their decision-making processes remains a barrier to clinical trust. The present study aimed to investigate whether deep learning classifiers rely primarily on the characteristics of lesions or of the surrounding breast tissue through counterfactual reasoning, specifically using semantic masking of mammogram texture. We modified mammograms by selectively removing texture information from lesion (foreground, FG) or non-lesion (background, BG) regions, replacing it with the mean image intensity, resulting in four scenarios involving benign and malignant foreground or background alterations. MobileNet, ResNet50, and ResNet50v2 were trained and evaluated on the CBIS-DDSM dataset; the area under the ROC curve (AUC) was used to assess classification performance. All models had similar performance (AUCs = 0.74, 0.72, and 0.78; pairwise p-value > 0.05) on the original, unaltered test set. Performance differed dramatically under the four masking scenarios: ResNet50 failed almost completely (AUC = 0.20, p-value < 0.0001) when malignant background information was removed, indicating strong dependence on background context and difficulty focusing on subtle lesion features, while ResNet50v2 showed improved robustness (albeit with severely impacted performance) under the same changes (AUC = 0.53, p-value < 0.0001), suggesting better preservation of lesion-level information. MobileNet was relatively stable across all masking scenarios, indicating robustness to region-specific changes. Understanding such region-specific dependencies can enhance model interpretability and support the development of more robust and reliable CADx systems for clinical use.
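The masking operation itself is straightforward to express; the sketch below replaces either the lesion (FG) or the surrounding tissue (BG) with the mean image intensity, mirroring the scenarios described above (the array shapes and toy mask are illustrative).

```python
import numpy as np

def mask_region(image: np.ndarray, lesion_mask: np.ndarray, region: str = "FG") -> np.ndarray:
    """Counterfactual masking: replace lesion (FG) or non-lesion (BG) pixels
    with the mean image intensity."""
    out = image.astype(float).copy()
    fill = image.mean()
    if region == "FG":
        out[lesion_mask > 0] = fill     # remove lesion texture, keep background
    else:
        out[lesion_mask == 0] = fill    # remove background texture, keep lesion
    return out

img = np.random.rand(256, 256)                     # stand-in for a mammogram patch
mask = np.zeros((256, 256)); mask[100:140, 110:150] = 1
fg_removed, bg_removed = mask_region(img, mask, "FG"), mask_region(img, mask, "BG")
print(fg_removed.shape, bg_removed.shape)
```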
Three-dimensional (3D) histopathology is an important expansion to histopathology, as a complete understanding of the 3D structure of tissues can lead to better diagnoses and treatments. For certain highly deformable tissues like carotid plaques, 3D histology reconstructions are a challenging endeavor, requiring context-dependent corrections of artifacts that undergo more significant deformations during histological processing. Currently, there is no method of 3D reconstruction specifically designed for highly deformed histology that contains multiple spatially disconnected tissue components. To address this, we present ARONG, an Artifact-correcting Reconstruction Of Nonrigidly-deformed Geometries. ARONG is a pipeline that allows the user to reconstruct highly deformed histology in 3D. ARONG provides a methodology for iteratively aligning 2D histology slides through a set of affine transformations, while providing guidance on the mechanism of correction and order of priority for fixing common artifacts that appear in highly deformable tissue histology. Since highly deformable tissue histology often contains distorted local features, shapes, and edges, we also outline our matching criteria for aligning regions within neighboring slides. Using ARONG, we reconstructed slides from twenty human atheromatous carotid plaques, which are often highly deformable, and computed intersection over union with ex vivo ultrasound for four of the specimens (0.64 ± 0.19). ARONG outperformed the next best transformation method (CODA, a state-of-the-art 3D reconstruction program) with a ≈14% higher Jaccard index. We also validated this pipeline with two human FaDu xenograft tumors, three murine hearts, and two murine carotid arteries sectioned at different intervals, with comparable or improved metrics compared to CODA and other relevant 3D reconstruction methods.