Large language models (LLMs) and vision-augmented LLMs (VLMs) have significantly advanced medical informatics, diagnostics, and decision support. However, these models exhibit systematic biases, particularly age bias, compromising their reliability and equity. This is evident in their poorer performance on pediatric-focused text and visual question-answering tasks. This bias reflects a broader imbalance in medical research, where pediatric studies receive less funding and representation despite the significant disease burden in children. To address these issues, a new comprehensive multi-modal pediatric question-answering benchmark, PediatricsMQA, has been introduced. It consists of 3,417 text-based multiple-choice questions (MCQs) covering 131 pediatric topics across seven developmental stages (prenatal to adolescent) and 2,067 vision-based MCQs using 634 pediatric images from 67 imaging modalities and 256 anatomical regions. The dataset was developed using a hybrid manual-automatic pipeline, incorporating peer-reviewed pediatric literature, validated question banks, existing benchmarks, and existing QA resources. Evaluating state-of-the-art open models, we find dramatic performan
Background: Pediatric dental disease remains one of the most prevalent and inequitable chronic health conditions worldwide. Although strong epidemiological evidence links oral health outcomes to socio-economic and demographic determinants, most artificial intelligence (AI) applications in dentistry rely on image-based diagnosis and black-box prediction models, limiting transparency and ethical applicability in pediatric populations. Objective: This study aimed to develop and evaluate an explainable machine learning framework for pediatric dental risk stratification that prioritizes interpretability, calibration, and ethical deployment over maximal predictive accuracy. Methods: A supervised machine learning model was trained using population-level pediatric data including age, income-to-poverty ratio, race/ethnicity, gender, and medical history. Model performance was assessed using receiver operating characteristic (ROC) analysis and calibration curves. Explainability was achieved using SHapley Additive exPlanations (SHAP) to provide global and individual-level interpretation of predictions. Results: The model achieved modest discrimination (AUC = 0.61) with conservative calibration
Despite the advancement of deep learning-based computer-aided diagnosis (CAD) methods for pneumonia from adult chest x-ray (CXR) images, the performance of CAD methods applied to pediatric images remains suboptimal, mainly due to the lack of large-scale annotated pediatric imaging datasets. Establishing a proper framework to leverage existing adult large-scale CXR datasets can thus enhance pediatric pneumonia detection performance. In this paper, we propose a three-branch parallel path learning-based framework that utilizes both adult and pediatric datasets to improve the performance of deep learning models on pediatric test datasets. The paths are trained with pediatric only, adult only, and both types of CXRs, respectively. Our proposed framework utilizes the multi-positive contrastive loss to cluster the classwise embeddings and the embedding similarity loss among these three parallel paths to make the classwise embeddings as close as possible to reduce the effect of domain shift. Experimental evaluations on open-access adult and pediatric CXR datasets show that the proposed method achieves a superior AUROC score of 0.8464 compared to 0.8348 obtained using the conventional appro
Pediatric medical imaging presents unique challenges due to significant anatomical and developmental differences compared to adults. Direct application of segmentation models trained on adult data often yields suboptimal performance, particularly for small or rapidly evolving structures. To address these challenges, several strategies leveraging the nnU-Net framework have been proposed, differing along four key axes: (i) the fingerprint dataset (adult, pediatric, or a combination thereof) from which the Training Plan -including the network architecture-is derived; (ii) the Learning Set (adult, pediatric, or mixed), (iii) Data Augmentation parameters, and (iv) the Transfer learning method (finetuning versus continual learning). In this work, we introduce PSAT (Pediatric Segmentation Approaches via Adult Augmentations and Transfer learning), a systematic study that investigates the impact of these axes on segmentation performance. We benchmark the derived strategies on two pediatric CT datasets and compare them with state-of-theart methods, including a commercial radiotherapy solution. PSAT highlights key pitfalls and provides actionable insights for improving pediatric segmentation.
Accurate diagnosis of pediatric brain tumors, starting with histopathology, presents unique challenges for deep learning, including severe data scarcity, class imbalance, and fine-grained morphologic overlap across diagnostically distinct subtypes. While pathology foundation models have advanced patch-level representation learning, their effective adaptation to weakly supervised pediatric brain tumor classification under limited data remains underexplored. In this work, we introduce an expert-guided contrastive fine-tuning framework for pediatric brain tumor diagnosis from whole-slide images (WSI). Our approach integrates contrastive learning into slide-level multiple instance learning (MIL) to explicitly regularize the geometry of slide-level representations during downstream fine-tuning. We propose both a general supervised contrastive setting and an expert-guided variant that incorporates clinically informed hard negatives targeting diagnostically confusable subtypes. Through comprehensive experiments on pediatric brain tumor WSI classification under realistic low-sample and class-imbalanced conditions, we demonstrate that contrastive fine-tuning yields measurable improvements i
Recent advancements in deep learning for Medical Artificial Intelligence have demonstrated that models can match the diagnostic performance of clinical experts in adult chest X-ray (CXR) interpretation. However, their application in the pediatric context remains limited due to the scarcity of large annotated pediatric image datasets. Additionally, significant challenges arise from the substantial variability in pediatric CXR images across different hospitals and the diverse age range of patients from 0 to 18 years. To address these challenges, we propose SCC, a novel approach that combines transfer learning with self-supervised contrastive learning, augmented by an unsupervised contrast enhancement technique. Transfer learning from a well-trained adult CXR model mitigates issues related to the scarcity of pediatric training data. Contrastive learning with contrast enhancement focuses on the lungs, reducing the impact of image variations and producing high-quality embeddings across diverse pediatric CXR images. We train SCC on one pediatric CXR dataset and evaluate its performance on two other pediatric datasets from different sources. Our results show that SCC's out-of-distribution
Artificial intelligence-enhanced electrocardiogram (AI-ECG) has shown promise as an inexpensive, ubiquitous, and non-invasive screening tool to detect left ventricular systolic dysfunction in pediatric congenital heart disease. However, current approaches rely heavily on large-scale labeled datasets, which poses a major obstacle to the democratization of AI in hospitals where only limited pediatric ECG data are available. In this work, we propose a robust training framework to improve AI-ECG performance under low-resource conditions. Specifically, we introduce an on-manifold adversarial perturbation strategy for pediatric ECGs to generate synthetic noise samples that better reflect real-world signal variations. Building on this, we develop an uncertainty-aware adversarial training algorithm that is architecture-agnostic and enhances model robustness. Evaluation on the real-world pediatric dataset demonstrates that our method enables low-cost and reliable detection of left ventricular systolic dysfunction, highlighting its potential for deployment in resource-limited clinical settings.
This study explores the integration of artificial intelligence (AI) or large language models (LLMs) into pediatric rehabilitation clinical documentation, focusing on the generation of SOAP (Subjective, Objective, Assessment, Plan) notes, which are essential for patient care. Creating complex documentation is time-consuming in pediatric settings. We evaluate the effectiveness of two AI tools; Copilot, a commercial LLM, and KAUWbot, a fine-tuned LLM developed for KidsAbility Centre for Child Development (an Ontario pediatric rehabilitation facility), in simplifying and automating this process. We focus on two key questions: (i) How does the quality of AI-generated SOAP notes based on short clinician summaries compare to human-authored notes, and (ii) To what extent is human editing necessary for improving AI-generated SOAP notes? We found no evidence of prior work assessing the quality of AI-generated clinical notes in pediatric rehabilitation. We used a sample of 432 SOAP notes, evenly divided among human-authored, Copilot-generated, and KAUWbot-generated notes. We employ a blind evaluation by experienced clinicians based on a custom rubric. Statistical analysis is conducted to asse
Pediatric dental segmentation is critical in dental diagnostics, presenting unique challenges due to variations in dental structures and the lower number of pediatric X-ray images. This study proposes a custom SegUNet model with a VGG19 backbone, designed explicitly for pediatric dental segmentation and applied to the Children's Dental Panoramic Radiographs dataset. The SegUNet architecture with a VGG19 backbone has been employed on this dataset for the first time, achieving state-of-the-art performance. The model reached an accuracy of 97.53%, a dice coefficient of 92.49%, and an intersection over union (IOU) of 91.46%, setting a new benchmark for this dataset. These results demonstrate the effectiveness of the VGG19 backbone in enhancing feature extraction and improving segmentation precision. Comprehensive evaluations across metrics, including precision, recall, and specificity, indicate the robustness of this approach. The model's ability to generalize across diverse dental structures makes it a valuable tool for clinical applications in pediatric dental care. It offers a reliable and efficient solution for automated dental diagnostics.
Development of effective treatments in pediatric population poses unique scientific and ethical challenges in addition to the small population. In this regard, both the U.S. and E.U. regulations suggest a complementary strategy, pediatric extrapolation, based on assessing the relevance of existing information in the adult population to the pediatric population. The pediatric extrapolation approach often relies on data extrapolation from adults, contingent upon evidence of similar disease progression, pharmacology and clinical response to treatment between adult and children. Similarity evaluation in pharmacology is usually characterized through the exposure-response relationship. Current methodologies for comparing exposure-response (E-R) curves between these groups are inadequate, typically focusing on isolated data points rather than the entire curve spectrum (Zhang et al., 2021). To overcome this limitation, we introduce an innovative Bayesian approach for a comprehensive evaluation of E-R curve similarities between adult and pediatric populations. This method encompasses the entire curve, employing logistic regression for binary endpoints. We have developed an algorithm to dete
Artificial intelligence systems that record voice and video during pediatric emergencies are emerging as human-computer interaction (HCI) technologies with direct implications for clinical work, promising improvements in documentation, team performance, and post-event debriefing. Yet the perspectives of those most affected, including clinicians, parents, and child patients, remain largely absent from the design and governance of these technologies. This position paper argues that this has direct consequences for the legitimacy and effectiveness of these systems. We examine four areas where these missing perspectives prove consequential (consent, emotional impact, surveillance dynamics, and participatory governance) and propose four positions for reorienting AI recording in pediatric emergency care toward stakeholder-centered HCI inquiry.
Pediatric pneumonia remains a leading cause of morbidity and mortality in children worldwide. Timely and accurate diagnosis is critical but often challenged by limited radiological expertise and the physiological and procedural complexity of pediatric imaging. This study investigates the performance of state-of-the-art convolutional neural network (CNN) architectures ResNetRS, RegNet, and EfficientNetV2 using transfer learning for the automated classification of pediatric chest Xray images as either pneumonia or normal.A curated subset of 1,000 chest X-ray images was extracted from a publicly available dataset originally comprising 5,856 pediatric images. All images were preprocessed and labeled for binary classification. Each model was fine-tuned using pretrained ImageNet weights and evaluated based on accuracy and sensitivity. RegNet achieved the highest classification performance with an accuracy of 92.4 and a sensitivity of 90.1, followed by ResNetRS (accuracy: 91.9, sensitivity: 89.3) and EfficientNetV2 (accuracy: 88.5, sensitivity: 88.1).
Developing intelligent pediatric consultation systems offers promising prospects for improving diagnostic efficiency, especially in China, where healthcare resources are scarce. Despite recent advances in Large Language Models (LLMs) for Chinese medicine, their performance is sub-optimal in pediatric applications due to inadequate instruction data and vulnerable training procedures. To address the above issues, this paper builds PedCorpus, a high-quality dataset of over 300,000 multi-task instructions from pediatric textbooks, guidelines, and knowledge graph resources to fulfil diverse diagnostic demands. Upon well-designed PedCorpus, we propose PediatricsGPT, the first Chinese pediatric LLM assistant built on a systematic and robust training pipeline. In the continuous pre-training phase, we introduce a hybrid instruction pre-training mechanism to mitigate the internal-injected knowledge inconsistency of LLMs for medical domain adaptation. Immediately, the full-parameter Supervised Fine-Tuning (SFT) is utilized to incorporate the general medical knowledge schema into the models. After that, we devise a direct following preference optimization to enhance the generation of pediatric
Pediatric chest X-ray imaging is essential for early diagnosis, particularly in low-resource settings where advanced imaging modalities are often inaccessible. Low-dose protocols reduce radiation exposure in children but introduce substantial noise that can obscure critical anatomical details. Conventional denoising methods often degrade fine details, compromising diagnostic accuracy. In this paper, we present SharpXR, a structure-aware dual-decoder U-Net designed to denoise low-dose pediatric X-rays while preserving diagnostically relevant features. SharpXR combines a Laplacian-guided edge-preserving decoder with a learnable fusion module that adaptively balances noise suppression and structural detail retention. To address the scarcity of paired training data, we simulate realistic Poisson-Gaussian noise on the Pediatric Pneumonia Chest X-ray dataset. SharpXR outperforms state-of-the-art baselines across all evaluation metrics while maintaining computational efficiency suitable for resource-constrained settings. SharpXR-denoised images improved downstream pneumonia classification accuracy from 88.8% to 92.5%, underscoring its diagnostic value in low-resource pediatric care.
Automated analysis of lung sound auscultation is essential for monitoring respiratory health, especially in regions facing a shortage of skilled healthcare workers. While respiratory sound classification has been widely studied in adults, its ap plication in pediatric populations, particularly in children aged <6 years, remains an underexplored area. The developmental changes in pediatric lungs considerably alter the acoustic proper ties of respiratory sounds, necessitating specialized classification approaches tailored to this age group. To address this, we propose a multistage hybrid CNN-Transformer framework that combines CNN-extracted features with an attention-based architecture to classify pediatric respiratory diseases using scalogram images from both full recordings and individual breath events. Our model achieved an overall score of 0.9039 in binary event classifi cation and 0.8448 in multiclass event classification by employing class-wise focal loss to address data imbalance. At the recording level, the model attained scores of 0.720 for ternary and 0.571 for multiclass classification. These scores outperform the previous best models by 3.81% and 5.94%, respectively. T
In many pediatric fMRI studies, cardiac signals are often missing or of poor quality. A tool to extract Heart Rate Variation (HRV) waveforms directly from fMRI data, without the need for peripheral recording devices, would be highly beneficial. We developed a machine learning framework to accurately reconstruct HRV for pediatric applications. A hybrid model combining one-dimensional Convolutional Neural Networks (1D-CNN) and Gated Recurrent Units (GRU) analyzed BOLD signals from 628 ROIs, integrating past and future data. The model achieved an 8% improvement in HRV accuracy, as evidenced by enhanced performance metrics. This approach eliminates the need for peripheral photoplethysmography devices, reduces costs, and simplifies procedures in pediatric fMRI. Additionally, it improves the robustness of pediatric fMRI studies, which are more sensitive to physiological and developmental variations than those in adults.
Demyelinating disorders of the central nervous system may have multiple causes, the most common are infections, autoimmune responses, genetic or vascular etiology. Demyelination lesions are characterized by areas were the myelin sheath of the nerve fibers are broken or destroyed. Among autoimmune disorders, Multiple Sclerosis (MS) is the most well-known Among these disorders, Multiple Sclerosis (MS) is the most well-known and aggressive form. Acute Disseminated Encephalomyelitis (ADEM) is another type of demyelinating disease, typically with a better prognosis. Magnetic Resonance Imaging (MRI) is widely used for diagnosing and monitoring disease progression by detecting lesions. While both adults and children can be affected, there is a significant lack of publicly available datasets for pediatric cases and demyelinating disorders beyond MS. This study introduces, for the first time, a publicly available pediatric dataset for demyelinating lesion segmentation. The dataset comprises MRI scans from 13 pediatric patients diagnosed with demyelinating disorders, including 3 with ADEM. In addition to lesion segmentation masks, the dataset includes extensive patient metadata, such as diag
Objective: Our study aimed to evaluate and validate PanSegNet, a deep learning (DL) algorithm for pediatric pancreas segmentation on MRI in children with acute pancreatitis (AP), chronic pancreatitis (CP), and healthy controls. Methods: With IRB approval, we retrospectively collected 84 MRI scans (1.5T/3T Siemens Aera/Verio) from children aged 2-19 years at Gazi University (2015-2024). The dataset includes healthy children as well as patients diagnosed with AP or CP based on clinical criteria. Pediatric and general radiologists manually segmented the pancreas, then confirmed by a senior pediatric radiologist. PanSegNet-generated segmentations were assessed using Dice Similarity Coefficient (DSC) and 95th percentile Hausdorff distance (HD95). Cohen's kappa measured observer agreement. Results: Pancreas MRI T2W scans were obtained from 42 children with AP/CP (mean age: 11.73 +/- 3.9 years) and 42 healthy children (mean age: 11.19 +/- 4.88 years). PanSegNet achieved DSC scores of 88% (controls), 81% (AP), and 80% (CP), with HD95 values of 3.98 mm (controls), 9.85 mm (AP), and 15.67 mm (CP). Inter-observer kappa was 0.86 (controls), 0.82 (pancreatitis), and intra-observer agreement rea
Pediatric appendicitis remains one of the most common causes of acute abdominal pain in children, and its diagnosis continues to challenge clinicians due to overlapping symptoms and variable imaging quality. This study aims to develop and evaluate a deep learning model based on a pretrained ResNet architecture for automated detection of appendicitis from ultrasound images. We used the Regensburg Pediatric Appendicitis Dataset, which includes ultrasound scans, laboratory data, and clinical scores from pediatric patients admitted with abdominal pain to Children Hospital. Hedwig in Regensburg, Germany. Each subject had 1 to 15 ultrasound views covering the right lower quadrant, appendix, lymph nodes, and related structures. For the image based classification task, ResNet was fine tuned to distinguish appendicitis from non-appendicitis cases. Images were preprocessed by normalization, resizing, and augmentation to enhance generalization. The proposed ResNet model achieved an overall accuracy of 93.44, precision of 91.53, and recall of 89.8, demonstrating strong performance in identifying appendicitis across heterogeneous ultrasound views. The model effectively learned discriminative sp
Pediatric brain tumor segmentation presents unique challenges due to the rarity and heterogeneity of these malignancies, yet remains critical for clinical diagnosis and treatment planning. We propose an ensemble approach integrating nnU-Net, Swin UNETR, and HFF-Net for the BraTS-PED 2025 challenge. Our method incorporates three key extensions: adjustable initialization scales for optimal nnU-Net complexity control, transfer learning from BraTS 2021 pre-trained models to enhance Swin UNETR's generalization on pediatric dataset, and frequency domain decomposition for HFF-Net to separate low-frequency tissue contours from high-frequency texture details. Our final ensemble framework combines nnU-Net ($γ=0.7$), fine-tuned Swin UNETR, and HFF-Net, achieving Dice scores of 62.7% (CC), 83.2% (ED), 72.9% (ET), 85.7% (NET), 91.8% (TC), and 92.6% (WT) on the unseen test dataset, respectively. Our proposed method achieves first place (rank 1st) in the BraTS 2025 Pediatric Brain Tumor Segmentation Challenge.