Child growth failure (CGF), which includes underweight, wasting, and stunting, is among the factors most strongly associated with mortality and morbidity in children younger than 5 years worldwide. Poor height and bodyweight gain arise from a variety of biological and sociodemographic factors and are associated with increased vulnerability to infectious diseases. We used data from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2023 to estimate CGF prevalence, the risk of infectious diseases associated with CGF, and the disease mortality, morbidity, and overall burden associated with CGF. In this analysis we estimated the all-cause and cause-specific (diarrhoea, lower respiratory tract infections, malaria, and measles) disability-adjusted life-years (DALYs) lost and mortality associated with stunting, wasting, underweight, and CGF in aggregate. We combined the burden associated with mild, moderate, and severe forms of CGF: stunting was defined as height-for-age Z scores (HAZ) less than -1, underweight was defined as weight-for-age Z scores (WAZ) less than -1, and wasting was defined as weight-for-height Z scores (WHZ) less than -1, according to WHO Child Growth Standards. Population-level continuous distributions of HAZ, WAZ, and WHZ were estimated for 2000 to 2023 using data from surveys, literature, and individual-level study data. The risk of incidence of, and mortality due to, diarrhoea, lower respiratory infections, malaria, and measles was separately estimated in a meta-regression framework from longitudinal cohort data for Z scores less than -1. Finally, fatal outcomes associated with these diseases were estimated with vital registration, verbal autopsy, and case-fatality data, while non-fatal outcomes were estimated with surveys as well as health-care utilisation and case reporting data. 
The exposure prevalence and relative risk estimates were from continuous distributions, allowing for direct assessment of the attributable fractions for mild, moderate, and severe stunting, underweight, wasting, and the combined impact of child growth failure within populations. All estimates were age-specific, sex-specific, geography-specific, and year-specific. We estimated that, in children younger than 5 years in 2023, CGF was associated with 79·4 million (95% uncertainty interval [UI] 47·0-106) DALYs lost and 880 000 (517 000-1 170 000) deaths. This represented 17·9% (10·6-23·8) of 444 million (434-457) total under-5 DALYs and 18·8% (11·1-25·0) of all 4·67 million (4·59-4·75) under-5 deaths. Compared to stunting (33·0 million [24·1-42·2] DALYs, 373 000 [272 000-477 000] deaths) and wasting (39·2 million [23·8-53·0] DALYs, 428 000 [256 000-583 000] deaths), childhood underweight was associated with the largest share of CGF-related disease burden: 52·2 million (21·9-75·1) DALYs and 573 000 (236 000-824 000) deaths in children younger than 5 years in 2023. CGF remains a leading factor associated with death and disability in children younger than 5 years, despite global attention and focused interventions to reduce the prevalence of associated CGF indicators. Our findings underscore the need for policies, strategies, and interventions that focus on all indicators of CGF to reduce its associated health burden. Gates Foundation.
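The attributable-fraction arithmetic behind these estimates can be illustrated with a toy calculation. The sketch below assumes a single normal HAZ distribution and invented relative risks (GBD fits ensemble distributions and estimates risks in a meta-regression; none of these numbers are GBD values), and computes a categorical population attributable fraction for mild, moderate, and severe stunting:

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    # Normal CDF via the error function.
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

# Hypothetical population HAZ distribution (illustration only).
mu, sigma = -0.8, 1.1

# Prevalence of severe (Z < -3), moderate (-3 <= Z < -2), and mild (-2 <= Z < -1) stunting.
p_severe = normal_cdf(-3, mu, sigma)
p_moderate = normal_cdf(-2, mu, sigma) - p_severe
p_mild = normal_cdf(-1, mu, sigma) - p_moderate - p_severe

# Illustrative relative risks of diarrhoea mortality by severity (not GBD values).
rr = {"mild": 1.2, "moderate": 2.1, "severe": 4.6}
prev = {"mild": p_mild, "moderate": p_moderate, "severe": p_severe}

# Population attributable fraction for a categorical exposure:
# PAF = sum(p_i * (RR_i - 1)) / (1 + sum(p_i * (RR_i - 1)))
excess = sum(prev[k] * (rr[k] - 1.0) for k in rr)
paf = excess / (1.0 + excess)
print(f"stunting prevalence (Z < -1): {p_mild + p_moderate + p_severe:.3f}")
print(f"attributable fraction: {paf:.3f}")
```

The same construction extends to WAZ and WHZ, and multiplying the attributable fraction by cause-specific deaths or DALYs yields the attributable burden.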
Sensory evaluation of traditional Chinese medicine (TCM) and medicinal and food homologous products has traditionally relied on human observation of appearance, color, aroma, and taste. However, this approach is highly subjective, difficult to quantify, and often lacks reproducibility across evaluators. Intelligent sensory systems, including the electronic nose, electronic tongue, and machine vision, provide objective and digitized sensory information for TCM quality evaluation. Nevertheless, these platforms generate high-dimensional and heterogeneous datasets, creating a strong demand for efficient artificial intelligence (AI)-based analytical tools. This review summarizes recent advances in the application of machine learning and deep learning methods, such as support vector machines, random forests, convolutional neural networks, and long short-term memory networks, for intelligent sensory evaluation of TCM. Particular emphasis is placed on how AI supports feature extraction, pattern recognition, classification, regression, and multisource data fusion across electronic nose, electronic tongue, and machine vision systems. Representative applications in raw material authentication, geographical origin discrimination, processing monitoring, and quality grading are also discussed. In addition, current challenges related to data standardization, sensor drift, model robustness, and interpretability are highlighted. Overall, this review provides an integrated overview of AI-enabled intelligent sensory technologies and clarifies their potential to advance TCM quality evaluation toward a more objective, efficient, and holistic framework.
Recently, the convergence of advanced sensor technologies and innovations in artificial intelligence and robotics has highlighted facial emotion recognition (FER) as an essential component of human-computer interaction (HCI). Traditional FER studies based on handcrafted features and shallow machine learning have shown limited performance, while convolutional neural networks (CNNs) have improved nonlinear emotion pattern analysis but have been constrained by local feature extraction. Vision transformers (ViTs) have addressed this by leveraging global correlations, yet both CNN- and ViT-based single networks often suffer from overfitting, single-network dependency, and information loss in ensemble operations. To overcome these limitations, we propose ArecaNet, an assembled residual enhanced cross-attention network that integrates multiple feature streams without information loss. The framework comprises (i) channel and spatial feature extraction via SCSESResNet, (ii) landmark feature extraction from specialized sub-networks, (iii) iterative fusion through residual enhanced cross-attention, and (iv) final emotion classification from the fused representation. Our research introduces a novel approach by integrating pre-trained sub-networks specialized in facial recognition with an attention mechanism and our uniquely designed main network, which is optimized for size reduction and efficient feature extraction. The extracted features are fused through an iterative residual enhanced cross-attention mechanism, which minimizes information loss and preserves complementary representations across networks. This strategy overcomes the limitations of conventional ensemble methods, enabling seamless feature integration and robust recognition.
The experimental results show that the proposed ArecaNet achieved accuracies of 97.0% and 97.8% on the public FER-2013 and RAF-DB databases, respectively, outperforming the previous state-of-the-art method, PAtt-Lite, by 4.5% on FER-2013 and 2.75% on RAF-DB, and establishing a new state-of-the-art accuracy on each database.
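The core fusion step, residual cross-attention between a main stream and an auxiliary stream, can be sketched in a few lines. This is a minimal single-head illustration with made-up shapes, not the ArecaNet implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def residual_cross_attention(main, aux, wq, wk, wv):
    # Queries come from the main stream; keys/values from the auxiliary stream.
    q, k, v = main @ wq, aux @ wk, aux @ wv
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    # The residual connection preserves the main stream, so fusing auxiliary
    # features cannot destroy information already present.
    return main + attn @ v

d = 8                            # feature dimension (hypothetical)
main = rng.normal(size=(4, d))   # e.g. main-network tokens
aux = rng.normal(size=(6, d))    # e.g. landmark sub-network tokens
wq, wk, wv = (rng.normal(size=(d, d)) * 0.1 for _ in range(3))
fused = residual_cross_attention(main, aux, wq, wk, wv)
print(fused.shape)  # (4, 8): same shape as the main stream
```

Applying this step iteratively, with each sub-network in turn as the auxiliary stream, mirrors the iterative fusion described above.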
Object detection and recognition are fundamental functions that play a significant role in the success of species. Because the appearance of an object exhibits large variability, the brain has to group these different stimuli under the same object identity, a process of generalization. Does the process of generalization follow some general principles, or is it an ad hoc bag of tricks? The universal law of generalization (ULoG) provides evidence that generalization follows similar properties across a variety of species and tasks. Here, we tested the hypothesis derived from ULoG that the internal representations underlying generalization reflect the natural properties of object detection and recognition in our environment rather than the specifics of the system solving these problems. Neural networks with universal-approximation capability have been successful in many object detection and recognition tasks; however, how these networks reach their decisions remains opaque. To provide a strong test for ecological validity, we used natural camouflage, which is nature's test bed for object detection and recognition. We trained a deep neural network with natural images of "clear" and "camouflaged" animals and examined the emerging internal representations. We extended ULoG to a realistic learning regime, with multiple consequential stimuli, and developed two methods to determine category prototypes. Our results show that with a proper choice of category prototypes, the generalization functions are monotone decreasing, similar to the generalization functions of biological systems. Critically, we show that camouflaged inputs are not represented randomly but rather systematically appear at the tail of the monotone decreasing functions. 
Our results support the hypothesis that the internal representations underlying generalization in object detection and recognition are shaped mainly by the properties of the ecological environment, even though different biological and artificial systems may generate these internal representations through drastically different learning and adaptation processes. Furthermore, the extended version of ULoG provides a tool to analyze how the system organizes its internal representations during learning as well as how it makes its decisions.
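The monotone generalization functions described above can be sketched directly. Assuming an exponential Shepard-style decay with distance from a category prototype (the classic ULoG form; the stimulus coordinates below are hypothetical):

```python
import numpy as np

# Shepard's universal law: generalization falls off as a monotone
# (approximately exponential) function of distance in representation space.
def generalization(x, prototype, scale=1.0):
    d = np.linalg.norm(np.asarray(x) - np.asarray(prototype), axis=-1)
    return np.exp(-d / scale)

prototype = np.zeros(2)
# Stimuli at increasing distance from the category prototype; in the setting
# above, camouflaged inputs would sit at the tail of this curve.
stimuli = np.array([[0.1, 0.0], [0.5, 0.5], [1.5, 1.0], [3.0, 2.0]])
g = generalization(stimuli, prototype)
print(np.round(g, 3))
assert np.all(np.diff(g) < 0)  # monotone decreasing with distance
```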
Modern textile industries frequently apply patterns, such as brand logos or motifs, in near-regular arrangements to create visually appealing products. Consequently, the application of computer vision for pattern recognition is highly valuable for automating production chains and reducing waste. In this work, we address the challenging task of automatically detecting repeating patterns on fabric images, accounting for real-world complexities such as variable lighting and intentional pattern variance. We begin with an in-depth literature review on repeated pattern detection, highlighting current trends, organizing them into a hierarchy of sub-tasks, and discussing the novelty of each paper. Subsequently, we propose a novel method to solve our specific instance of this problem, focusing on detecting patterns with sub-pixel accuracy. We conduct extensive experiments to compare its performance against several baselines from the literature. Our method can be applied with high precision to real-world problems without requiring training data, instead using an automatic calibration procedure with limited human supervision. On a small synthetic dataset, our method detects repeated patterns with a 96% recall rate and an average alignment error of less than 0.5 pixels in just a few seconds, making it competitive with all tested baselines. Finally, we release our dataset and the code for its generation to encourage further research in this area.
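As a simplified one-dimensional analogue of repeated pattern detection, the repetition period of a near-regular signal can be estimated from its autocorrelation; the method above works on 2D fabric images and refines alignment to sub-pixel accuracy, which this sketch does not attempt:

```python
import numpy as np

def estimate_period(signal):
    # Autocorrelation of the zero-mean signal; the first strong non-trivial
    # peak gives the repetition period (integer-pixel accuracy only).
    s = signal - signal.mean()
    ac = np.correlate(s, s, mode="full")[len(s) - 1:]
    # Skip lag 0 by searching only after the first trough.
    trough = np.argmax(np.diff(ac) > 0)  # first lag where ac starts rising
    return trough + int(np.argmax(ac[trough:]))

# Synthetic "fabric row": a motif repeated every 17 pixels plus noise.
rng = np.random.default_rng(1)
period = 17
x = np.tile(np.sin(np.linspace(0, 2 * np.pi, period, endpoint=False)), 12)
x += 0.05 * rng.normal(size=x.size)
print(estimate_period(x))
```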
The rapid advancement of artificial intelligence (AI) is profoundly transforming the theoretical framework and technological paradigm of food detection. The study focuses on elucidating the underlying mechanisms of machine learning (ML)- and deep learning (DL)-based AI in feature extraction, pattern recognition, and decision feedback. AI algorithms have enabled perception systems such as computer vision, electronic nose, and electronic tongue to overcome their dependence on handcrafted features, achieving automatic learning and high-dimensional representation of complex signals, thereby significantly enhancing detection accuracy and robustness. In addition, this paper systematically analyzes the advantages and limitations of AI-empowered perception technologies and presents prospects for the potential applications of multimodal data fusion and large language models (LLMs). Finally, the study summarizes the major challenges that AI still faces in food detection and outlines potential directions for future development.
Text recognition on coffee bean package labels is of great importance for product tracking and brand verification, but it poses a challenge due to variations in image quality, packaging materials, and environmental conditions. In this paper, we propose a pipeline that combines several image enhancement techniques with an Optical Character Recognition (OCR) model based on vision-language (VL) Qwen-VL variants, conditioned on structured prompts. To facilitate evaluation, we construct a coffee bean package image set containing two subsets, a low-resolution (LRCB) and a high-resolution (HRCB) coffee bean image set, covering multiple real-world challenges. These cases involve various packaging types (bottles and bags), label sides (front and back), rotation, and different illumination. To address the image quality problem, we design a dedicated preprocessing pipeline for package label scenarios. We develop and evaluate four Qwen-VL OCR variants with prompt engineering, which are compared against four baselines: DocTR, PaddleOCR, EasyOCR, and Tesseract. Extensive comparison using various metrics, including Levenshtein distance, cosine similarity, Jaccard index, exact match, BLEU score, and ROUGE scores (ROUGE-1, ROUGE-2, and ROUGE-L), demonstrates significant improvements over the baselines. In addition, validation on the public POIE dataset confirms that the framework generalizes well, demonstrating its practicality and reliability for label recognition.
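Two of the reported metrics are easy to make concrete. The sketch below implements character-level Levenshtein distance and a token-level Jaccard index on a hypothetical label transcription (the strings are invented, not drawn from the LRCB/HRCB sets):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance (insert/delete/substitute, cost 1).
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def jaccard(a, b):
    # Token-level Jaccard index between two label transcriptions.
    sa, sb = set(a.split()), set(b.split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 1.0

gt = "ARABICA WHOLE BEAN 250G"
pred = "ARABICA WH0LE BEAN 250G"   # hypothetical OCR output confusing O with 0
print(levenshtein(gt, pred))       # 1
print(round(jaccard(gt, pred), 2))
```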
Information on childhood cancer burden is crucial for effective cancer policy planning. Unfortunately, observed paediatric cancer data are not available in every country, and previous global burden estimates have not discretely reported several common cancers of childhood. We aimed to inform efforts to address childhood cancer burden globally by analysing results from the Global Burden of Diseases, Injuries, and Risk Factors Study (GBD) 2023, which now include nine additional cancer causes compared with previous GBD analyses. GBD 2023 data sources for cancer estimation included population-based cancer registries, vital registration systems, and verbal autopsies. For childhood cancers (defined as those occurring at ages 0-19 years), mortality was estimated using cancer-specific ensemble models and incidence was estimated using mortality estimates and modelled mortality-to-incidence ratios (MIRs). Years of life lost (YLLs) were estimated by multiplying age-specific cancer deaths by the standard life expectancy at the age of death. Prevalence was estimated using survival estimates modelled from MIRs and multiplied by sequelae-specific disability weights to estimate years lived with disability (YLDs). Disability-adjusted life-years (DALYs) were estimated as the sum of YLLs and YLDs. Estimates are presented globally and by geographical and resource groupings, and all estimates are presented with 95% uncertainty intervals (UIs). Globally, in 2023, there were an estimated 377 000 incident childhood cancer cases (95% UI 288 000-489 000), 144 000 deaths (131 000-162 000), and 11·7 million (10·7-13·2) DALYs due to childhood cancer. Deaths due to childhood cancer decreased by 27·0% (15·5-36·1) globally, from 197 000 (173 000-218 000) in 1990, but increased in the WHO African region by 55·6% (25·5-92·4), from 31 500 (24 900-38 500) to 49 000 (42 600-58 200) between 1990 and 2023. 
In 2023, age-standardised YLLs due to childhood cancer were inversely correlated with country-level Socio-demographic Index. Childhood cancer was the eighth-leading cause of childhood deaths and the ninth-leading cause of DALYs among all cancers in 2023. The percentage of DALYs due to uncategorised childhood cancers was reduced from 26·5% (26·5-26·5) in GBD 2017 to 10·5% (8·1-13·1) with the addition of the nine new cancer causes. Target cancers for the WHO Global Initiative for Childhood Cancer (GICC) comprised 47·3% (42·2-52·0) of global childhood cancer deaths in 2023. Global childhood cancer burden remains a substantial contributor to global childhood disease and cancer burden and is disproportionately weighted towards resource-limited settings. The estimation of additional cancer types relevant in childhood provides a step towards alignment with WHO GICC targets. Efforts to decrease global childhood cancer burden should focus on addressing the inequities in burden worldwide and support comprehensive improvements along the childhood cancer diagnosis and care continuum. St Jude Children's Research Hospital, Gates Foundation, and St Baldrick's Foundation.
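The burden arithmetic described above (YLLs from deaths and life expectancy, incidence from MIRs, DALYs as YLLs plus YLDs) can be illustrated with invented numbers; none of these values are GBD 2023 estimates:

```python
# GBD-style burden arithmetic (illustrative numbers only):
# YLLs = deaths x standard life expectancy at age of death;
# incidence = deaths / MIR; DALYs = YLLs + YLDs.

deaths_by_age = {2: 120.0, 10: 80.0, 17: 40.0}    # hypothetical cancer deaths by age
life_expectancy = {2: 85.9, 10: 78.0, 17: 71.1}   # approximate standard life table
mir = 0.38                                        # hypothetical mortality-to-incidence ratio
ylds = 1500.0                                     # years lived with disability (given)

ylls = sum(d * life_expectancy[a] for a, d in deaths_by_age.items())
total_deaths = sum(deaths_by_age.values())
incidence = total_deaths / mir
dalys = ylls + ylds
print(f"YLLs: {ylls:.0f}, incident cases: {incidence:.0f}, DALYs: {dalys:.0f}")
```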
The human visual system excels at recognizing occluded objects, yet the temporal dynamics of recurrent processing in this task remain unclear. Using high-temporal-resolution electroencephalography (EEG), backward masking, and deep neural networks (DNNs), we employed a two-stage paradigm to investigate recurrent processing in occluded object recognition. In Experiment 1, we manipulated occlusion levels and applied multivariate pattern analysis (MVPA) and temporal generalization analysis (TGA) to investigate the neural differences in object recognition across varying degrees of occlusion. In Experiment 2, backward masking was used to dissociate feedforward and recurrent contributions, assessed via representational similarity analysis (RSA). Results revealed a distinct shift in processing mechanisms: while low occlusion primarily relied on a rapid feedforward sweep, higher occlusion necessitated the recruitment of additional processing. Further characterization of this processing based on TGA and RSA under mask conditions revealed a two-stage recurrent process: an early stage (200-300 ms) associated with low-level features, and a late stage (300-500 ms) involving mid- and high-level representations, reflecting cross-hierarchical recurrent interactions. The early mask condition disrupted this coordination, highlighting the essential role of recurrent processing. These findings clarify the temporal dynamics of recurrent processing in occluded object recognition and emphasize the critical role of recurrence in achieving robust biological vision.
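The RSA step used in Experiment 2 reduces to comparing representational dissimilarity matrices (RDMs). A minimal sketch with synthetic patterns standing in for EEG and DNN activations:

```python
import numpy as np

def rdm(patterns):
    # Representational dissimilarity matrix: 1 - Pearson correlation
    # between condition patterns (rows = conditions, cols = channels/units).
    return 1.0 - np.corrcoef(patterns)

def upper(m):
    # Off-diagonal upper triangle, the part compared in RSA.
    return m[np.triu_indices_from(m, k=1)]

def spearman(x, y):
    # Spearman correlation via rank transform (no ties in continuous data).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(2)
dnn_layer = rng.normal(size=(6, 50))               # 6 conditions x 50 units (hypothetical)
eeg_t = dnn_layer + 0.5 * rng.normal(size=(6, 50)) # EEG patterns sharing that geometry

rho = spearman(upper(rdm(dnn_layer)), upper(rdm(eeg_t)))
print(round(rho, 2))
```

Repeating this at each EEG time point against each DNN layer yields the time-resolved model-brain correspondence used to characterize the two recurrent stages.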
Bloodstain pattern types, such as wipes and swipes, are frequently encountered at crime scenes and can offer critical insight into the sequence of events. However, these pattern types can be difficult to reliably distinguish, highlighting the need for modern, objective approaches to classification that reduce the potential for human error. In this study, 50 participants were asked to classify 40 test bloodstain pattern images (20 wipes and 20 swipes). These same images were subsequently classified using Microsoft Azure Custom Vision (MACV), an artificial intelligence (AI) image recognition platform. The MACV model was trained using 5425 bloodstain pattern images, including impact, expirated, cessation cast-off, wipe, and swipe stains, across a range of background colors. At the 50th training iteration, the AI achieved 100% accuracy in classifying both wipe and swipe patterns, outperforming participants, who achieved an average accuracy of 52% (47% for wipes and 57% for swipes), a 48-percentage-point improvement in classification performance. The model was further trained to the 80th iteration using rotated images, achieving 98.75% accuracy on the rotated test set.
Recognizing and distinguishing actions is a complex cognitive process that relies on integrating various spatiotemporal information. However, the specific contributions of spatial and temporal features to action recognition remain unclear. To address this gap, we conducted fMRI recordings in monkeys as they observed videos of grasping, touching, and reaching actions. Using multivariate pattern analysis (MVPA), we identified distinct action representation patterns across the brain, with most regions of the action observation network (AON) exhibiting a grasping-dominant pattern. This neural representation was consistent with the monkeys' behavioral differentiation of these actions in subsequent categorization tasks. Building on computer vision approaches, we systematically extracted dynamic spatial and temporal features from action videos, capturing the evolution of feature information over time, and compared these features with the monkeys' behavioral performance. Our results demonstrate that these features are utilized across a hierarchy and selectively correlate with behavior, reflecting a complex interplay between feature information and key action components. These findings imply a distributed coding strategy in which diverse spatial and temporal features are selectively integrated to form action representations that facilitate recognition or discrimination. Our study provides empirical evidence for current action recognition models and introduces advanced computational tools for analyzing high-dimensional and multimodal data.
Food image recognition is a challenging task in computer vision due to the high variability and complexity of food images. In this study, we investigate the potential of Noisy Vision Transformers (NoisyViT) for improving food classification performance. By introducing noise into the learning process, NoisyViT reduces task complexity and adjusts the entropy of the system, leading to enhanced model accuracy. We fine-tune NoisyViT on three benchmark datasets: Food2K (2,000 categories, ~1M images), Food-101 (101 categories, ~100K images), and CNFOOD-241 (241 categories, ~190K images). The performance of NoisyViT is evaluated against state-of-the-art food recognition models. Our results demonstrate that NoisyViT achieves Top-1 accuracies of 95%, 99.5%, and 96.6% on Food2K, Food-101, and CNFOOD-241, respectively, significantly outperforming existing approaches. This study underscores the potential of NoisyViT for dietary assessment, nutritional monitoring, and healthcare applications, paving the way for future advancements in vision-based food computing. Code for NoisyViT for food recognition is publicly available.
Ventricular tachycardia (VT) and ventricular fibrillation remain major contributors to sudden cardiac death, with current therapies limited by our incomplete understanding of the arrhythmogenic substrate. This narrative review explores recent developments in computer-aided techniques for characterizing the arrhythmogenic substrate, focusing on post-myocardial infarction VT. High-resolution cardiac imaging now enables detailed visualization of structural abnormalities, including heterogeneous scar architecture and fatty infiltration. Sophisticated invasive mapping techniques provide insights into local electrophysiological properties, while novel non-invasive mapping approaches offer complementary views of global electrical patterns. Integration of these modalities through computational simulations allows for mechanistic insights into arrhythmia initiation and maintenance, particularly in post-myocardial infarction VT, where structural and functional substrates interact in complex ways. Emerging artificial intelligence applications enhance substrate analysis through automated feature extraction and pattern recognition, enabling more sophisticated risk stratification. These computer-aided approaches are advancing from research tools to clinical applications, with early evidence suggesting improved ablation outcomes and better risk prediction. However, significant challenges remain in validation, standardization, and clinical implementation of these innovations. This narrative review highlights recent methodological advances and clinical applications of computer-aided substrate characterization, and conceptualizes future directions towards personalized arrhythmia management, also beyond post-infarction VTs.
With the proliferation of the Internet of Things (IoT), gesture recognition has attracted attention as a core technology in human-computer interaction (HCI). In particular, mmWave frequency-modulated continuous-wave (FMCW) radar has emerged as an alternative to vision-based approaches due to its robustness to illumination changes and advantages in privacy. However, in real-world human-machine interface (HMI) environments, hand gestures are inevitably accompanied by torso- and arm-related reflections, which can also contain gesture-relevant variations. To effectively capture these variations without discarding them, we propose a preprocessing method called Adaptive Top-K Selection, which leverages vector entropy to summarize and preserve informative signals from both hand and body reflections. In addition, we present a Multi-Stream EfficientNetV2 architecture that jointly exploits temporal range and Doppler trajectories, together with radar-specific data augmentation and a training optimization strategy. In experiments on the publicly available FMCW gesture dataset released by the Karlsruhe Institute of Technology, the proposed method achieved an average accuracy of 99.5%. These results show that the proposed approach enables accurate and reliable gesture recognition even in realistic HMI environments with co-existing body reflections.
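One plausible reading of entropy-guided top-K selection can be sketched as follows; the paper's exact criterion may differ, so treat the perplexity-based rule and the radar profile below as illustrative assumptions:

```python
import numpy as np

def adaptive_top_k(profile, k_max=8):
    # Entropy-guided top-K selection (a sketch of the idea, not the paper's
    # exact criterion): normalize the magnitude profile to a distribution,
    # take its perplexity 2^H as the "effective number" of occupied bins,
    # and keep that many of the strongest bins.
    p = np.abs(profile).astype(float)
    p /= p.sum()
    h = -np.sum(p * np.log2(p + 1e-12))
    k = max(1, min(k_max, int(round(2.0 ** h))))
    idx = np.sort(np.argsort(p)[::-1][:k])
    return idx, k

# Hypothetical range profile: a strong hand reflection (bin 2) plus torso/arm
# reflections (bins 3 and 6) that still carry gesture-relevant variation.
profile = np.array([0.1, 0.2, 5.0, 4.0, 0.1, 0.15, 3.0, 0.25])
idx, k = adaptive_top_k(profile)
print(idx, k)  # keeps the informative bins rather than only the single strongest
```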
Fractures are a leading cause of morbidity and mortality in Thoroughbred racehorses, posing a significant threat to their welfare and careers. This study introduces a deep learning model specifically designed to facilitate fracture detection in equine athletes. By leveraging extensive training on human fracture data and refining the model with equine imaging, it highlights the transformative potential of transfer learning across species and medical contexts. This approach is not limited to equine fractures but could be adapted for use in detecting injuries or conditions in other veterinary species and even human healthcare applications. A comprehensive databank of radiographs, sourced from public archives and equine hospitals, was curated to encompass diverse conditions (fracture and non-fracture), ensuring robust pattern recognition. The architecture integrates a Vision Transformer for global context modelling with a ResNet backbone and a loss function designed to optimize local feature extraction and cross-species adaptability. The pipeline achieved 96.7% accuracy for modality classification, 97.2% accuracy for projection recognition, and fracture localization intersection over union values of 0.71-0.84 across equine datasets. This work bridges advancements in human and veterinary medicine, opening pathways for AI-driven solutions that extend beyond fractures, fostering improved diagnostic precision and broader applications across species (felines, canines, etc.). By integrating advanced imaging techniques with AI, this study aims to set a foundation for more comprehensive and versatile health monitoring systems.
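The localization metric quoted above, intersection over union, is straightforward to compute for axis-aligned boxes. The boxes below are hypothetical, not taken from the equine datasets:

```python
def iou(box_a, box_b):
    # Intersection over union for axis-aligned boxes given as (x1, y1, x2, y2).
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

# Hypothetical predicted vs. annotated fracture box on a radiograph.
pred = (40, 40, 140, 120)
truth = (50, 50, 150, 130)
print(round(iou(pred, truth), 2))
```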
Advancements in artificial intelligence have significantly enhanced communication for individuals with hearing impairments. This study presents a robust cross-lingual Sign Language Recognition (SLR) framework for Turkish, American English, and Arabic sign languages. The system utilizes the lightweight MediaPipe library for efficient hand landmark extraction, ensuring stable and consistent feature representation across diverse linguistic contexts. Datasets were meticulously constructed from nine public-domain sources (four Arabic, three American, and two Turkish). The final training data comprises curated image datasets, with frames for each language carefully selected from varying angles and distances to ensure high diversity. A comprehensive comparative evaluation was conducted across three state-of-the-art deep learning architectures-ConvNeXt (CNN-based), Swin Transformer (ViT-based), and Vision Mamba (SSM-based)-all applied to identical feature sets. The evaluation demonstrates the superior performance of contemporary vision Transformers and state space models in capturing subtle spatial cues across diverse sign languages. Our approach provides a comparative analysis of model generalization capabilities across three distinct sign languages, offering valuable insights for model selection in pose-based SLR systems.
Psoriasis and eczema are chronic inflammatory skin diseases with overlapping histopathological features, which often lead to diagnostic uncertainty even among experienced dermatopathologists. To address this challenge, we developed a computer-assisted diagnostic framework that combines the Virchow foundation model, pretrained on 1.5 million whole-slide images, with multi-instance learning (MIL) to classify psoriasis and eczema from digitized histopathology slides. Using an internal dataset (n = 40) and an external validation cohort (n = 40), equally balanced between both conditions and annotated by board-certified dermatopathologists, our best-performing configuration (Virchow + CLAM) achieved 85% accuracy, a macro-averaged F1 score of 0.80, and an AUC of 0.81 on the external cohort. This substantially outperformed baseline convolutional neural networks, which reached 61% accuracy, and models relying solely on pretrained feature extractors without MIL, which achieved an average accuracy of 68.8%. In a reader study on the same external cohort, individual dermatopathologist accuracies ranged from 47.5 to 70.0%, with a majority-vote consensus accuracy of 62.5%; our method outperformed both the average individual reader and the consensus under histology-only conditions. Furthermore, the model generates attention heatmaps that provide supportive visual context by highlighting regions associated with model predictions. Importantly, this study is designed as a methodological proof-of-concept conducted under controlled, histology-only conditions and is not intended for direct clinical deployment. Rather than demonstrating clinical readiness, it illustrates the potential of domain-specific foundation models combined with MIL for addressing diagnostically challenging inflammatory dermatoses.
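The MIL aggregation step can be sketched with attention-based pooling in the spirit of CLAM: patch embeddings are scored, the slide representation is their attention-weighted sum, and the scores themselves supply the heatmap. Shapes and weights below are invented (Virchow embeddings are much larger):

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_mil_pool(patch_feats, V, w):
    # Attention MIL pooling: each patch gets a learned score; the slide
    # representation is the attention-weighted sum of patch features, and
    # the normalized scores can be rendered as an attention heatmap.
    scores = np.tanh(patch_feats @ V) @ w
    a = softmax(scores)
    return a @ patch_feats, a

n_patches, d, d_attn = 12, 16, 8      # hypothetical sizes
feats = rng.normal(size=(n_patches, d))
V = rng.normal(size=(d, d_attn)) * 0.1
w = rng.normal(size=d_attn)
slide_repr, attn = attention_mil_pool(feats, V, w)
print(slide_repr.shape, round(attn.sum(), 6))  # slide vector; weights sum to 1
```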
The challenge of traffic sign detection and recognition for driving vehicles has become more critical with recent advances in autonomous and assisted driving technologies. Although object recognition problems, particularly traffic sign recognition, have been extensively studied, most Vision Transformer (ViT) models still rely on static attention mechanisms with fixed projection matrices (Q, K, and V). This static mechanism limits the ability of ViTs to handle real-world problems such as object detection and traffic sign recognition. Problems such as partially or fully obscured signs, changes in illumination, and adverse weather conditions result in subpar feature extraction, which compounds the misclassification problem. To overcome this challenge, a Conditional Visual Transformer (CViT) is proposed in this research, which dynamically adapts feature aggregation, Q, K, and V projections, and attention-based mechanisms based on the input sign type. Its main component is a controlled-failure deep learning model using a CViT that targets specific types of traffic signs through varying feature extraction and attention adjustments, resulting in high classification performance and minimal misclassification. Furthermore, an adaptive gating technique is employed that optimally adjusts the projection matrices across different traffic signs. The proposed CViT achieved an overall accuracy of 99.87%, with a Micro Precision of 99.07%, a Macro Recall of 94.3%, and a Macro F1 Score of 99.07%. These results demonstrate the potential of CViT to improve both the efficiency and reliability of traffic sign recognition in autonomous driving applications.
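One way to realize input-conditioned Q/K/V projections is to gate a small bank of candidate matrices with a router; this is an illustrative construction under assumed shapes, not the paper's exact gating mechanism:

```python
import numpy as np

rng = np.random.default_rng(4)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def conditional_projection(x_summary, experts, router):
    # A router scores the input summary, and the effective projection matrix
    # is a gated blend of expert matrices, so attention adapts to the input.
    g = softmax(router @ x_summary)
    return np.tensordot(g, experts, axes=1), g

d, n_experts = 6, 3
experts = rng.normal(size=(n_experts, d, d))  # candidate projection matrices
router = rng.normal(size=(n_experts, d))
x = rng.normal(size=d)                        # pooled token summary of the sign image
W_eff, gate = conditional_projection(x, experts, router)
print(W_eff.shape, round(gate.sum(), 6))      # blended (d, d) projection; gates sum to 1
```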
Neurodegenerative (ND) diseases are disorders of the central nervous system, affecting the brain and spinal cord. In recent years, deep learning has demonstrated its potential in medical imaging for diagnostic purposes. However, for these techniques to be fully accepted in clinical settings, they must achieve high performance and gain the confidence of medical professionals regarding their interpretability. Therefore, an interpretable model should make decisions based on clinically relevant information, like a domain expert. To achieve this, we present an interpretable classifier dedicated to the most common ND diseases. The lesions associated with ND diseases exhibit irregular distributions and spatial dependencies in different regions of the brain, challenging traditional models to effectively capture both local and global relationships. To address this issue, we present a Residual Graph Neural Network enhanced Vision Transformer (RG-ViT) that represents MRI data as a graph of interconnected patches. By integrating residual connections into the GNN framework, we preserve critical features while promoting effective message passing. This approach overcomes the problem of spatial disconnection prevalent in standard patch-based methods and provides a cohesive and context-aware analysis of MRI data. Experimental results in detecting multiple sclerosis (MS), Parkinson's disease (PD), and Alzheimer's disease (AD) demonstrated our approach's consistent accuracy scores of 98.7%, 99.6%, and 99.1%, respectively. On the combined dataset for the global classification of ND diseases, it achieved an F1 score of 99.2%, supporting its generalizability.
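The residual message-passing idea over a patch graph can be sketched in a few lines; the adjacency, feature sizes, and tanh update below are illustrative assumptions, not the RG-ViT architecture:

```python
import numpy as np

def residual_gnn_layer(h, adj, W):
    # One message-passing step over the patch graph with a residual connection:
    # each patch averages its neighbors' features, transforms them, and adds
    # the result to its own representation, so local detail is preserved.
    deg = adj.sum(axis=1, keepdims=True)
    msg = (adj @ h) / np.maximum(deg, 1.0)
    return h + np.tanh(msg @ W)

# Four MRI patches in a 2x2 grid; edges connect spatial neighbors.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
rng = np.random.default_rng(5)
h = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 8)) * 0.1
out = residual_gnn_layer(h, adj, W)
print(out.shape)  # (4, 8): per-patch features after one residual update
```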
This paper introduces a novel approach to handwritten digit recognition based on directional flood simulation and topological feature extraction. While traditional pixel-based methods often struggle with noise, partial occlusion, and limited data, our method leverages the structural integrity of digits by simulating water flow from image boundaries using a modified breadth-first search (BFS) algorithm. The resulting flooded regions capture stroke directionality, spatial segmentation, and closed-area characteristics, forming a compact and interpretable feature vector. Additional parameters such as inner cavities, perimeter estimation, and normalized stroke density enhance classification robustness. For efficient prediction, we employ the Annoy approximate nearest neighbors algorithm using ensemble-based tree partitioning. The proposed method achieves high accuracy on the MNIST (95.9%) and USPS (93.0%) datasets, demonstrating resilience to rotation, noise, and limited training data. This topology-driven strategy enables accurate digit classification with reduced dimensionality and improved generalization.
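The directional flood idea can be sketched with a BFS that pours "water" from the top edge of a binarized glyph; background pixels the water cannot reach (the inner cavity, plus regions sheltered from above) become topological features. This is a simplified illustration, not the paper's full feature set:

```python
from collections import deque

def flood_from_top(img):
    # BFS "water" poured from the top edge: flow fills background cells (0)
    # reachable from the top row and stops at stroke cells (1). Background
    # that stays dry is sheltered from above -- a directional flood feature.
    h, w = len(img), len(img[0])
    filled = [[False] * w for _ in range(h)]
    q = deque((0, c) for c in range(w) if img[0][c] == 0)
    for _, c in list(q):
        filled[0][c] = True
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and not filled[nr][nc] and img[nr][nc] == 0:
                filled[nr][nc] = True
                q.append((nr, nc))
    # Feature: number of background pixels the flood never reached.
    return sum(img[r][c] == 0 and not filled[r][c] for r in range(h) for c in range(w))

# A tiny "0"-like glyph: the inner cavity is sealed, so water cannot enter it.
zero = [
    [0, 1, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
]
print(flood_from_top(zero))  # 4: the 2 cavity pixels plus 2 bottom corners sheltered from above
```

Repeating the flood from each of the four edges yields the direction-dependent segmentation and closed-area counts that the feature vector is built from.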