Rare diseases, including Inborn Errors of Metabolism (IEM), pose significant diagnostic challenges. Case reports serve as key but computationally underutilized resources to inform diagnosis. Clinical dense information extraction refers to organizing medical information into structured predefined categories. Large Language Models (LLMs) may enable scalable information extraction from case reports but are rarely evaluated for this task. We introduce CaseReportBench, an expert-annotated dataset for dense information extraction of case reports, focusing on IEMs. Using this dataset, we assess various models and prompting strategies, introducing novel approaches such as category-specific prompting and subheading-filtered data integration. Zero-shot chain-of-thought prompting offers little advantage over standard zero-shot prompting. Category-specific prompting improves alignment with the benchmark. The open-source model Qwen2.5-7B outperforms GPT-4o for this task. Our clinician evaluations show that LLMs can extract clinically relevant details from case reports, supporting rare disease diagnosis and management. We also highlight areas for improvement, such as LLMs' limitations in recogni
In 2004, Dai, Lathrop, Lutz, and Mayordomo defined and investigated the finite-state dimension (a finite-state version of algorithmic dimension) of a sequence $S \in Σ^\infty$ and, in 2018, Case and Lutz defined and investigated the mutual (algorithmic) dimension between two sequences $S \in Σ^\infty$ and $T \in Σ^\infty$. In this paper, we propose a definition for the lower and upper finite-state mutual dimensions $mdim_{FS}(S:T)$ and $Mdim_{FS}(S:T)$ between two sequences $S \in Σ^\infty$ and $T \in Σ^\infty$ over an alphabet $Σ$. Intuitively, the finite-state dimension of a sequence $S \in Σ^\infty$ represents the density of finite-state information contained within $S$, while the finite-state mutual dimension between two sequences $S \in Σ^\infty$ and $T \in Σ^\infty$ represents the density of finite-state information shared by $S$ and $T$. Thus ``finite-state mutual dimension'' can be viewed as a ``finite-state'' version of mutual dimension and as a ``mutual'' version of finite-state dimension. The main results of this investigation are as follows. First, we show that finite-state mutual dimension, defined using information-lossless finite-state compressors, has all of the pro
Timely identification of issue reports reflecting software vulnerabilities is crucial, particularly for Internet-of-Things (IoT) systems, where analysis is slower than in non-IoT systems. While Machine Learning (ML) and Large Language Models (LLMs) detect vulnerability-indicating issues in non-IoT systems, their IoT use remains unexplored. We are the first to tackle this problem by proposing two approaches: (1) combining ML and LLMs with Natural Language Processing (NLP) techniques to detect vulnerability-indicating issues of 21 Eclipse IoT projects and (2) fine-tuning a pre-trained BERT Masked Language Model (MLM) on 11,000 GitHub issues for classifying vulnerability-indicating issues. Our best performance belongs to a Support Vector Machine (SVM) trained on BERT NLP features, achieving an Area Under the receiver operating characteristic Curve (AUC) of 0.65. The fine-tuned BERT achieves 0.26 accuracy, emphasizing the importance of exposing all data during training. Our contributions set the stage for accurately detecting IoT vulnerabilities from issue reports, similar to non-IoT systems.
Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden. In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and to reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to
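One simple way to picture the historical-data strategy mentioned above is a multiplicative correction: estimate from past weeks what fraction of the eventually reported cases is typically available at each reporting lag, then inflate the most recent counts accordingly. The sketch below illustrates that idea only and is not the framework proposed in the abstract; the function names, the lag layout, and the toy numbers are hypothetical.

```python
from statistics import mean


def reporting_fractions(initial_counts, final_counts):
    """Estimate the average fraction of final cases visible at each lag.

    initial_counts[i][k] is the count for historical week i as seen k weeks
    after that week ended; final_counts[i] is the fully revised count.
    (Hypothetical data layout, chosen for illustration.)
    """
    n_lags = len(initial_counts[0])
    fractions = []
    for k in range(n_lags):
        ratios = [
            row[k] / final
            for row, final in zip(initial_counts, final_counts)
            if final > 0
        ]
        fractions.append(mean(ratios))
    return fractions


def correct_recent(observed, fractions):
    """Inflate the newest observations by the estimated reporting fraction.

    observed[-1] is treated as lag 0, observed[-2] as lag 1, and so on.
    Older observations are assumed complete and left unchanged.
    """
    corrected = list(observed)
    for lag, frac in enumerate(fractions):
        idx = len(observed) - 1 - lag
        if idx < 0:
            break
        if frac > 0:
            corrected[idx] = observed[idx] / frac
    return corrected


# Toy example: two historical weeks, each seen at lag 0 and lag 1.
fractions = reporting_fractions([[50, 80], [25, 40]], [100, 50])
corrected = correct_recent([100, 80, 50], fractions)
```

A corrected series like this could then be passed to any forecasting model in place of the raw counts; a real pipeline would also need to carry the uncertainty of the estimated fractions into the forecast.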
We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 and consists of four subtasks: i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely, English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Subtask 1, document classification. The training data from CASE 2021 in English, Portuguese and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches are mainly ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Su
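For readers unfamiliar with the F1-macro metric reported above, the following sketch computes it in its standard form: a per-class F1 score, averaged with equal weight per class. The shared task's own scoring script is not described in the abstract, so this is the textbook definition rather than the organizers' implementation, and the function name is hypothetical. It returns a value in [0, 1], whereas the abstract reports percentages.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy document-classification example (1 = protest, 0 = non-protest).
score = macro_f1([1, 1, 0, 0], [1, 0, 0, 0])
```

Because every class contributes equally, macro averaging penalizes a system that ignores a rare class, which matters for imbalanced protest-event data.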
Vision Language Models (VLMs) have shown promising capabilities in medical image analysis by jointly understanding visual and textual information for tasks such as Visual Question Answering. However, existing hematology vision-language resources remain predominantly English-centric, limiting their applicability in multilingual healthcare environments. This challenge is relevant generally to South Asia and specifically to Pakistan, where Urdu is widely used despite healthcare information and digital medical systems being largely dependent on English. To investigate this gap, we conducted a survey among healthcare professionals, which revealed substantial language mismatches between clinical documentation and patient communication, emphasizing the need for multilingual healthcare technologies. To address this limitation, we introduce WBCMor VQA, a clinically validated, bilingual (English and Urdu), morphology-aware VQA benchmark for leukemia and normal white blood cell analysis. The benchmark is constructed using morphology-aware annotations from LeukemiaAttri and WBCAtt datasets and supported by a domain-specific Urdu hematology dictionary to ensure linguistic consistency and clinical cor
Background: A large number of neurology case reports have been published, but it is a challenging task for human medical experts to explore all of these publications. Text mining offers a computational approach to investigate neurology literature and capture meaningful patterns. The overarching goal of this study is to provide a new perspective on case reports of neurological disease and syndrome analysis over the last six decades using text mining. Methods: We extracted diseases and syndromes (DsSs) from more than 65,000 neurology case reports from 66 journals in PubMed over the last six decades from 1955 to 2017. Text mining was applied to reports on the detected DsSs to investigate high-frequency DsSs, categorize them, and explore the linear trends over the 63-year time frame. Results: The text mining methods explored high-frequency neurologic DsSs and their trends and the relationships between them from 1955 to 2017. We detected more than 18,000 unique DsSs and found 10 categories of neurologic DsSs. While the trend analysis showed the increasing trends in the case reports for top-10 high-frequency DsSs, the categories had mixed trends. Conclusion: Our study provided new insigh
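The abstract does not say how the linear trends were computed. As an illustration of what a linear trend over yearly counts means, the sketch below fits an ordinary least-squares line to the number of case reports mentioning a given DsS per year. The function name and the toy counts are hypothetical and are not taken from the study.

```python
def linear_trend(years, counts):
    """Ordinary least-squares fit of counts against years.

    Returns (slope, intercept); a positive slope indicates an increasing trend.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(counts) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy yearly counts of reports mentioning one disease or syndrome.
slope, intercept = linear_trend([2000, 2001, 2002, 2003], [3, 5, 7, 9])
```

Here the slope is the average change in report count per year, which is one way to compare trends across DsSs or categories.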
In hematology, computational models offer significant potential to improve diagnostic accuracy, streamline workflows, and reduce the tedious work of analyzing single cells in peripheral blood or bone marrow smears. However, clinical adoption of computational models has been hampered by the lack of generalization due to large batch effects, small dataset sizes, and poor performance in transfer learning from natural images. To address these challenges, we introduce DinoBloom, the first foundation model for single cell images in hematology, utilizing a tailored DINOv2 pipeline. Our model is built upon an extensive collection of 13 diverse, publicly available datasets of peripheral blood and bone marrow smears, the most substantial open-source cohort in hematology so far, comprising over 380,000 white blood cell images. To assess its generalization capability, we evaluate it on an external dataset with a challenging domain shift. We show that our model outperforms existing medical and non-medical vision models in (i) linear probing and k-nearest neighbor evaluations for cell-type classification on blood and bone marrow smears and (ii) weakly supervised multiple instance learning for ac
Hematology analyzers are essential diagnostic and monitoring tools for detecting blood diseases. Although contemporary analyzers produce basic insights, these are often not as detailed as required under the personalized medicine paradigm. Next-Generation Hematology Analyzers (NGHAs) are revolutionary newcomers in the field, with significant advantages over regular hematology analyzers. They provide deeper insights into cellular morphology, function, and genetic profiles. This detailed information opens up possibilities for tailor-made diagnostic and therapeutic approaches in precision medicine. This review presents some revolutionary technologies that have changed hematology analyzers and provides an overview of their limitations, basic functions, and influence on clinical practice. It focuses on the integration of state-of-the-art technologies, such as microfluidics, advanced optics, artificial intelligence, flow cytometry, and digital imaging, empowering NGHAs to improve diagnostic accuracy, rapidly detect diseases, and support flexible, targeted therapy. Hints regarding point-of-care hematology testing are also provided to discuss its implications for transforming healthcare
Analyzing large volumes of case law to uncover evolving legal principles, across multiple cases, on a given topic is a demanding task for legal professionals. Structured topical reports provide an effective solution by summarizing key issues, principles, and judgments, enabling comprehensive legal analysis on a particular topic. While prior works have advanced query-based individual case summarization, none have extended to automatically generating multi-case structured reports. To address this, we introduce LexGenie, an automated LLM-based pipeline designed to create structured reports using the entire body of case law on user-specified topics within the European Court of Human Rights jurisdiction. LexGenie retrieves, clusters, and organizes relevant passages by topic to generate a structured outline and cohesive content for each section. Expert evaluation confirms LexGenie's utility in producing structured reports that enhance efficient, scalable legal analysis.
Traditional health authority approval for oncology drugs is based on a clinical benefit endpoint, or a valid surrogate. In 1992 the FDA created the Accelerated Approval pathway to allow for earlier approval of therapies in serious conditions with an unmet medical need. This is accomplished typically by granting accelerated approval based on a surrogate endpoint that can be measured earlier than a traditional approval endpoint. Minimal residual disease (MRD) is a sensitive measure of residual cancer cells in hematology oncology after treatment, and is increasingly considered as a secondary or exploratory endpoint due to its prognostic potential for traditional clinical trial endpoints such as progression-free survival (PFS) and overall survival (OS). This work aims to evaluate MRD's surrogacy potential across several hematologic cancer indications while keeping the focus on follicular lymphoma (FL), using data from published studies. We examine individual-level and trial-level correlations extracted from previously published studies to elucidate the potential role of MRD in accelerating the drug approval process in hematology oncology trials.
A set $S\subseteq V$ is a dominating set of $G$ if every vertex in $V - S$ is adjacent to at least one vertex in $S$. The domination number $γ(G)$ of $G$ equals the minimum cardinality of a dominating set $S$ in $G$; we say that such a set $S$ is a $γ$-set. A generalization of this is partial domination, which was introduced in 2017 by Case, Hedetniemi, Laskar, and Lipman [3,2]. In partial domination, a set $S$ is a $p$-dominating set if it dominates a proportion $p$ of the vertices in $V$. The $p$-domination number $γ_{p}(G)$ is the minimum cardinality of a $p$-dominating set in $G$. In this paper, we investigate further properties of partial dominating sets, particularly ones related to graph products and locating partial dominating sets. We also introduce the concept of a $p$-influencing set as the union of all $p$-dominating sets for a fixed $p$ and investigate some of its properties.
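To make the definitions above concrete, the following brute-force sketch computes $γ_{p}(G)$ for a small graph. It assumes that "dominates a proportion $p$" means the closed neighborhood of $S$ contains at least $p\,|V|$ vertices; that reading, the adjacency-dictionary representation, and the function names are assumptions made for illustration. The search is exponential in $|V|$ and is only meant for tiny examples.

```python
from itertools import combinations
from math import ceil


def dominated(adj, subset):
    """Closed neighborhood N[S]: the vertices of S plus all their neighbors."""
    covered = set(subset)
    for v in subset:
        covered |= adj[v]
    return covered


def p_domination_number(adj, p):
    """Smallest |S| such that |N[S]| >= p * |V| (assumed reading of p-domination)."""
    n = len(adj)
    # Small tolerance guards against floating-point error in p * n.
    needed = ceil(p * n - 1e-9)
    vertices = list(adj)
    for k in range(n + 1):
        for subset in combinations(vertices, k):
            if len(dominated(adj, subset)) >= needed:
                return k
    return n


# Path on five vertices: 0 - 1 - 2 - 3 - 4.
path5 = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
half = p_domination_number(path5, 0.5)
full = p_domination_number(path5, 1.0)
```

With $p = 1$ the function reduces to the ordinary domination number $γ(G)$, so the same routine can be used to compare $γ_{p}(G)$ against $γ(G)$ on small graphs.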
In the face of an infectious disease, a key epidemiological measure is the basic reproduction number, which quantifies the average secondary infections caused by a single case in a susceptible population. In practice, the effective reproduction number, denoted as $R_t$, is widely used to assess the transmissibility of the disease at a given time $t$. Estimating this metric in real time is vital for understanding and managing disease outbreaks. Traditional statistical inference often relies on two assumptions. One is that samples are assumed to be drawn from a homogeneous population distribution, neglecting significant variations in individual transmission rates. The other is the ideal case reporting assumption, disregarding time delays between infection and reporting. In this paper, we thoroughly investigate these critical factors and assess their impact on estimating $R_t$. We first introduce negative binomial and Weibull distributions to characterize transmission rates and reporting delays, respectively, based on which observation and state equations are formulated. Then, we employ Bayesian filtering for estimating $R_t$. Finally, validation using synthetic and empirical data demo
Purpose: We investigated the utilization of privacy-preserving, locally-deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. Materials and Methods: We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into various cardiac diagnostic categories based on descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics including accuracy, precision, recall, and F1 score. We also employed confusion matrices to examine patterns of misclassification across models. Results: Most open-source LLMs demonstrated exceptional performance in classifying reports into different diagnostic categories. Google's Gemma2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. All other evaluated models attained average scores above 0.93, with Mistral and DeepseekR1-7B being the only exceptions. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports.
Sparse autoencoders (SAEs) have emerged as a promising tool for mechanistic interpretability of transformer-based foundation models. Very recently, SAEs were also adopted for the visual domain, enabling the discovery of visual concepts and their patch-wise attribution to tokens in the transformer model. While a growing number of foundation models have emerged for medical imaging, tools for explaining their inferences are still lacking. In this work, we show the applicability of SAEs for hematology. We propose CytoSAE, a sparse autoencoder which is trained on over 40,000 peripheral blood single-cell images. CytoSAE generalizes to diverse and out-of-domain datasets, including bone marrow cytology, where it identifies morphologically relevant concepts which we validated with medical experts. Furthermore, we demonstrate scenarios in which CytoSAE can generate patient-specific and disease-specific concepts, enabling the detection of pathognomonic cells and localized cellular abnormalities at the patch level. We quantified the effect of concepts on a patient-level AML subtype classification task and show that CytoSAE concepts reach performance comparable to the state-of-the-art, while offering exp
The leukocyte differential test is a widely performed clinical procedure for screening infectious diseases. Existing hematology analyzers require labor-intensive work and a panel of expensive reagents. Here we report an artificial-intelligence enabled reagent-free imaging hematology analyzer (AIRFIHA) modality that can accurately classify subpopulations of leukocytes with minimal sample preparation. AIRFIHA is realized through training a two-step residual neural network using label-free images of separated leukocytes acquired from a custom-built quantitative phase microscope. We validated the performance of AIRFIHA on a randomly selected test set and cross-validated it across all blood donors. AIRFIHA outperforms current methods in classification accuracy, especially in B and T lymphocytes, while preserving the natural state of cells. It also shows promising potential in differentiating CD4 and CD8 cells. Owing to its easy operation, low cost, and strong discerning capability of complex leukocyte subpopulations, we envision AIRFIHA is clinically translatable and can also be deployed in resource-limited settings, e.g., during pandemic situations for the rapid screening of infectious dis
With the growth of global maritime transportation, energy optimization has become crucial for reducing costs and ensuring operational efficiency. Shaft power is the mechanical power transmitted from the engine to the shaft and directly impacts fuel consumption, making its accurate prediction a paramount step in optimizing vessel performance. Power consumption is highly correlated with ship parameters such as speed and shaft revolutions per minute, as well as weather and sea conditions. Frequent access to this operational data can improve prediction accuracy. However, obtaining high-quality sensor data is often infeasible and costly, making alternative sources such as noon reports a viable option. In this paper, we propose a transfer learning-based approach for predicting vessel shaft power, where a model is initially trained on high-frequency data from a vessel and then fine-tuned with low-frequency daily noon reports from other vessels. We tested our approach on sister vessels (identical dimensions and configurations), a similar vessel (slightly larger with a different engine), and a different vessel (distinct dimensions and configurations). The experiments showed that the mean abso
We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for the detection of medical entities made available through the annotated dataset.
The ability to estimate and predict pathogen variant dynamics can inform public health responses, including planning for increased transmission or severity, shifts in population immunity, or changes to vaccine or therapeutic effectiveness. The COVID-19 pandemic demonstrated the importance of monitoring SARS-CoV-2 variant evolution through viral genome sequencing, enabling predictive models to estimate variant frequencies in the recent past, present, and short-term future. Collaborative forecasting Hubs provided a valuable way to centralize predictive modeling of epidemiological indicators such as cases, hospitalizations, and deaths during the pandemic; however, none existed for variant dynamics. Here, we discuss the creation of the United States SARS-CoV-2 Variant Nowcast Hub, designed to solicit estimates of the relative abundance of a specified set of SARS-CoV-2 variants at the U.S. state level. We discuss the design decisions and challenges in building the Hub and its scoring procedures. Using submissions from the Hub's first respiratory virus season (nowcast dates October 9th, 2024 to June 4th, 2025), we evaluate five individual models and a baseline model. We found that the ba
Peripheral blood smears remain a cornerstone in the diagnosis of hematological neoplasms, offering rapid and valuable insights that inform subsequent diagnostic steps. However, since neoplastic transformations typically arise in the bone marrow, they may not manifest as detectable aberrations in peripheral blood, presenting a diagnostic challenge. In this paper, we introduce cAItomorph, an explainable transformer-based AI model, trained to classify hematological malignancies based on peripheral blood cytomorphology. Our data comprises peripheral blood single-cell images from 6115 patients with diagnoses confirmed by cytomorphology, cytogenetics, molecular genetics, and immunophenotyping from bone marrow samples, and 495 healthy controls, grouped into eight coarse classes. cAItomorph leverages the DinoBloom hematology foundation model and aggregates image encodings via a transformer-based architecture into a single vector. It achieves an overall accuracy of 0.72 in eight-class disease classification, with F1 scores of 0.76 for acute leukemia, 0.80 for myeloproliferative neoplasms and 0.94 for healthy cases. The overall accuracy increases to 0.87 in top-2 predictions. cAItomorph achieves high sensitivi
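The top-2 figure quoted above counts a prediction as correct when the true class is among the two highest-scoring classes. The sketch below shows that metric in its usual form; the function name and the toy scores are hypothetical, and the abstract does not describe how the authors' evaluation code handles ties or missing classes.

```python
def top_k_accuracy(scores, labels, k=2):
    """Fraction of samples whose true label is among the k highest scores.

    scores[i][c] is the model score for class c on sample i;
    labels[i] is the index of the true class.
    """
    hits = 0
    for row, label in zip(scores, labels):
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Toy example with three classes and three patients.
toy_scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
toy_labels = [2, 1, 0]
top1 = top_k_accuracy(toy_scores, toy_labels, k=1)
top2 = top_k_accuracy(toy_scores, toy_labels, k=2)
```

Top-2 accuracy is never lower than top-1 accuracy, which is consistent with the increase from 0.72 to 0.87 reported in the abstract.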