Background: A large number of neurology case reports have been published, but it is a challenging task for human medical experts to explore all of these publications. Text mining offers a computational approach to investigate neurology literature and capture meaningful patterns. The overarching goal of this study is to provide a new perspective on case reports of neurological disease and syndrome analysis over the last six decades using text mining. Methods: We extracted diseases and syndromes (DsSs) from more than 65,000 neurology case reports from 66 journals in PubMed over the last six decades from 1955 to 2017. Text mining was applied to reports on the detected DsSs to investigate high-frequency DsSs, categorize them, and explore the linear trends over the 63-year time frame. Results: The text mining methods explored high-frequency neurologic DsSs and their trends and the relationships between them from 1955 to 2017. We detected more than 18,000 unique DsSs and found 10 categories of neurologic DsSs. While the trend analysis showed the increasing trends in the case reports for top-10 high-frequency DsSs, the categories had mixed trends. Conclusion: Our study provided new insigh
Rare diseases, including Inborn Errors of Metabolism (IEM), pose significant diagnostic challenges. Case reports serve as key but computationally underutilized resources to inform diagnosis. Clinical dense information extraction refers to organizing medical information into structured predefined categories. Large Language Models (LLMs) may enable scalable information extraction from case reports but are rarely evaluated for this task. We introduce CaseReportBench, an expert-annotated dataset for dense information extraction of case reports, focusing on IEMs. Using this dataset, we assess various models and prompting strategies, introducing novel approaches such as category-specific prompting and subheading-filtered data integration. Zero-shot chain-of-thought prompting offers little advantage over standard zero-shot prompting. Category-specific prompting improves alignment with the benchmark. The open-source model Qwen2.5-7B outperforms GPT-4o for this task. Our clinician evaluations show that LLMs can extract clinically relevant details from case reports, supporting rare disease diagnosis and management. We also highlight areas for improvement, such as LLMs' limitations in recogni
In 2004, Dai, Lathrop, Lutz, and Mayordomo defined and investigated the finite-state dimension (a finite-state version of algorithmic dimension) of a sequence $S \in Σ^\infty$ and, in 2018, Case and Lutz defined and investigated the mutual (algorithmic) dimension between two sequences $S \in Σ^\infty$ and $T \in Σ^\infty$. In this paper, we propose a definition for the lower and upper finite-state mutual dimensions $mdim_{FS}(S:T)$ and $Mdim_{FS}(S:T)$ between two sequences $S \in Σ^\infty$ and $T \in Σ^\infty$ over an alphabet $Σ$. Intuitively, the finite-state dimension of a sequence $S \in Σ^\infty$ represents the density of finite-state information contained within $S$, while the finite-state mutual dimension between two sequences $S \in Σ^\infty$ and $T \in Σ^\infty$ represents the density of finite-state information shared by $S$ and $T$. Thus ``finite-state mutual dimension'' can be viewed as a ``finite-state'' version of mutual dimension and as a ``mutual'' version of finite-state dimension. The main results of this investigation are as follows. First, we show that finite-state mutual dimension, defined using information-lossless finite-state compressors, has all of the pro
Timely identification of issue reports reflecting software vulnerabilities is crucial, particularly for Internet-of-Things (IoT) systems, where analysis is slower than for non-IoT systems. While Machine Learning (ML) and Large Language Models (LLMs) detect vulnerability-indicating issues in non-IoT systems, their IoT use remains unexplored. We are the first to tackle this problem by proposing two approaches: (1) combining ML and LLMs with Natural Language Processing (NLP) techniques to detect vulnerability-indicating issues of 21 Eclipse IoT projects and (2) fine-tuning a pre-trained BERT Masked Language Model (MLM) on 11,000 GitHub issues for classifying vulnerability-indicating issues. Our best performance belongs to a Support Vector Machine (SVM) trained on BERT NLP features, achieving an Area Under the receiver operating characteristic Curve (AUC) of 0.65. The fine-tuned BERT achieves 0.26 accuracy, emphasizing the importance of exposing all data during training. Our contributions set the stage for accurately detecting IoT vulnerabilities from issue reports, similar to non-IoT systems.
Infectious disease forecasting is of great interest to the public health community and policymakers, since forecasts can provide insight into disease dynamics in the near future and inform interventions. Due to delays in case reporting, however, forecasting models may often underestimate the current and future disease burden. In this paper, we propose a general framework for addressing reporting delay in disease forecasting efforts with the goal of improving forecasts. We propose strategies for leveraging either historical data on case reporting or external internet-based data to estimate the amount of reporting error. We then describe several approaches for adapting general forecasting pipelines to account for under- or over-reporting of cases. We apply these methods to address reporting delay in data on dengue fever cases in Puerto Rico from 1990 to 2009 and to reports of influenza-like illness (ILI) in the United States between 2010 and 2019. Through a simulation study, we compare method performance and evaluate robustness to assumption violations. Our results show that forecasting accuracy and prediction coverage almost always increase when correction methods are implemented to
We report results of the CASE 2022 Shared Task 1 on Multilingual Protest Event Detection. This task is a continuation of CASE 2021 that consists of four subtasks that are i) document classification, ii) sentence classification, iii) event sentence coreference identification, and iv) event extraction. The CASE 2022 extension consists of expanding the test data with more data in previously available languages, namely, English, Hindi, Portuguese, and Spanish, and adding new test data in Mandarin, Turkish, and Urdu for Sub-task 1, document classification. The training data from CASE 2021 in English, Portuguese and Spanish were utilized. Therefore, predicting document labels in Hindi, Mandarin, Turkish, and Urdu occurs in a zero-shot setting. The CASE 2022 workshop accepts reports on systems developed for predicting test data of CASE 2021 as well. We observe that the best systems submitted by CASE 2022 participants achieve between 79.71 and 84.06 F1-macro for new languages in a zero-shot setting. The winning approaches are mainly ensembling models and merging data in multiple languages. The best two submissions on CASE 2021 data outperform submissions from last year for Subtask 1 and Su
Robotic technology has the potential to revolutionize the field of neurology by providing new methods for diagnosis, treatment, and rehabilitation of neurological disorders. In recent years, there has been an increasing interest in the development of robotics applications for neurology, driven by advances in sensing, actuation, and control systems. This review paper provides a comprehensive overview of the recent advancements in robotics technology for neurology, with a focus on three main areas: diagnosis, treatment, and rehabilitation. In the area of diagnosis, robotics has been used for developing new imaging techniques and tools for more accurate and non-invasive mapping of brain structures and functions. For treatment, robotics has been used for developing minimally invasive surgical procedures, including stereotactic and endoscopic approaches, as well as for the delivery of therapeutic agents to specific targets in the brain. In rehabilitation, robotics has been used for developing assistive devices and platforms for motor and cognitive training of patients with neurological disorders. The paper also discusses the challenges and limitations of current robotics technology for
Analyzing large volumes of case law to uncover evolving legal principles, across multiple cases, on a given topic is a demanding task for legal professionals. Structured topical reports provide an effective solution by summarizing key issues, principles, and judgments, enabling comprehensive legal analysis on a particular topic. While prior works have advanced query-based individual case summarization, none have extended to automatically generating multi-case structured reports. To address this, we introduce LexGenie, an automated LLM-based pipeline designed to create structured reports using the entire body of case law on user-specified topics within the European Court of Human Rights jurisdiction. LexGenie retrieves, clusters, and organizes relevant passages by topic to generate a structured outline and cohesive content for each section. Expert evaluation confirms LexGenie's utility in producing structured reports that enhance efficient, scalable legal analysis.
A set $S\subseteq V$ is a dominating set of $G$ if every vertex in $V - S$ is adjacent to at least one vertex in $S$. The domination number $γ(G)$ of $G$ equals the minimum cardinality of a dominating set $S$ in $G$; we say that such a set $S$ is a $γ$-set. A generalization of this is partial domination, which was introduced in 2017 by Case, Hedetniemi, Laskar, and Lipman [3,2]. In partial domination, a set $S$ is a $p$-dominating set if it dominates a proportion $p$ of the vertices in $V$. The $p$-domination number $γ_{p}(G)$ is the minimum cardinality of a $p$-dominating set in $G$. In this paper, we investigate further properties of partial dominating sets, particularly ones related to graph products and locating partial dominating sets. We also introduce the concept of a $p$-influencing set as the union of all $p$-dominating sets for a fixed $p$ and investigate some of its properties.
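The definitions above can be illustrated with a small brute-force sketch. It is not taken from the paper: the function names, the adjacency-dictionary representation, and the example graph are hypothetical, and it assumes that "dominates a proportion $p$" means the closed neighborhood $N[S]$ contains at least $p\,|V|$ vertices.

```python
from itertools import combinations


def dominated_vertices(adj, S):
    """Return the closed neighborhood N[S]: S plus every vertex adjacent to S."""
    dominated = set(S)
    for v in S:
        dominated.update(adj[v])
    return dominated


def partial_domination_number(adj, p):
    """Brute-force the smallest set S with |N[S]| >= p * |V| (assumed reading).

    Returns (size, witness_set). Exponential time, so small graphs only.
    """
    vertices = list(adj)
    n = len(vertices)
    # Try candidate sets in order of increasing size; the first hit is minimum.
    for k in range(n + 1):
        for S in combinations(vertices, k):
            if len(dominated_vertices(adj, S)) >= p * n:
                return k, set(S)
    raise ValueError("p must satisfy 0 <= p <= 1")


# Hypothetical example: the path on five vertices 0-1-2-3-4.
path5 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
gamma, gamma_set = partial_domination_number(path5, 1)        # ordinary domination
gamma_half, half_set = partial_domination_number(path5, 0.5)  # half of the vertices
```

On this path, a single vertex dominates at most three of the five vertices, so one vertex suffices for $p = 1/2$ while full domination ($p = 1$) needs two.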
Purpose: We investigated the utilization of privacy-preserving, locally-deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. Materials and Methods: We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into various cardiac diagnostic categories based on descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics including accuracy, precision, recall, and F1 score. We also employed confusion matrices to examine patterns of misclassification across models. Results: Most open-source LLMs demonstrated exceptional performance in classifying reports into different diagnostic categories. Google's Gemma2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. All other evaluated models attained average scores above 0.93, with Mistral and DeepseekR1-7B being the only exceptions. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports.
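As a rough illustration of the classification metrics named above, the following sketch computes per-class and averaged F1 scores from true and predicted labels. It assumes that "average F1 score" means the unweighted (macro) average over diagnostic categories, which the abstract does not state; the label names and data are hypothetical and not from the study.

```python
def per_class_f1(y_true, y_pred):
    """Return {label: F1} computed one-vs-rest for every label that occurs."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 is the harmonic mean of precision and recall.
        if precision + recall:
            scores[label] = 2 * precision * recall / (precision + recall)
        else:
            scores[label] = 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging scheme)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Hypothetical labels for four reports; one "normal" report is misclassified.
y_true = ["normal", "normal", "hcm", "dcm"]
y_pred = ["normal", "hcm", "hcm", "dcm"]
f1_by_class = per_class_f1(y_true, y_pred)
average_f1 = macro_f1(y_true, y_pred)
```

A macro average weights every category equally, so a rare diagnosis that is often misclassified lowers the score as much as a common one would.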
Medical imaging technologies have undergone extensive development, enabling non-invasive visualization of clinical information. The traditional review of medical images by clinicians remains subjective, time-consuming, and prone to human error. With the recent availability of medical imaging data, quantification has become an important goal in the field. Radiomics, a methodology aimed at extracting quantitative information from imaging data, has emerged as a promising approach to uncover hidden biological information and support decision-making in clinical practice. This paper presents a review of the radiomic pipeline from the clinical neuroimaging perspective, providing a detailed overview of each step with practical advice. It discusses the application of handcrafted and deep radiomics in neuroimaging, stratified by neurological diagnosis. Although radiomics shows great potential for increasing diagnostic precision and improving treatment quality in neurology, several limitations hinder its clinical implementation. Addressing these challenges requires collaborative efforts, advancements in image harmonization methods, and the establishment of reproducible and standardized pipelin
In the face of an infectious disease, a key epidemiological measure is the basic reproduction number, which quantifies the average number of secondary infections caused by a single case in a susceptible population. In practice, the effective reproduction number, denoted as $R_t$, is widely used to assess the transmissibility of the disease at a given time $t$. Estimating this metric in real time is vital for understanding and managing disease outbreaks. Traditional statistical inference often relies on two assumptions. One is that samples are drawn from a homogeneous population distribution, neglecting significant variations in individual transmission rates. The other is the ideal case reporting assumption, disregarding time delays between infection and reporting. In this paper, we thoroughly investigate these critical factors and assess their impact on estimating $R_t$. We first introduce negative binomial and Weibull distributions to characterize transmission rates and reporting delays, respectively, based on which observation and state equations are formulated. Then, we employ Bayesian filtering for estimating $R_t$. Finally, validation using synthetic and empirical data demo
With the growth of global maritime transportation, energy optimization has become crucial for reducing costs and ensuring operational efficiency. Shaft power is the mechanical power transmitted from the engine to the shaft and directly impacts fuel consumption, making its accurate prediction a paramount step in optimizing vessel performance. Power consumption is highly correlated with ship parameters such as speed and shaft rotation per minute, as well as weather and sea conditions. Frequent access to this operational data can improve prediction accuracy. However, obtaining high-quality sensor data is often infeasible and costly, making alternative sources such as noon reports a viable option. In this paper, we propose a transfer learning-based approach for predicting vessel shaft power, where a model is initially trained on high-frequency data from a vessel and then fine-tuned with low-frequency daily noon reports from other vessels. We tested our approach on sister vessels (identical dimensions and configurations), a similar vessel (slightly larger with a different engine), and a different vessel (distinct dimensions and configurations). The experiments showed that the mean abso
We present a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library. In the case reports, we annotate cases, conditions, findings, factors and negation modifiers. Moreover, where applicable, we annotate relations between these entities. As such, this is the first corpus of this kind made available to the scientific community in English. It enables the initial investigation of automatic information extraction from case reports through tasks like Named Entity Recognition, Relation Extraction and (sentence/paragraph) relevance detection. Additionally, we present four strong baseline systems for the detection of medical entities made available through the annotated dataset.
More and more scientific research shows that there is a close correlation between the Internet and brain science. This paper presents the idea of establishing the Internet neurology, which means to make a cross-contrast between the two in terms of physiology and psychology, so that a complete infrastructure system of the Internet is established, predicting the development trend of the Internet in the future as well as the brain structure and operation mechanism, and providing theoretical support for the generation principle of intelligence, cognition and emotion. It also proposes the viewpoint that the Internet can be divided into Internet neurophysiology, Internet neuropsychology, Brain Internet physiology, Brain Internet psychology and the Internet in cognitive science.
The ability to estimate and predict pathogen variant dynamics can inform public health responses, including planning for increased transmission or severity, shifts in population immunity, or changes to vaccine or therapeutic effectiveness. The COVID-19 pandemic demonstrated the importance of monitoring SARS-CoV-2 variant evolution through viral genome sequencing, enabling predictive models to estimate variant frequencies in the recent past, present, and short-term future. Collaborative forecasting Hubs provided a valuable way to centralize predictive modeling of epidemiological indicators such as cases, hospitalizations, and deaths during the pandemic; however, none existed for variant dynamics. Here, we discuss the creation of the United States SARS-CoV-2 Variant Nowcast Hub, designed to solicit estimates of the relative abundance of a specified set of SARS-CoV-2 variants at the U.S. state level. We discuss the design decisions and challenges in building the Hub and its scoring procedures. Using submissions from the Hub's first respiratory virus season (nowcast dates October 9th, 2024 to June 4th, 2025), we evaluate five individual models and a baseline model. We found that the ba
This paper investigates the risk-return relationship in the determination of housing asset pricing. In so doing, the paper evaluates behavioral hypotheses advanced by Case and Shiller (1988, 2002, 2009) in studies of boom and post-boom housing markets. The paper specifies and tests a multi-factor housing asset pricing model. In that model, we evaluate whether the market factor as well as other measures of risk, including idiosyncratic risk, momentum, and MSA size effects, have explanatory power for metropolitan-specific housing returns. Further, we test the robustness of the asset pricing results to inclusion of controls for socioeconomic variables commonly represented in the house price literature, including changes in employment, affordability, and foreclosure incidence. We find a sizable and statistically significant influence of the market factor on MSA house price returns. Moreover, we show that market betas have varied substantially over time. Also, results are largely robust to the inclusion of other explanatory variables, including standard measures of risk and other housing market fundamentals. Additional tests of model validity using the Fama-MacBeth framework offer further st
Following the first two annual intensity mapping workshops at Stanford in March 2016 and Johns Hopkins in June 2017, we report on the recent advances in theory, instrumentation and observation that were presented in these meetings and some of the opportunities and challenges that were identified looking forward. With preliminary detections of CO, [CII], Lya and low-redshift 21cm, and a host of experiments set to go online in the next few years, the field is rapidly progressing on all fronts, with great anticipation for a flood of new exciting results. This current snapshot provides an efficient reference for experts in related fields and a useful resource for nonspecialists. We begin by introducing the concept of line-intensity mapping and then discuss the broad array of science goals that will be enabled, ranging from the history of star formation, reionization and galaxy evolution to measuring baryon acoustic oscillations at high redshift and constraining theories of dark matter, modified gravity and dark energy. After reviewing the first detections reported to date, we survey the experimental landscape, presenting the parameters and capabilities of relevant instruments such as C
The portrayal of crowd accidents by the media can influence public understanding and emotional response, shaping societal perceptions and potentially impacting safety measures and preparedness strategies. This paper critically examines the portrayal of crowd accidents in news coverage by analyzing the texts of 372 media reports of crowd accidents spanning 26 diverse news sources from 1900 to 2019. We investigate how media representations of crowd accidents vary across time and geographical origins. Our methodology combines lexical analysis to unveil prevailing terminologies and sentiment analysis to discern the emotional tenor of the reports. The findings reveal the prevalence of the term "stampede" over "panic" in media descriptions of crowd accidents. Notably, divergent patterns are observable when comparing Western versus South Asian media (notably India and Pakistan), unveiling a cross-cultural dimension. Moreover, the analysis detects a gradual transition from "crowd stampede" to "crowd crush" in media and Wikipedia narratives in recent years, suggesting evolving lexical sensitivities. Sentiment analysis uncovers a consistent association with fear-related language, indicative
In medical imaging, access to data is commonly limited due to patient privacy restrictions and the issue that it can be difficult to acquire enough data in the case of rare diseases.[1] The purpose of this investigation was to develop a reusable open-source synthetic image generation pipeline, the GAN Image Synthesis Tool (GIST), that is easy to use as well as easy to deploy. The pipeline helps to improve and standardize AI algorithms in the digital health space by generating high quality synthetic image data that is not linked to specific patients. Its image generation capabilities include the ability to generate imaging of pathologies or injuries with low incidence rates. This improvement of digital health AI algorithms could improve diagnostic accuracy, aid in patient care, decrease medicolegal claims, and ultimately decrease the overall cost of healthcare. The pipeline builds on existing Generative Adversarial Networks (GANs) algorithms, and preprocessing and evaluation steps were included for completeness. For this work, we focused on ensuring the pipeline supports radiography, with a focus on synthetic knee and elbow x-ray images. In designing the pipeline, we evaluated the p