During infectious disease epidemics, pathogen transmission occurs in host populations made up of interacting subpopulations. Using stochastic simulation and analytical approximations, we examine how outbreak sizes in networked populations depend on network architecture, subpopulation sizes and the strength of coupling between subpopulations. We find, as expected, that mean outbreak sizes are frequently lower in networked populations than in homogeneous populations with the same basic reproduction number. However, after an outbreak ends, a networked population is often vulnerable to further outbreaks, and the ending of an outbreak may not imply herd immunity in any sense. Another key finding is that a relatively small amount of randomly distributed prior immunity can be more protective in a networked population than a homogeneous population, a phenomenon which can be reproduced analytically in certain cases. We also find that in networked populations, randomly distributed prior immunity is often more protective than infection-acquired immunity; but this conclusion can be reversed in populations with highly variable susceptibility. All of these conclusions have implications for desig
Forecasting the occurrence and absence of novel disease outbreaks is essential for disease management. Here, we develop a general model, with no real-world training data, that accurately forecasts outbreaks and non-outbreaks. We propose a novel framework, using a feature-based time series classification method to forecast outbreaks and non-outbreaks. We tested our methods on synthetic data from a Susceptible-Infected-Recovered model for slowly changing, noisy disease dynamics. Outbreak sequences give a transcritical bifurcation within a specified future time window, whereas non-outbreak (null bifurcation) sequences do not. We identified incipient differences in time series of infectives leading to future outbreaks and non-outbreaks. These differences are reflected in 22 statistical features and 5 early warning signal indicators. Classifier performance, given by the area under the receiver-operating curve, ranged from 0.99 for large expanding windows of training data to 0.7 for small rolling windows. Real-world performances of classifiers were tested on two empirical datasets, COVID-19 data from Singapore and SARS data from Hong Kong, with two classifiers exhibiting high accuracy. I
Rapid identification of outbreaks in hospitals is essential for controlling pathogens with epidemic potential. Although whole genome sequencing (WGS) remains the gold standard in outbreak investigations, its substantial costs and turnaround times limit its feasibility for routine surveillance, especially in less-equipped facilities. We explore three modalities as rapid alternatives: matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry, antimicrobial resistance (AR) patterns, and electronic health records (EHR). We present a machine learning approach that learns discriminative features from these modalities to support outbreak detection. Multi-species evaluation shows that the integration of these modalities can boost outbreak detection performance. We also propose a tiered surveillance paradigm that can reduce the need for WGS through these alternative modalities. Further analysis of EHR information identifies potentially high-risk contamination routes linked to specific clinical procedures, notably those involving invasive equipment and high-frequency workflows, providing infection prevention teams with actionable targets for proactive risk miti
Epidemic forecasting has become an integral part of real-time infectious disease outbreak response. While collaborative ensembles composed of statistical and machine learning models have become the norm for real-time forecasting, standardized benchmark datasets for evaluating such methods are lacking. Further, there is limited understanding of how these methods perform on novel outbreaks with limited historical data. In this paper, we propose IDOBE, a curated collection of epidemiological time series focused on outbreak forecasting. IDOBE compiles data from multiple repositories, spanning over a century of surveillance across U.S. states and global locations. We perform derivative-based segmentation to generate over 10,000 outbreaks covering multiple outcomes such as cases and hospitalizations for 13 diseases. We consider a variety of information-theoretic and distributional measures to quantify the epidemiological diversity of the dataset. Finally, we perform multi-horizon short-term forecasting (1- to 4-week-ahead) through the progression of the outbreak using 11 baseline models and report on their performance. In addition to standard metrics such as NMSE and MAPE for poin
Avian Influenza Virus (AIV) poses significant threats to the poultry industry and to the health of humans, domestic animals, and wildlife worldwide. Monitoring this infectious disease is important for rapid and effective response to potential outbreaks. Conventional avian influenza surveillance systems have exhibited limitations in providing timely alerts for potential outbreaks. This study examined whether online activity on social media and Google searches can improve the identification of AIV in the early stage of an outbreak in a region. To evaluate the feasibility of this approach, we collected historical data on online user activities from X (formerly known as Twitter) and Google Trends and assessed the statistical correlation of activities in a region with the officially reported AIV outbreak case numbers. To mitigate the effect of noisy content on the outbreak identification process, large language models were used to filter for the relevant online activity on X that could be indicative of an outbreak. Additionally, we conducted trend analysis on the selected internet-based data sources in terms of their timeliness and statistical signific
Accurate and timely identification of hospital outbreak clusters is crucial for preventing the spread of infections that have epidemic potential. While assessing pathogen similarity through whole genome sequencing (WGS) is considered the gold standard for outbreak detection, its high cost and lengthy turnaround time preclude routine implementation in clinical laboratories. We explore the utility of two rapid and cost-effective alternatives to WGS, matrix-assisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrometry and antimicrobial resistance (AR) patterns. We develop a machine learning framework that extracts informative representations from MALDI-TOF spectra and AR patterns for outbreak detection and explore their fusion. Through multi-species analyses, we demonstrate that in some cases MALDI-TOF and AR have the potential to reduce reliance on WGS, enabling more accessible and rapid outbreak surveillance.
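As a rough illustration of what fusing the two rapid modalities could look like, the sketch below scores pairs of isolates by a weighted mix of spectrum similarity and agreement between resistance calls, then flags pairs above a cutoff. It is not the machine learning framework described above, which learns its representations; the hand-written cosine and agreement scores, the function names, the `weight` and `threshold` values, and the toy isolates are all assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ar_agreement(p, q):
    """Fraction of antibiotics on which two isolates have the same call."""
    return sum(a == b for a, b in zip(p, q)) / len(p)


def fused_similarity(spec_a, spec_b, ar_a, ar_b, weight=0.5):
    """Weighted average of spectrum similarity and AR agreement (assumed fusion rule)."""
    return (weight * cosine_similarity(spec_a, spec_b)
            + (1.0 - weight) * ar_agreement(ar_a, ar_b))


def flag_pairs(isolates, threshold=0.9, weight=0.5):
    """Return isolate pairs whose fused similarity reaches the threshold.

    isolates maps a name to a (spectrum, ar_pattern) tuple.
    """
    names = sorted(isolates)
    pairs = []
    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            score = fused_similarity(isolates[a][0], isolates[b][0],
                                     isolates[a][1], isolates[b][1], weight)
            if score >= threshold:
                pairs.append((a, b))
    return pairs


# Toy data: A and B share a spectrum shape and resistance calls, C differs.
toy_isolates = {
    "A": ([1.0, 0.0, 2.0], ["R", "S", "R"]),
    "B": ([2.0, 0.0, 4.0], ["R", "S", "R"]),
    "C": ([0.0, 3.0, 0.0], ["S", "R", "S"]),
}
candidate_pairs = flag_pairs(toy_isolates)
```

In a tiered workflow, pairs flagged this way would be the ones worth confirming by WGS; the cutoff would have to be tuned per species against sequenced outbreaks.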
The emergence of a hantavirus variant aboard a commercial cruise ship presents a significant public health concern. This study develops a discrete-time stochastic Susceptible-Exposed-Infectious-Recovered-Dead model to estimate transmission dynamics, hidden exposed infections, and outbreak risk among passengers and crew. Epidemiological parameters and latent disease states were inferred using an Ensemble Adjustment Kalman Filter calibrated to reported case data from WHO and ECDC situation reports. The estimated basic reproduction number was 2.76, with a 95% confidence interval of 2.52-2.99, indicating substantial potential for sustained onboard transmission before strict quarantine measures. Simulations further suggest that several exposed individuals may remain unidentified during the early outbreak phase, creating a hidden reservoir that symptom-based surveillance alone may fail to detect. These findings highlight the importance of rapid surveillance, widespread testing, targeted quarantine, and active monitoring of exposed individuals in confined travel settings. The proposed modeling framework can support timely outbreak assessment and intervention planning for infectious-disea
Recurrent COVID-19 outbreaks have placed immense strain on the hospital system in Quebec. We develop a Bayesian three-state coupled Markov switching model to analyze COVID-19 outbreaks across Quebec based on admissions in the 30 largest hospitals. Within each catchment area, we assume the existence of three states for the disease: absence (a new state meant to account for the many zeroes in some of the smaller areas), endemic, and outbreak. We then assume the disease switches between the three states in each area through a series of coupled nonhomogeneous hidden Markov chains. Unlike previous approaches, the transition probabilities may depend on covariates and on the occurrence of outbreaks in neighboring areas, to account for geographical outbreak spread. Additionally, to prevent rapid switching between endemic and outbreak periods, we introduce clone states into the model, which enforce minimum endemic and outbreak durations. Among our findings, mobility in retail and recreation venues had a positive association with the development and persistence of new COVID-19 outbreaks in Quebec. Based on model comparison, our contributions show promise in improving state es
In this work, an individual-based model of forest insect outbreaks is presented. The results show that the outbreak is an emergent feature of the system. It is a joint product of the characteristics of the insects, the environment in which they live, and the way they behave in it. The outbreak dynamics is an effect of scale. In a sufficiently large forest, regardless of the density of trees and their spatial distribution, the population dynamics takes the form of an outbreak, provided that the range of insect dispersion is large enough. In very small forests, the dynamics becomes more chaotic. It loses the outbreak character and, especially in a forest with random tree distribution, the insect population may go extinct. In a forest where the dynamics of all insects has the character of an outbreak, the local dynamics of the number of insects on one tree is characterized by a rapid increase in number and then a rapid decrease until the extinction of the local population. It is the result of the influx of immigrants from neighboring trees. The type of tree distribution in the forest becomes visible when the density of trees becomes low and/or the range of in
On May 15, 2025, Brazil reported its first highly pathogenic avian influenza (HPAI) outbreak in a commercial poultry breeder farm in Montenegro, Rio Grande do Sul. This study presents the outbreak timeline and control measures, along with the spatial risk assessment and the epidemiological model used to simulate detection delays. The transmission model considered Susceptible-Exposed-Infected-Recovered-Dead farm statuses to simulate within-farm and between-farm dynamics under 3-day, 5-day, and 10-day detection delays. The single infected commercial farm lost 15,650 birds, with 92% mortality due to HPAI and the remaining birds culled on Day 5 post-notification to the state animal health officials. Based on the mortality and outbreak response data, the introduction likely occurred 3 to 10 days before its official detection. Our field investigations suggested that wild birds were the most likely source of introduction, although biosecurity breaches could not be ruled out. Control measures implemented included movement restrictions and a control zone, from which 4,197 vehicles were inspected upon entry. Risk analysis classified 64.4% of municipalities as low risk, 35.0% as medium risk
A measles outbreak occurs when the number of cases of measles in the population exceeds the typical level. Outbreaks that are not detected and managed early can increase mortality and morbidity and incur costs from activities responding to these events. The number of measles cases in the Province of North Cotabato, Philippines, was used in this study. Weekly reported cases of measles from January 2016 to December 2021 were provided by the Epidemiology and Surveillance Unit of the North Cotabato Provincial Health Office. Several integer-valued autoregressive (INAR) time series models, along with the classical ARIMA model, were used to explore the possibility of detecting and identifying measles outbreaks in the province. These models were evaluated based on goodness of fit, measles outbreak detection accuracy, and timeliness. The results of this study confirmed that INAR models have a conceptual advantage over ARIMA, since the latter produces non-integer forecasts, which are not realistic for count data such as measles cases. Among the INAR models, the ZINGINAR(1) model was recommended for having a good model fit and timely and accurate detection of outbreaks. Furthermore, policymak
Towards the end of an infectious disease outbreak, when a period has elapsed without new case notifications, a key question for public health policy makers is whether the outbreak can be declared over. This requires the benefits of a declaration (e.g., relaxation of outbreak control measures) to be balanced against the risk of a resurgence in cases. To support this decision making, mathematical methods have been developed to quantify the end-of-outbreak probability. Here, we propose a new approach to this problem that accounts for a range of features of real-world outbreaks, specifically: (i) incomplete case ascertainment; (ii) reporting delays; (iii) individual heterogeneity in transmissibility; and (iv) whether cases were imported or infected locally. We showcase our approach using two case studies: Covid-19 in New Zealand in 2020, and Ebola virus disease in the Democratic Republic of the Congo in 2018. In these examples, we found that the date when the estimated probability of no future infections reached 95% was relatively consistent across a range of modelling assumptions. This suggests that our modelling framework can generate robust quantitative estimates that can be used by
Mathematical models are established tools to assist in outbreak response. They help characterise complex patterns in disease spread, simulate control options to assist public health authorities in decision-making, and support longer-term operational and financial planning. In the context of vaccine-preventable diseases (VPDs), vaccines are one of the most cost-effective outbreak response interventions, with the potential to avert significant morbidity and mortality through timely delivery. Models can contribute to the design of vaccine response by investigating the importance of timeliness, identifying high-risk areas, prioritising the use of limited vaccine supply, highlighting gaps in surveillance and reporting, and determining the short- and long-term benefits. In this review, we examine how models have been used to inform vaccine response for 10 VPDs, and provide additional insights into the challenges of outbreak response modelling, such as data gaps, key vaccine-specific considerations, and communication between modellers and stakeholders. We illustrate that while models are key to policy-oriented outbreak vaccine response, they can only be as good as the surveillance data that inform t
During the SARS-CoV-2 pandemic, polymerase chain reaction (PCR) and lateral flow device (LFD) tests were frequently deployed to detect the presence of SARS-CoV-2. Many of these tests were singleplex, testing for the presence of a single pathogen only. Multiplex tests can test for the presence of several pathogens using only a single swab, which can allow for surveillance of more pathogens, targeting of antiviral interventions, a reduced burden of testing, and lower costs. Test sensitivity, however, particularly in LFD tests, is highly conditional on the viral concentration dynamics of individuals. To inform the use of multiplex testing in outbreak detection, it is therefore necessary to investigate the interactions between outbreak detection strategies and the differing viral concentration trajectories of key pathogens. Viral concentration trajectories are estimated for SARS-CoV-2 and Influenza A/B. Testing strategies for the first five symptomatic cases in an outbreak are then simulated and used to evaluate key performance indicators. Strategies that use a combination of multiplex LFD and PCR tests achieve high levels of detection, detect outbreaks rapidly, and have the lowest
We first propose a quantitative approach to detect high-risk outbreaks of independent and coinfective SIR dynamics on three empirical networks: a school, a conference and a hospital contact network. This measurement is based on the k-means clustering method and identifies proper samples for calculating the mean outbreak size and the outbreak probability. Then we systematically study the impact of different temporal correlations on high-risk outbreaks over the original and differently shuffled counterparts of each network. We observe that, on the one hand, in the coinfection process, randomization of the sequence of the events increases the mean outbreak size of high-risk cases. On the other hand, these correlations do not have a consistent effect on the independent infection dynamics, and can either decrease or increase this mean. Randomization of the daily pattern correlations, by contrast, has no significant effect on outbreak size in either the coinfection or the independent spreading case. We also observe that an increase in the mean outbreak size does not always coincide with an increase in the outbreak probability; therefore we argue that merely considering the mean outbreak si
In today's world, the risk of emerging and re-emerging epidemics has increased. Recent advances in healthcare technology have made it possible to predict an epidemic outbreak in a region. Early prediction of an epidemic outbreak greatly helps the authorities to prepare the medications and logistics required to keep the situation under control. In this article, we try to predict epidemic outbreaks (influenza, hepatitis and malaria) for the state of New York, USA, using machine learning and deep learning algorithms, and we have created a portal that can alert the authorities and health care organizations of the region in case of an outbreak. The algorithm takes historical data to predict the possible number of cases for 5 weeks into the future. Non-clinical factors such as Google search trends, social media data and weather data have also been used to predict the probability of an outbreak.
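To make the 5-week-ahead setup concrete, the sketch below fits a first-order autoregression to a weekly case series by ordinary least squares and iterates it five steps forward. This is a deliberately simple baseline swapped in for illustration, not the machine learning or deep learning models of the article; the function names and the toy series are assumptions, and no search-trend, social media or weather covariates are used.

```python
def fit_ar1(series):
    """Least-squares fit of y[t] = a + b * y[t-1]; returns (a, b)."""
    x = series[:-1]
    y = series[1:]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    if var_x == 0:
        # A constant history carries no slope information: predict the mean.
        return mean_y, 0.0
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = cov_xy / var_x
    a = mean_y - b * mean_x
    return a, b


def forecast(series, horizon=5):
    """Iterate the fitted AR(1) model `horizon` weeks past the last observation."""
    a, b = fit_ar1(series)
    predictions = []
    last = series[-1]
    for _ in range(horizon):
        last = max(a + b * last, 0.0)  # case counts cannot be negative
        predictions.append(last)
    return predictions


# Toy weekly counts that double each week (illustrative, not real data).
weekly_cases = [10, 20, 40, 80, 160]
next_five_weeks = forecast(weekly_cases)
```

A baseline of this kind is mainly useful as a yardstick: a learned model that cannot beat it over the same horizon is adding little.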
Recent outbreaks of monkeypox and Ebola, and worrying waves of COVID-19, influenza and respiratory syncytial virus, have all led to a sharp increase in the use of epidemiological models to estimate key epidemiological parameters. The feasibility of this estimation task is known as the practical identifiability (PI) problem. Here, we investigate the PI of eight commonly reported statistics of the classic Susceptible-Infectious-Recovered model using a new measure that shows how much a researcher can expect to learn in a model-based Bayesian analysis of prevalence data. Our findings show that the basic reproductive number and final outbreak size are often poorly identified, with learning exceeding that of individual model parameters only in the early stages of an outbreak. The peak intensity, peak timing, and initial growth rate are better identified, being in expectation over 20 times more probable having seen the data by the time the underlying outbreak peaks. We then test PI for a variety of true parameter combinations, and find that PI is especially problematic in slow-growing or less-severe outbreaks. These results add to the growing body of literature questioning the reliability
Epidemic spreading has been studied for a long time, and most studies focus on the growth of a single epidemic outbreak. Recently, we extended the study to the case of recurrent epidemics (Sci. Rep. 5, 16010 (2015)), but only on a single network. Here we report, from real data on coupled regions or cities, that the recurrent epidemics in two coupled networks are closely related to each other and can show either a synchronized outbreak phase, where outbreaks occur simultaneously in both networks, or a mixed outbreak phase, where outbreaks occur in one network but not in the other. To reveal the underlying mechanism, we present a two-layered network model of coupled recurrent epidemics to reproduce the synchronized and mixed outbreak phases. We show that the synchronized outbreak phase is favored in two coupled networks with the same average degree, while the mixed outbreak phase is favored when the average degrees differ. Further, we show that the coupling between the two layers tends to suppress the mixed outbreak phase but enhance the synchronized outbreak phase. A theoretical analysis based on microscopic Markov-chai
In May 2022, a cluster of mpox cases was detected in the UK that could not be traced to recent travel history from an endemic region. Over the following months, the outbreak grew, with over 3000 total cases reported in the UK, and similar outbreaks occurring worldwide. These outbreaks appeared linked to sexual contact networks between gay, bisexual and other men who have sex with men. Following the COVID-19 pandemic, local health systems were strained, and therefore effective surveillance for mpox was essential for managing public health policy. However, the mpox outbreak in the UK was characterised by substantial delays in the reporting of the symptom onset date and specimen collection date for confirmed positive cases. These delays led to substantial backfilling in the epidemic curve, making it challenging to interpret the epidemic trajectory in real-time. Many nowcasting models exist to tackle this challenge in epidemiological data, but these lacked sufficient flexibility. We have developed a novel nowcasting model using generalised additive models to correct the mpox epidemic curve in England, and provide real-time characteristics of the state of the epidemic, including the real-
We study the effect of noisy infection (contact) and recovery rates on the distribution of outbreak sizes in the stochastic SIR model. The rates are modeled as Ornstein-Uhlenbeck processes with finite correlation time and variance, which we illustrate using outbreak data from the RSV 2019-2020 season in the US. In the limit of large populations, we find analytical solutions for the outbreak-size distribution in the long-correlated (adiabatic) and short-correlated (white) noise regimes, and demonstrate that the distribution can be highly skewed with significant probabilities for large fluctuations away from mean-field theory. Furthermore, we assess the relative contribution of demographic and reaction-rate noise on the outbreak-size variance, and show that demographic noise becomes irrelevant in the presence of slowly varying reaction-rate noise but persists for large system sizes if the noise is fast. Finally, we show that the crossover to the white-noise regime typically occurs for correlation times that are on the same order as the characteristic recovery time in the model.
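The setting can be explored numerically with a small simulation. The sketch below is a discrete-time (chain-binomial) approximation of a stochastic SIR outbreak in which only the infection rate fluctuates, following an Ornstein-Uhlenbeck process integrated with an Euler-Maruyama step; the recovery rate is held fixed for brevity, unlike in the study above. All names and default values (`beta0`, `sigma`, `tau`, `dt` and so on) are assumptions chosen for illustration, not parameters taken from the paper.

```python
import math
import random


def simulate_outbreak(n=200, beta0=0.3, gamma=0.1, sigma=0.2, tau=10.0,
                      dt=0.2, i0=5, seed=0):
    """Return the final outbreak size (number ever infected) of one run.

    The infection rate is beta0 + xi(t), where xi is an OU process with
    correlation time tau and stationary standard deviation sigma * beta0.
    """
    rng = random.Random(seed)
    s, i = n - i0, i0
    xi = 0.0
    noise_scale = sigma * beta0 * math.sqrt(2.0 * dt / tau)
    while i > 0:
        # Euler-Maruyama update of the OU fluctuation.
        xi += -xi / tau * dt + noise_scale * rng.gauss(0.0, 1.0)
        beta = max(beta0 + xi, 0.0)  # clip so the rate stays non-negative
        # Per-individual probabilities of infection and recovery in one step.
        p_inf = 1.0 - math.exp(-beta * i / n * dt)
        p_rec = 1.0 - math.exp(-gamma * dt)
        # Binomial draws written as sums of Bernoulli trials (stdlib only).
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        new_rec = sum(rng.random() < p_rec for _ in range(i))
        s -= new_inf
        i += new_inf - new_rec
    return n - s


# A handful of runs with different seeds gives a first look at the spread.
sizes = [simulate_outbreak(seed=k) for k in range(5)]
```

Repeating the runs for small and large `tau` at fixed `sigma` is one way to see the contrast between the white-noise and adiabatic regimes in the outbreak-size histogram.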