Understanding vaccine effects on post-infection outcomes is critical for evaluating the full value proposition of a vaccine. However, defining appropriate causal effects on such outcomes is challenging because infection is affected by vaccination. Existing principal stratification approaches focus on the \emph{Doomed} stratum, individuals who would be infected regardless of vaccine receipt. For many relevant outcomes, however, this estimand will understate vaccine benefit by excluding individuals whose adverse post-infection outcomes are improved because vaccination prevented infection. We therefore propose causal estimands for post-infection outcomes in the \emph{Naturally Infected}, individuals who would be infected in the absence of vaccine. We derive bounds under minimal assumptions and give point identification results under an exclusion restriction and/or a partial principal ignorability assumption. For point-identified settings, we develop efficient one-step estimators with robustness properties under inconsistent nuisance parameter estimation. We further show under what conditions the same identification functional can be interpreted as targeting an effect among individuals exp
In general, the rates of infection and removal (whether through recovery or death) are nonlinear functions of the number of infected and susceptible individuals. One of the simplest models for the spread of infectious diseases is the SIR model, which categorizes individuals as susceptible, infectious, or removed (recovered or deceased). In this model, the infection rate, governing the transition from susceptible to infected individuals, is linear in both the susceptible and the infected populations. Similarly, the removal rate, representing the transition from infected to removed individuals, is a linear function of the number of infected individuals. While nonlinear infection and removal rates have been extensively studied in deterministic epidemiological models, analytic results for stochastic dynamics with general nonlinear rates remain limited. This work presents an analytic expression for the number of infected individuals considering nonlinear infection and removal rates. In particular, we examine how the number of infected individuals varies as cases emerge and obtain the expression accounting for the number of infected individuals at each moment. This work paves the way for
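As an illustration of the stochastic dynamics described above, the following is a minimal sketch of a Gillespie-type simulation in which the infection and removal rates are passed in as arbitrary functions, so that both the linear SIR rates and nonlinear alternatives can be tried. It is not the analytic expression derived in the work; the function name `gillespie_sir`, the frequency-dependent form `beta * s * i / N`, and the parameter values are assumptions made for this example only.

```python
import random


def gillespie_sir(s0, i0, infection_rate, removal_rate, seed=0):
    """Simulate one stochastic SIR trajectory until no infected remain.

    infection_rate(s, i) and removal_rate(i) return the total event rates
    and may be linear or nonlinear. Returns a list of (time, S, I, R).
    """
    rng = random.Random(seed)
    t, s, i, r = 0.0, s0, i0, 0
    path = [(t, s, i, r)]
    while i > 0:
        a_inf = infection_rate(s, i)
        a_rem = removal_rate(i)
        total = a_inf + a_rem
        if total <= 0:
            break  # no event can occur any more
        t += rng.expovariate(total)  # exponential waiting time to next event
        if rng.random() * total < a_inf:
            s, i = s - 1, i + 1      # infection: S -> I
        else:
            i, r = i - 1, r + 1      # removal: I -> R
        path.append((t, s, i, r))
    return path


# Hypothetical parameters, chosen only for illustration.
N = 200
beta, gamma = 0.3, 0.1
path = gillespie_sir(
    N - 1, 1,
    infection_rate=lambda s, i: beta * s * i / N,  # linear in both S and I
    removal_rate=lambda i: gamma * i,              # linear in I
)
final_t, final_s, final_i, final_r = path[-1]
print(f"epidemic ended at t={final_t:.2f} with {final_r} removed")
```

Replacing the two lambdas with nonlinear functions (for example a saturating infection rate) changes the dynamics without touching the simulation loop.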
Finding the infection sources in a network when we only know the network topology and infected nodes, but not the rates of infection, is a challenging combinatorial problem, and it is even more difficult in practice where the underlying infection spreading model is usually unknown a priori. In this paper, we are interested in finding a source estimator that is applicable to various spreading models, including the Susceptible-Infected (SI), Susceptible-Infected-Recovered (SIR), Susceptible-Infected-Recovered-Infected (SIRI), and Susceptible-Infected-Susceptible (SIS) models. We show that under the SI, SIR and SIRI spreading models and with mild technical assumptions, the Jordan center is the infection source associated with the most likely infection path in a tree network with a single infection source. This conclusion applies for a wide range of spreading parameters, while it holds for regular trees under the SIS model with homogeneous infection and recovery rates. Since the Jordan center does not depend on the infection, recovery and reinfection rates, it can be regarded as a universal source estimator. We also consider the case where there are k>1 infection sources, generalize
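Because the Jordan center depends only on the topology and the set of infected nodes, it can be computed with breadth-first searches alone. The sketch below assumes the usual infected-set variant of the definition, namely the node that minimizes the maximum hop distance to any infected node; the function names, the adjacency-list representation, and the small example tree are hypothetical and are not taken from the paper.

```python
from collections import deque


def bfs_distances(adj, start):
    """Hop distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def jordan_centers(adj, infected):
    """Return (centers, radius): the nodes minimizing the largest distance
    to any infected node, and that minimal distance.

    Assumes the graph is connected, so every infected node is reachable.
    """
    best, centers = None, []
    for node in adj:
        dist = bfs_distances(adj, node)
        ecc = max(dist[v] for v in infected)  # eccentricity w.r.t. infected set
        if best is None or ecc < best:
            best, centers = ecc, [node]
        elif ecc == best:
            centers.append(node)
    return centers, best


# Hypothetical tree: path 0-1-2-3-4 with an extra uninfected leaf 5 on node 2.
adj = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
infected = {0, 1, 2, 3, 4}
centers, radius = jordan_centers(adj, infected)
print(centers, radius)  # [2] 2
```

Running one BFS per node costs O(n^2) on a tree, which is adequate for a sketch; no infection, recovery, or reinfection rate appears anywhere in the computation.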
Hospital-acquired infections are a serious threat to the health and well-being of patients and medical staff within clinical units. Many of these infections arise when medical personnel come into contact with contaminated persons, surfaces or equipment and then with patients, without following proper hygiene procedures. In this paper we present our ongoing efforts in the development of a wireless sensor network based cyber-physical system which aims to prevent hospital infections by increasing compliance with established hygiene guidelines. The solution, currently developed under European Union funding, integrates a network of sensors for monitoring clinical workflows and ambient conditions, a workflow engine that executes encoded workflow instances, and monitoring software that provides real-time information in case of infection risk detection. As a motivating example, we employ the workflow in the general practitioner's office in order to comprehensively present types of sensors and their positioning in the monitored location. Using the information collected by deployed sensors, the system is capable of immediately detecting infection risks and taking action to pr
Analyzing the SARS-CoV-2 pandemic outbreak based on actual data while reflecting the characteristics of the real city provides beneficial information for taking reasonable infection control measures in the future. We demonstrate agent-based modeling for Tokyo based on GPS information and official national statistics and perform a spatiotemporal analysis of the infection situation in Tokyo. In the simulation of the first wave of SARS-CoV-2 in Tokyo using real GPS data, infection first occurred in the service industry, such as restaurants, in the city center; infected people then carried the virus back to residential areas, and the infection spread to each area of Tokyo. This result indicates that the spread of infection can be curbed by suppressing going out or strengthening infection prevention measures in service facilities. It was shown that pandemic control in Tokyo could be achieved not only by strong measures, such as the lockdown of cities, but also by thorough infection prevention measures in service facilities, which explains the curbing of infection observed in Tokyo.
Identifying the infection sources in a network, including the index cases that introduce a contagious disease into a population network, the servers that inject a computer virus into a computer network, or the individuals who started a rumor in a social network, plays a critical role in limiting the damage caused by the infection through timely quarantine of the sources. We consider the problem of estimating the infection sources and the infection regions (subsets of nodes infected by each source) in a network, based only on knowledge of which nodes are infected and their connections, and when the number of sources is unknown a priori. We derive estimators for the infection sources and their infection regions based on approximations of the infection sequences count. We prove that if there are at most two infection sources in a geometric tree, our estimator identifies the true source or sources with probability going to one as the number of infected nodes increases. When there are more than two infection sources, and when the maximum possible number of infection sources is known, we propose an algorithm with quadratic complexity to estimate the actual number and identities of the in
We study an individual-based stochastic spatial epidemic model where the number of locations and the number of individuals at each location both grow to infinity. Each individual is associated with a random infection-age dependent infectivity function. Individuals are infected through interactions across the locations with heterogeneous effects. The epidemic dynamics can be described using a time-space representation for the total force of infection, the number of susceptible individuals, the number of infected individuals that are infected at each time and have been infected for a certain amount of time, as well as the number of recovered individuals. We prove a functional law of large numbers for these time-space processes, and in the limit, we obtain a set of time-space integral equations. We then derive the PDE models from the limiting time-space integral equations. In particular, the density (with respect to the infection age) of the time-age-space integral equation for the number of infected individuals tracking the age of infection satisfies a linear PDE in time and age with an integral boundary condition. These integral equation and PDE limits can be regarded as dynamic
Carbapenemase-Producing Enterobacteriaceae (CPE) pose a critical concern for infection prevention and control in hospitals. However, predictive modeling of previously highlighted CPE-associated risks such as readmission, mortality, and extended length of stay (LOS) remains underexplored, particularly with modern deep learning approaches. This study introduces an eXplainable AI (XAI) modeling framework to investigate CPE impact on patient outcomes from Electronic Medical Records data. We analyzed an inpatient dataset from an Irish acute hospital, incorporating diagnostic codes, ward transitions, patient demographics, infection-related variables and contact network features. Several Transformer-based architectures were benchmarked alongside traditional machine learning models. Clinical outcomes were predicted, and XAI techniques were applied to interpret model decisions. Our framework successfully demonstrated the utility of Transformer-based models, with TabTransformer consistently outperforming baselines across multiple clinical prediction tasks, especially for CPE acquisition (AUROC and sensitivity). We found infection-related features, including historical hospital exposu
HIV transmission within serodiscordant couples remains a significant public health challenge, particularly in sub-Saharan Africa. Estimating the rate of such infection, alongside the rates of introduction of infection from outside the partnership, is a special case of the more general epidemiological challenge of inferring within- and between-group intensities of transmission. This study presents a stochastic susceptible-infected (SI) pair model for estimating key epidemiological parameters governing HIV transmission within and between couples, which we further extend to account for gender-specific differences in infection dynamics. Using a likelihood-based inference approach, we estimate transmission parameters and associated uncertainty from observed data. These values can be used to inform infection prevention strategies for HIV, and the methodology proposed can be generalised to other epidemiological settings.
Due to the steady rise in population and longevity, emergency department visits are increasing across North America. As more patients visit the emergency department, traditional clinical workflows become overloaded and inefficient, leading to prolonged wait times and reduced healthcare quality. One such workflow is the triage medical directive, which is impeded by limited human workload, inaccurate diagnoses and invasive over-testing. To address this issue, we propose TriNet: a machine learning model for medical directives that automates first-line screening at triage for conditions requiring downstream testing for diagnosis confirmation. To verify screening potential, TriNet was trained on hospital triage data and achieved high positive predictive values in detecting pneumonia (0.86) and urinary tract infection (0.93). These models outperform current clinical benchmarks, indicating that machine-learning medical directives can offer cost-free, non-invasive screening with high specificity for common conditions, reducing the risk of over-testing while increasing emergency department efficiency.
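For readers less familiar with the screening metrics quoted above, the short sketch below shows how positive predictive value and specificity are computed from confusion-matrix counts. The function name and the counts are invented purely for illustration; they are not TriNet results and are not derived from the study's data.

```python
def screening_metrics(tp, fp, tn, fn):
    """Compute standard screening metrics from confusion-matrix counts."""
    return {
        "ppv": tp / (tp + fp),          # P(condition | positive screen)
        "specificity": tn / (tn + fp),  # P(negative screen | no condition)
        "sensitivity": tp / (tp + fn),  # P(positive screen | condition)
    }


# Made-up counts for a hypothetical screening run of 1000 patients.
metrics = screening_metrics(tp=40, fp=10, tn=930, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

A high positive predictive value means that few flagged patients turn out not to have the condition, which is the property that limits unnecessary downstream testing.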
Two factors that are often ignored but could play a crucial role in the progression of an infectious disease are the distributions of inherent susceptibility ($σ_{inh}$) and external infectivity ($ι_{ext}$) in a given population. While the former is determined by the immunity of an individual towards a disease, the latter depends on the duration of exposure to the infection. We model the spatio-temporal propagation of a pandemic using a generalized SIR (Susceptible-Infected-Removed) model by introducing the susceptibility and infectivity distributions to understand their combined effects, which appear to remain inadequately addressed to date. We consider the coupling between $σ_{inh}$ and $ι_{ext}$ through a new Critical Infection Parameter (CIP) ($γ_c$). We find that the neglect of these distributions, as in the naive SIR model, results in an overestimation of the amount of infection in a population, which leads to incorrect (higher) estimates of the infections required to achieve the herd immunity threshold. Additionally, we include the effects of seeding of infection in a population by long-range migration. We solve the resulting master equations by performing Kinetic Monte C
We explore the emergence of persistent infection in a patch of population, where the disease progression of the individuals is given by the SIRS model and an individual becomes infected on contact with another infected individual. We investigate the persistence of contagion qualitatively and quantitatively, under varying degrees of heterogeneity in the initial population. We observe that when the initial population is uniform, consisting of individuals at the same stage of disease progression, infection arising from a contagious seed does not persist. However, when the initial population consists of randomly distributed refractory and susceptible individuals, a single source of infection can lead to sustained infection in the population, as heterogeneity facilitates the de-synchronization of the phases in the disease cycle of the individuals. We also show how the average size of the window of persistence of infection depends on the degree of heterogeneity in the initial composition of the population. In particular, we show that the infection eventually dies out when the entire initial population is susceptible, while even a few susceptibles among a heterogeneous refractory populati
Algorithms for identifying the infection states of nodes in a network are crucial for understanding and containing infections. Often, however, only a relatively small set of nodes have a known infection state. Moreover, the length of time that each node has been infected is also unknown. This missing data -- infection state of most nodes and infection time of the unobserved infected nodes -- poses a challenge to the study of real-world cascades. In this work, we develop techniques to identify the latent infected nodes in the presence of missing infection time-and-state data. Based on the likely epidemic paths predicted by the simple susceptible-infected epidemic model, we propose a measure (Infection Betweenness) for uncovering these unknown infection states. Our experimental results using machine learning algorithms show that Infection Betweenness is the most effective feature for identifying latent infected nodes.
We study the stochastic SIR epidemic model with infection-age dependent infectivity for which a measure-valued process is used to describe the ages of infection for each individual. We establish a functional law of large numbers (FLLN) and a functional central limit theorem (FCLT) for the properly scaled measure-valued processes together with the other epidemic processes to describe the evolution dynamics. In the FLLN, assuming that the hazard rate function of the infection periods is bounded and the ages at time 0 of the infections of the initially infected individuals are bounded, we obtain a PDE limit for the LLN-scaled measure-valued process, for which we characterize its solution explicitly. The PDE is linear with a boundary condition given by the unique solution to a set of Volterra-type nonlinear integral equations. In the FCLT, we obtain an SPDE for the CLT-scaled measure-valued process, driven by two independent white noises coming from the infection and recovery processes. The SPDE is also linear and coupled with the solution to a system of stochastic Volterra-type linear integral equations driven by three independent Gaussian noises, one from the random infection functio
We are interested in the geometry of the ``infection tree'' in a stochastic SIR (Susceptible-Infectious-Recovered) model, starting with a single infectious individual. This tree is constructed by drawing an edge between two individuals when one infects the other. We focus on the regime where the infectious period before recovery follows an exponential distribution with rate $1$, and infections occur at a rate $λ_{n} \sim \frac{λ}{n}$ where $n$ is the initial number of healthy individuals with $λ>1$. We show that provided that the infection does not quickly die out, the height of the infection tree is asymptotically $κ(λ) \log n$ as $n \rightarrow \infty$, where $κ(λ)$ is a continuous function in $λ$ that undergoes a second-order phase transition at $λ_{c}\simeq 1.8038$. Our main tools include a connection with the model of uniform attachment trees with freezing and the application of martingale techniques to control profiles of random trees.
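The construction of the infection tree can be made concrete with a small simulation. The sketch below assumes that the infection rate applies to each infectious-susceptible pair and sets it exactly to $λ/n$, which is one reading of the asymptotic statement above. Since only the shape of the tree matters, it simulates the embedded jump chain and ignores event times. The function name and parameter values are hypothetical, and the sketch makes no attempt to verify the $κ(λ) \log n$ asymptotics.

```python
import random


def infection_tree_height(n, lam, seed=0):
    """Simulate the infection tree of a stochastic SIR epidemic.

    Starts with one infectious individual (the root, depth 0) and n
    susceptibles. Each infectious individual recovers at rate 1 and each
    infectious-susceptible pair transmits at rate lam / n (assumed).
    Returns (height, size) of the resulting infection tree.
    """
    rng = random.Random(seed)
    infectious_depths = [0]  # tree depths of currently infectious individuals
    susceptible = n
    height, size = 0, 1
    while infectious_depths:
        i = len(infectious_depths)
        infection_rate = (lam / n) * susceptible * i
        recovery_rate = float(i)
        k = rng.randrange(i)  # individual involved in the next event
        if rng.random() * (infection_rate + recovery_rate) < infection_rate:
            # Individual k infects a susceptible: add a child one level deeper.
            depth = infectious_depths[k] + 1
            infectious_depths.append(depth)
            susceptible -= 1
            size += 1
            height = max(height, depth)
        else:
            # Individual k recovers: remove it by swapping with the last entry.
            infectious_depths[k] = infectious_depths[-1]
            infectious_depths.pop()
    return height, size


height, size = infection_tree_height(n=1000, lam=2.0, seed=1)
print(f"tree size {size}, height {height}")
```

Choosing the infector uniformly among the infectious individuals is valid here because every infectious individual transmits at the same rate, and the same holds for recoveries.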
We study a stochastic epidemic model with multiple patches (locations), where individuals in each patch are categorized into three compartments, Susceptible, Infected and Recovered/Removed, and may migrate from one patch to another in any of the compartments. Each individual is associated with a random infectivity function which dictates the force of infectivity while the interactive infection process depends on the age of infection (elapsed time since infection). We prove a functional law of large numbers for the epidemic evolution dynamics including the aggregate infectivity process, the numbers of susceptible and recovered individuals as well as the number of infected individuals at each time that have been infected for a certain amount of time. From the limits, we derive a PDE model for the density of the number of infected individuals with respect to the infection age, which is a system of linear PDEs with a boundary condition that is determined by a set of integral equations.
Human papillomavirus (HPV) infection is the most common sexually transmitted infection in the world. Persistent oncogenic HPV infection has been a leading threat to global health and can lead to serious complications such as cervical cancer. Prevention interventions including vaccination and screening have been proved effective in reducing the risk of HPV-related diseases. In recent decades, computational epidemiology has served as a very useful tool to study HPV transmission dynamics and evaluate prevention strategies. In this paper, we conduct a comprehensive literature review on state-of-the-art computational epidemic models for HPV disease dynamics, transmission dynamics, as well as prevention efforts. We summarise current research trends, identify gaps in the present literature, and identify future research directions with potential in accelerating the containment and/or elimination of HPV infection.
Laboratory models are often used to understand the interaction of related pathogens via host immunity. For example, recent experiments where ferrets were exposed to two influenza strains within a short period of time have shown how the effects of cross-immunity vary with the time between exposures and the specific strains used. On the other hand, studies of the workings of different arms of the immune response, and their relative importance, typically use experiments involving a single infection. However, inferring the relative importance of different immune components from this type of data is challenging. Using simulations and mathematical modelling, here we investigate whether the sequential infection experiment design can be used not only to determine immune components contributing to cross-protection, but also to gain insight into the immune response during a single infection. We show that virological data from sequential infection experiments can be used to accurately extract the timing and extent of cross-protection. Moreover, the broad immune components responsible for such cross-protection can be determined. Such data can also be used to infer the timing and strength of so
It is becoming increasingly important that physics educators equip their students with the skills to work with data effectively. However, many educators may lack the necessary training and expertise in data science to teach these skills. To address this gap, we created the Data Science Education Community of Practice (DSECOP), bringing together graduate students and physics educators from different institutions and backgrounds to share best practices and lessons learned from integrating data science into undergraduate physics education. In this article we present insights and experiences from this community of practice, highlighting key strategies and challenges in incorporating data science into the introductory physics curriculum. Our goal is to provide guidance and inspiration to educators who seek to integrate data science into their teaching, helping to prepare the next generation of physicists for a data-driven world.
Studying the spatiotemporal distribution of SARS-CoV-2 infections among healthcare workers (HCWs) can aid in protecting them from exposure. Existing studies related to HCW infections have emphasized infection rates and protective measures. However, the spatiotemporal patterns and related external environmental factors of HCW infections remain unclear. To fill this gap, an open-source dataset of HCW diagnoses was provided, and the spatiotemporal distributions of SARS-CoV-2 infections among HCWs in Wuhan, China were explored. A geographical detector technique was then used to investigate the impacts of hospital level, type, distance from the infection source, and other external indicators of HCW infections. The results showed that the number of daily HCW infections over time in Wuhan followed a log-normal distribution, with its mean observed on January 23, 2020 and a standard deviation of 10.8 days. The implementation of high-impact measures, such as the lockdown of the city, may have increased the probability of HCW infections in the short term, especially for HCWs in the outer ring of Wuhan. The infection of HCWs in Wuhan exhibited clear spatial heterogeneity. The number of HCW in