Land use expansion is linked to major sustainability concerns including climate change, food security and biodiversity loss. This expansion is largely concentrated in so-called frontiers, defined here as places experiencing marked transformations due to rapid resource exploitation. Understanding the mechanisms shaping these frontiers is crucial for sustainability. Previous work focused mainly on explaining how active frontiers advance, in particular into tropical forests. Comparatively, our understanding of how frontiers emerge in territories considered marginal in terms of agricultural productivity and global market integration remains weak. We synthesize conceptual tools explaining resource and land-use frontiers, including theories of land rent and agglomeration economies, of frontiers as successive waves, spaces of territorialization, friction, and opportunities, anticipation and expectation. We then propose a new theory of frontier emergence, which identifies exogenous pushes, legacies of past waves, and actors' anticipations as key mechanisms by which frontiers emerge. Processes of abnormal rent creation and capture and the build-up of agglomeration economies then constitute k
Estimating, understanding, and communicating uncertainty is fundamental to statistical epidemiology, where model-based estimates regularly inform real-world decisions. However, sources of uncertainty are rarely formalised, and existing classifications are often defined inconsistently. This lack of structure hampers interpretation, model comparison, and targeted data collection. Connecting ideas from machine learning, information theory, experimental design, and health economics, we present a first-principles decision-theoretic framework that defines uncertainty as the expected loss incurred by making an estimate based on incomplete information, arguing that this is a highly useful and practically relevant definition for epidemiology. We show how reasoning about future data leads to a notion of expected uncertainty reduction, which induces formal definitions of reducible and irreducible uncertainty. We demonstrate our approach using a case study of SARS-CoV-2 wastewater surveillance in Aotearoa New Zealand, estimating the uncertainty reduction if wastewater surveillance were expanded to the full population. We then connect our framework to relevant literature from adjacent fields, s
This study presents a cluster-based Bayesian SIRD model to analyze the epidemiology of chickenpox (varicella) in India, utilizing data from 1990 to 2021. We employed an age-structured approach, dividing the population into juvenile, adult, and elderly groups, to capture the disease's transmission dynamics across diverse demographic groups. The model incorporates a Holling-type incidence function, which accounts for the saturation effect of transmission at high prevalence levels, and applies Bayesian inference to estimate key epidemiological parameters, including transmission rates, recovery rates, and mortality rates. The study further explores cluster analysis to identify regional clusters within India based on the similarities in chickenpox transmission dynamics, using criteria like incidence, prevalence, and mortality rates. We perform K-means clustering to uncover three distinct epidemiological regimes, which vary in terms of outbreak potential and age-specific dynamics. The findings highlight juveniles as the primary drivers of transmission, while the elderly face a disproportionately high mortality burden. Our results underscore the importance of age-targeted interventions an
In the age of digital epidemiology, epidemiologists are faced with an increasing amount of data of growing complexity and dimensionality. Machine learning is a set of powerful tools that can help to analyze such enormous amounts of data. This chapter lays the methodological foundations for successfully applying machine learning in epidemiology. It covers the principles of supervised and unsupervised learning and discusses the most important machine learning methods. Strategies for model evaluation and hyperparameter optimization are developed and interpretable machine learning is introduced. All these theoretical parts are accompanied by code examples in R, where an example dataset on heart disease is used throughout the chapter.
Defining interdisciplinary physics today first requires a reformulation of what physics is today, which in turn calls for clarifying what makes a physicist. This assessment results from my forty-year journey of arguing and fighting to build sociophysics. My view on interdisciplinary physics has thus evolved, jumping repeatedly in opposite directions before settling on the following claim: today physics is what is done by physicists who handle a problem the "physicist's way". However, the training of physicists should stay restricted to inert matter, while adding a focus on the universality of the physicist's approach as a generic path to investigate a topic. Consequently, interdisciplinary physics should become a cabinet of curiosities including an incubator. The cabinet of curiosities would welcome all one-shot papers related to any kind of object, provided they are co-authored by at least one physicist; otherwise, the paper should explicitly use techniques from physics. If a topic attracts many papers, it would be moved to the incubator to foster the potential emergence of a new appropriate subfield of physics. A process illustrated by the subsection social physics in Frontiers in physic
Artificial intelligence (AI) systems increasingly shape how people access health information, make medical decisions, and receive care -- yet epidemiology lacks frameworks for measuring AI exposure or studying its health effects at the population level. Here we argue that AI now functions as a determinant of health and propose a conceptual framework, borrowed from environmental epidemiology, for studying it. We distinguish ambient AI exposure -- algorithmic curation and AI-mediated institutional decisions that affect populations regardless of individual choice -- from personal AI exposure -- direct, volitional use of AI tools. We characterize AI's possible causal roles in epidemiological models, show that existing experimental approaches are inadequate for capturing chronic, population-level effects, and illustrate these ideas with nationally representative US survey data. We discuss implications for study design, health equity, and AI governance.
Maps have played an important role in epidemiology and public health since the beginnings of these disciplines. With the advent of geographical information systems and advanced information visualization techniques, interactive maps have become essential tools for the analysis of geographical patterns of disease incidence and prevalence, as well as communication of public health knowledge, as dramatically illustrated by the proliferation of web-based maps and disease surveillance ``dashboards'' during the COVID-19 pandemic. While such interactive maps are usually effective in supporting static spatial analysis, support for spatial epidemiological visualization and modelling involving distributed and dynamic data sources, and support for analysis of temporal aspects of disease spread have proved more challenging. Combining these two aspects can be crucial in applications of interactive maps in epidemiology and public health work. In this paper, we discuss these issues in the context of support for disease surveillance in remote regions, including tools for distributed data collection, simulation and analysis, and enabling multidisciplinary collaboration.
Individuals infected with SARS-CoV-2, the virus that causes COVID-19, may shed the virus in stool before developing symptoms, suggesting that measurements of SARS-CoV-2 concentrations in wastewater could be a "leading indicator" of COVID-19 prevalence. Multiple studies have corroborated the leading indicator concept by showing that the correlation between wastewater measurements and COVID-19 case counts is maximized when case counts are lagged. However, the meaning of "leading indicator" will depend on the specific application of wastewater-based epidemiology, and the correlation analysis is not relevant for all applications. In fact, the quantification of a leading indicator will depend on epidemiological, biological, and health systems factors. Thus, there is no single "lead time" for wastewater-based COVID-19 monitoring. To illustrate this complexity, we enumerate three different applications of wastewater-based epidemiology for COVID-19: a qualitative "early warning" system; an independent, quantitative estimate of disease prevalence; and a quantitative alert of bursts of disease incidence. The leading indicator concept has different definitions and utility in each application.
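The lagged-correlation analysis mentioned above can be sketched as follows. This is a minimal illustration, not the method of any particular study: the function name `lagged_correlations`, the noise-free synthetic series, and the 4-day shift are all assumptions chosen for the example. As the abstract cautions, the lag that maximizes this correlation is not a universal "lead time", and the analysis is not relevant for every application.

```python
import numpy as np


def lagged_correlations(wastewater, cases, max_lag):
    """Pearson correlation of wastewater[t] with cases[t + lag], for lag = 0..max_lag."""
    wastewater = np.asarray(wastewater, dtype=float)
    cases = np.asarray(cases, dtype=float)
    if wastewater.shape != cases.shape:
        raise ValueError("series must have the same length")
    n = wastewater.size
    if not 0 <= max_lag < n - 1:
        raise ValueError("max_lag must leave at least two overlapping points")
    correlations = {}
    for lag in range(max_lag + 1):
        # Pair each wastewater value with the case count `lag` days later.
        w = wastewater[: n - lag]
        c = cases[lag:]
        correlations[lag] = float(np.corrcoef(w, c)[0, 1])
    return correlations


# Synthetic, noise-free illustration (assumed data, not real surveillance):
# the case curve is the wastewater curve delayed by 4 days.
days = np.arange(120)
wastewater = np.exp(-0.5 * ((days - 50) / 12.0) ** 2)
cases = np.roll(wastewater, 4)

correlations = lagged_correlations(wastewater, cases, max_lag=10)
best_lag = max(correlations, key=correlations.get)
print(best_lag, round(correlations[best_lag], 3))
```

On this synthetic input the maximizing lag is the built-in 4-day shift. With real data, reporting delays, testing behavior, and shedding dynamics all affect the result, which is why the lag should be interpreted per application.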
The science of networks has revolutionised research into the dynamics of interacting elements. It could be argued that epidemiology in particular has embraced the potential of network theory more than any other discipline. Here we review the growing body of research concerning the spread of infectious diseases on networks, focusing on the interplay between network theory and epidemiology. The review is split into four main sections, which examine: the types of network relevant to epidemiology; the multitude of ways these networks can be characterised; the statistical methods that can be applied to infer the epidemiological parameters on a realised network; and finally simulation and analytical methods to determine epidemic dynamics on a given network. Given the breadth of areas covered and the ever-expanding number of publications, a comprehensive review of all work is impossible. Instead, we provide a personalised overview into the areas of network epidemiology that have seen the greatest progress in recent years or have the greatest potential to provide novel insights. As such, considerable importance is placed on analytical approaches and statistical methods which are both rapid
A principal screens an agent with an arbitrary set of allocations $X$. The agent's preferences over allocations are comonotonic. A subset of allocations $X^*\subseteq X$ is a surplus-elasticity frontier if (i) any other allocation has a demand curve that is pointwise lower and less elastic than some allocation in $X^*$ and (ii) the allocations in $X^*$ can be ordered in terms of their demand curves such that a higher demand curve is more inelastic. We show that any surplus-elasticity frontier is an optimal menu. Moreover, if the incremental demand curves along the frontier are also ordered by their elasticities, then the frontier is optimal even among stochastic mechanisms. The result is agnostic to type distributions and redistributive welfare weights -- the same frontier remains optimal for a broad class of objectives. As applications, we show how these results immediately yield new insights into optimal bundling, optimal taxation, sequential screening, selling information, and regulating a data-rich monopolist.
World models have emerged as a unifying paradigm for learning latent dynamics, simulating counterfactual futures, and supporting planning under uncertainty. In this paper, we argue that computational epidemiology is a natural and underdeveloped setting for world models. This is because epidemic decision-making requires reasoning about latent disease burden, imperfect and policy-dependent surveillance signals, and intervention effects that are mediated by adaptive human behavior. We introduce a conceptual framework for epidemiological world models, formulating epidemics as controlled, partially observed dynamical systems in which (i) the true epidemic state is latent, (ii) observations are noisy and endogenous to policy, and (iii) interventions act as sequential actions whose effects propagate through behavioral and social feedback. We present three case studies that illustrate why explicit world modeling is necessary for policy-relevant reasoning: strategic misreporting in behavioral surveillance, systematic delays in time-lagged signals such as hospitalizations and deaths, and counterfactual intervention analysis where identical histories diverge under alternative action sequences.
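One generic way to write down a controlled, partially observed dynamical system with properties (i) to (iii) is sketched below. The symbols $x_t$ (latent epidemic state), $a_t$ (intervention), $b_t$ (behavioral response), $y_t$ (surveillance observation), and the kernels $T$, $B$, $O$ and policy $\pi$ are assumptions introduced for illustration; they are not taken from the paper and may differ from its notation.

```latex
\begin{aligned}
b_t     &\sim B(\,\cdot \mid x_t, a_t)       && \text{behavioral response to state and policy} \\
x_{t+1} &\sim T(\,\cdot \mid x_t, a_t, b_t)  && \text{latent epidemic state transition} \\
y_t     &\sim O(\,\cdot \mid x_t, a_t, b_t)  && \text{noisy observation, endogenous to policy} \\
a_t     &= \pi(y_{1:t}, a_{1:t-1})           && \text{action chosen from the observed history}
\end{aligned}
```

In this form, two identical observation histories $y_{1:t}$ can lead to different futures once the action sequences differ, which is the setting of the counterfactual case study.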
Benchmarking the performance of complex systems such as rail networks, renewable generation assets and national economies is central to transport planning, regulation and macroeconomic analysis. Classical frontier methods, notably Data Envelopment Analysis (DEA) and Stochastic Frontier Analysis (SFA), estimate an efficient frontier in the observed input-output space and define efficiency as distance to this frontier, but rely on restrictive assumptions on the production set and only indirectly address heterogeneity and scale effects. We propose Geometric Manifold Analysis (GeMA), a latent manifold frontier framework implemented via a productivity-manifold variational autoencoder (ProMan-VAE). Instead of specifying a frontier function in the observed space, GeMA represents the production set as the boundary of a low-dimensional manifold embedded in the joint input-output space. A split-head encoder learns latent variables that capture technological structure and operational inefficiency. Efficiency is evaluated with respect to the learned manifold, endogenous peer groups arise as clusters in latent technology space, a quotient construction supports scale-invariant benchmarking, and
Recent advances in artificial intelligence (AI) - particularly generative AI - present new opportunities to accelerate, or even automate, epidemiological research. Unlike disciplines based on physical experimentation, a sizable fraction of epidemiology relies on secondary data analysis and thus is well-suited for such augmentation. Yet, it remains unclear which specific tasks can benefit from AI interventions or where roadblocks exist. Awareness of current AI capabilities is also mixed. Here, we map the landscape of epidemiological tasks using existing datasets - from literature review to data access, analysis, writing up, and dissemination - and identify where existing AI tools offer efficiency gains. While AI can increase productivity in some areas such as coding and administrative tasks, its utility is constrained by limitations of existing AI models (e.g. hallucinations in literature reviews) and human systems (e.g. barriers to accessing datasets). Through examples of AI-generated epidemiological outputs, including fully AI-generated papers, we demonstrate that recently developed agentic systems can now design and execute epidemiological analysis, albeit to varied quality (see
Modelling epidemics via classical population-based models suffers from shortcomings that so-called individual-based models are able to overcome, as they are able to take heterogeneity features into account, such as super-spreaders, and describe the dynamics involved in small clusters. In return, such models often involve large graphs which are expensive to simulate and difficult to optimize, both in theory and in practice. By combining the reinforcement learning philosophy with reduced models, we propose a numerical approach to determine optimal health policies for a stochastic epidemiological graph-model taking into account super-spreaders. More precisely, we introduce a deterministic reduced population-based model involving a neural network, and use it to derive optimal health policies through an optimal control approach. It is meant to faithfully mimic the local dynamics of the original, more complex, graph-model. Roughly speaking, this is achieved by sequentially training the network until an optimal control strategy for the corresponding reduced model manages to equally well contain the epidemic when simulated on the graph-model. After describing the practical implementation o
Systematic literature reviews (SLRs) are a demanding and high-stakes form of scientific knowledge synthesis that remains underspecified as an evaluation setting for large language models (LLMs). We introduce AgentSLR, a large-scale evaluation harness comprising an SLR automation workflow and an expert annotated dataset covering 16,248 articles, designed to test LLM capabilities across the stages of SLRs in epidemiology. Reference annotations were derived from peer-reviewed studies on WHO priority pathogens and produced by domain experts. The harness evaluates each review stage as a separate unit with dedicated metrics enabling targeted failure analysis. We evaluated five frontier reasoning models and found that no single model dominated across all tasks, showing sub-task specialisation often hidden by aggregate benchmarks. Structured data extraction is a major bottleneck, with no model exceeding an average field-level F1 of 0.67. Estimated costs vary substantially, by up to 96 times across evaluated models. Documented failure modes suggest that the evaluated models are not yet reliable enough for unsupervised deployment in epidemiology, where findings can inform public policy.
Connecting the different scales of epidemic dynamics, from individuals to communities to nations, remains one of the main challenges of disease modeling. Here, we revisit one of the largest public health efforts deployed against a localized epidemic: the 2014-2016 Ebola Virus Disease (EVD) epidemic in Sierra Leone. We leverage the data collected by the surveillance and contact tracing protocols of the Sierra Leone Ministry of Health and Sanitation, the US Centers for Disease Control and Prevention, and other responding partners to validate a network epidemiology framework connecting the population (incidence), community (local forecasts), and individual (secondary infections) scales of disease transmission. In doing so, we gain a better understanding of what brought the EVD epidemic to an end: Reduction of introduction in new clusters (primary cases), and not reduction in local transmission patterns (secondary infections). We also find that the first 90 days of the epidemic contained enough information to produce probabilistic forecasts of EVD cases; forecasts which we show are confirmed independently by both disease surveillance and contact tracing. Altogether, using data availabl
Reproducibility and replicability of research findings are central to the scientific integrity of epidemiology. In addition, many research questions require combining data from multiple sources to achieve adequate statistical power. However, barriers related to confidentiality, costs, and incentives often limit the extent and speed of sharing resources, both data and code. Epidemiological practices that follow FAIR principles can address these barriers by making resources (F)indable with the necessary metadata, (A)ccessible to authorized users and (I)nteroperable with other data, to optimize the (R)e-use of resources with appropriate credit to their creators. We provide an overview of these principles and describe approaches for implementation in epidemiology. Increasing degrees of FAIRness can be achieved by moving data and code from on-site locations to the Cloud, using machine-readable and non-proprietary files, and developing open-source code. Adoption of these practices will improve daily work and collaborative analyses, and facilitate compliance with data sharing policies from funders and scientific journals. Achieving a high degree of FAIRness will require funding, training, o
Defining the effect of the exposure of interest and selecting an appropriate estimation method are prerequisites for causal inference. Understanding of how the association between heatwaves (i.e., consecutive days of extreme high temperature) and an outcome depends on whether adjustment was made for temperature, and on how such adjustment was conducted, is limited. This paper aims to investigate this dependency, demonstrate that temperature is a confounder in heatwave-outcome associations, and introduce a new modeling approach to estimate a new heatwave-outcome relation: E[R(Y)|HW=1, Z]/E[R(Y)|T=OT, Z], where HW is a daily binary variable to indicate the presence of a heatwave; R(Y) is the risk of an outcome, Y; T is a temperature variable; OT is optimal temperature; and Z is a set of confounders including typical confounders but also some types of T as a confounder. We recommend characterization of heatwave-outcome relations and careful selection of modeling approaches to understand the impacts of heatwaves under climate change. We demonstrate our approach using real-world data for Seoul, which suggests that the total effect of heatwaves may be larger than what may be inferred from
Epidemiology models are central in understanding and controlling large scale pandemics. Several epidemiology models require simulation-based inference such as Approximate Bayesian Computation (ABC) to fit their parameters to observations. ABC inference is highly amenable to efficient hardware acceleration. In this work, we develop parallel ABC inference of a stochastic epidemiology model for COVID-19. The statistical inference framework is implemented and compared on Intel Xeon CPU, NVIDIA Tesla V100 GPU and the Graphcore Mk1 IPU, and the results are discussed in the context of their computational architectures. Results show that GPUs are 4x and IPUs are 30x faster than Xeon CPUs. Extensive performance analysis indicates that the difference between IPU and GPU can be attributed to higher communication bandwidth, closeness of memory to compute, and higher compute power in the IPU. The proposed framework scales across 16 IPUs, with scaling overhead not exceeding 8% for the experiments performed. We present an example of our framework in practice, performing inference on the epidemiology model across three countries, and giving a brief overview of the results.
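To make the ABC idea above concrete, here is a minimal serial sketch of rejection ABC on a toy model. Everything in it is an assumption for illustration: the chain-binomial SIR simulator, the uniform prior on the transmission rate `beta`, the fixed recovery rate `gamma`, the Euclidean distance on daily new infections, and the rule of keeping the `n_keep` closest draws. It is not the paper's COVID-19 model or its CPU/GPU/IPU implementation.

```python
import numpy as np


def simulate_sir(beta, gamma, n_pop, i0, n_days, rng):
    """Discrete-time stochastic SIR (chain-binomial); returns daily new infections."""
    s, i = n_pop - i0, i0
    new_cases = np.empty(n_days)
    p_rec = 1.0 - np.exp(-gamma)  # per-day recovery probability
    for t in range(n_days):
        p_inf = 1.0 - np.exp(-beta * i / n_pop)  # per-day infection probability
        infections = rng.binomial(s, p_inf)
        recoveries = rng.binomial(i, p_rec)
        s -= infections
        i += infections - recoveries
        new_cases[t] = infections
    return new_cases


def abc_rejection(observed, n_draws, n_keep, rng,
                  gamma=0.2, n_pop=1000, i0=5, beta_range=(0.05, 1.0)):
    """Draw beta from a uniform prior, simulate, and keep the n_keep closest draws."""
    betas = rng.uniform(beta_range[0], beta_range[1], size=n_draws)
    distances = np.array([
        np.linalg.norm(simulate_sir(b, gamma, n_pop, i0, len(observed), rng) - observed)
        for b in betas
    ])
    # The acceptance tolerance is implicitly the n_keep-th smallest distance.
    return betas[np.argsort(distances)[:n_keep]]


rng = np.random.default_rng(42)
# Synthetic "observations" generated from the same toy model (assumed beta = 0.5).
observed = simulate_sir(0.5, 0.2, 1000, 5, 60, rng)
posterior = abc_rejection(observed, n_draws=2000, n_keep=100, rng=rng)
print(posterior.mean(), posterior.std())
```

Each prior draw is simulated and scored independently of the others, which is the property that makes this kind of inference a natural fit for parallel hardware; the loop over `betas` is the part a parallel implementation would distribute.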
The aim of this work was to show a few examples and perspectives of modeling in epidemiology. We began with differential equations, which were the first tool used to describe and predict such phenomena. Wroclaw as a city was very important, because statistics from the smallpox epidemic were used by Bernoulli to estimate the parameters of the first mathematical model of an epidemic. The next step was SIR models, which also first appeared as differential equations. They were very popular at the beginning of the 20th century. When computer simulation changed the world of mathematical modeling, agent-based models opened up more possibilities in epidemiology. These models have a big advantage over differential equations, because information about social networks, people's habits, and reactions to infections, as well as governmental intervention, can be included in agent-based models. We showed in this work how important human relations are in transmitting diseases, and there is an example where it is possible to conduct experiments of significant policy relevance (vaccinating), such as investigating the initial growth of an epidemic on a real-world network. The presented H1N1 model could be observed in real time (prediction was made i