The electric power sector is a leading source of air pollutant emissions, impacting the public health of nearly every community. Although regulatory measures have reduced air pollutants, fossil fuels remain a significant component of the energy supply, highlighting the need for more advanced demand-side approaches to reduce public health impacts. To enable health-informed demand-side management, we introduce HealthPredictor, a domain-specific AI model that provides an end-to-end pipeline linking electricity use to public health outcomes. The model comprises three components: a fuel mix predictor that estimates the contribution of different generation sources, an air quality converter that models pollutant emissions and atmospheric dispersion, and a health impact assessor that translates the resulting pollutant changes into monetized health damages. Across multiple regions in the United States, our health-driven optimization framework yields substantially lower prediction errors in terms of public health impacts than fuel-mix-driven baselines. A case study on electric vehicle charging schedules illustrates the public health gains enabled by our method and the actionable guidance it provides.
Artificial Intelligence (AI) is revolutionizing various fields, including public health surveillance. In Africa, where health systems frequently encounter challenges such as limited resources, inadequate infrastructure, failing health information systems, and a shortage of skilled health professionals, AI offers a transformative opportunity. This paper investigates the applications of AI in public health surveillance across the continent, presenting successful case studies and examining the benefits, opportunities, and challenges of implementing AI technologies in African healthcare settings. Our paper highlights AI's potential to enhance disease monitoring, improve health outcomes, and support effective public health interventions. The findings presented in the paper demonstrate that AI can significantly improve the accuracy and timeliness of disease detection and prediction, optimize resource allocation, and facilitate targeted public health strategies. Additionally, our paper identifies key barriers to the widespread adoption of AI in African public health systems and proposes actionable recommendations to overcome these challenges.
Rapidly evolving technology, data, and analytic landscapes are permeating many fields and professions. In public health, the need for data science skills, including data literacy, is particularly prominent given both the potential of novel data types and analysis methods to fill gaps in existing public health research and intervention practices and the potential of such data or methods to perpetuate or augment health disparities. Through a review of public health courses and programs at the top 10 U.S. and globally ranked schools of public health, this article summarizes existing educational efforts in public health data science. These existing practices serve to inform efforts for broadening such curricula to additional schools and populations. Data science ethics course offerings are also examined in the context of assessing how population health principles can be blended into training across levels of data involvement to augment the traditional core of public health curricula. Parallel findings from domestic and international 'outside the classroom' training programs are also synthesized to advance approaches for increasing diversity in public health data science. Based on these practices, the article proposes directions for broadening and diversifying public health data science education.
Quantum technologies, including quantum computing, cryptography, and sensing, among others, are set to revolutionize sectors ranging from materials science to drug discovery. Despite their significant potential, the implications for public health have been largely overlooked, highlighting a critical gap in recognition and preparation. This oversight necessitates immediate action, as public health remains largely unaware of quantum technologies as a tool for advancement. The application of quantum principles to epidemiology and health informatics, termed quantum health epidemiology and quantum health informatics, has the potential to radically transform disease surveillance, prediction, modeling, and analysis of health data. However, there is a notable lack of quantum expertise within the public health workforce and educational pipelines. This gap underscores the urgent need for the development of quantum literacy among public health practitioners, leaders, and students to leverage emerging opportunities while addressing risks and ethical considerations. Innovative teaching methods, such as interactive simulations, games, visual models, and other tailored platforms, offer viable solutions for building this literacy.
Irregularities in public health data streams (such as COVID-19 case counts) hamper data-driven decision-making for public health stakeholders. A real-time, computer-generated list of the most important, outlying data points from thousands of daily-updated public health data streams could assist an expert reviewer in identifying these irregularities. However, existing outlier detection frameworks perform poorly on this task because they do not account for the data volume or for the statistical properties of public health streams. Accordingly, we developed FlaSH (Flagging Streams in public Health), a practical outlier detection framework for public health data users that uses simple, scalable models to capture these statistical properties explicitly. In an experiment where human experts evaluate FlaSH and existing methods (including deep learning approaches), FlaSH scales to the data volume of this task, matches or exceeds these other methods in mean accuracy, and identifies the outlier points that users empirically rate as more helpful. Based on these results, FlaSH has been deployed on data streams used by public health stakeholders.
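The abstract above emphasizes simple, scalable models that capture a stream's statistical properties explicitly. As a hedged illustration of that general idea, and not the actual FlaSH models, a trailing-window z-score flagger for a single daily count stream might look like this; the function name, window size, and threshold are assumptions for the sketch:

```python
# Hedged sketch: a lightweight z-score outlier flagger for a count stream.
# This illustrates the general "simple statistical model" idea, not FlaSH itself.
from collections import deque
from math import sqrt

def flag_outliers(stream, window=7, threshold=3.0):
    """Flag indices whose value is more than `threshold` standard
    deviations from a trailing-window mean (after a warm-up period)."""
    history = deque(maxlen=window)
    flags = []
    for i, value in enumerate(stream):
        if len(history) == window:
            mean = sum(history) / window
            var = sum((x - mean) ** 2 for x in history) / window
            std = sqrt(var) or 1.0  # guard against flat (zero-variance) windows
            if abs(value - mean) / std > threshold:
                flags.append(i)
        history.append(value)
    return flags

cases = [100, 102, 98, 101, 99, 103, 100, 450, 101]  # index 7 is a reporting glitch
print(flag_outliers(cases))  # → [7]
```

Because the window slides forward even past flagged points, a single spike inflates the window's variance afterward; a production system would likely exclude flagged points from the trailing statistics.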
A recent rise in online content expressing concerns with public health initiatives has contributed to already stalled uptake of preemptive measures globally. Future public health efforts must attempt to understand such content, what concerns it may raise among readers, and how to respond to it effectively. To this end, we present ConcernScope, a platform that uses a teacher-student framework for knowledge transfer between large language models and lightweight classifiers to quickly and effectively identify the health concerns raised in a text corpus. The platform allows direct upload of massive files, automatic scraping of specific URLs, and direct text editing. ConcernScope is built on top of a taxonomy of public health concerns. The platform is intended for public health officials, and we demonstrate several of its applications: guided data exploration to find useful examples of common concerns in online community datasets, identification of trends in concerns through an example time series analysis of 186,000 samples, and analysis of shifts in topic frequency before and after significant events.
We present a technical case study on the Privacy-Enhancing Technologies (PETs) for Public Health Challenge, a collaborative effort to safely leverage sensitive private sector data for social impact, specifically pandemic management. The project utilized Differential Privacy (DP) to create realistic, privacy-preserved synthetic financial transaction data, which was then combined with public health and mobility datasets. This approach successfully addressed the critical hurdle of sharing sensitive financial information for research and policy. The analysis demonstrated that this synthetic, DP-protected data possesses significant spatial-temporal and predictive power for public health. Key outcomes include the development of six reusable tools and frameworks supporting diagnostic nowcasting (e.g., Hotspot Detection, Pandemic Adherence Monitoring) and predictive forecasting (e.g., Mobility Analysis, Contact Matrix Estimation) for epidemiological decision-making. The study provides best practices for advancing data sharing in a privacy-compliant manner.
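The challenge described above rests on Differential Privacy. As a hedged illustration of the basic DP primitive, the Laplace mechanism for releasing a noisy count, the following sketch shows the core idea; the actual synthetic-data pipeline in the study is far more involved, and the function names here are assumptions:

```python
# Hedged sketch: the Laplace mechanism, the basic epsilon-DP building block.
# Real DP synthetic-data generation layers much more machinery on top of this.
import math
import random

def laplace_noise(scale, rng=random):
    """Draw Laplace(0, scale) noise by inverse-transform sampling."""
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count, epsilon, sensitivity=1.0):
    """Release a count with epsilon-DP: one person changes the count by at
    most `sensitivity`, so Laplace(sensitivity / epsilon) noise suffices."""
    return true_count + laplace_noise(sensitivity / epsilon)

# Smaller epsilon means stronger privacy and noisier releases.
noisy = dp_count(1000, epsilon=1.0)
```

The key design point is that the noise scale depends only on the query's sensitivity and the privacy budget epsilon, never on the data itself.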
Public health experts need scalable approaches to monitor large volumes of health data (e.g., cases, hospitalizations, deaths) for outbreaks or data quality issues. Traditional alert-based monitoring systems struggle with modern public health data for several reasons, including that alerting thresholds need constant resetting and that the data volumes may cause application lag. Instead, we propose a ranking-based monitoring paradigm that leverages new AI anomaly detection methods. Through a multi-year interdisciplinary collaboration, the resulting system has been deployed at a national organization to monitor up to 5,000,000 data points daily. A three-month longitudinal evaluation of the deployment revealed a significant improvement in monitoring objectives, with a 54x increase in reviewer efficiency compared to traditional alert-based methods. This work highlights the potential of human-centered AI to transform public health decision-making.
The surging demand for AI has led to a rapid expansion of energy-intensive data centers, impacting the environment through escalating carbon emissions and water consumption. While significant attention has been paid to data centers' growing environmental footprint, the public health burden, a hidden toll of data centers, has been largely overlooked. Specifically, data centers' lifecycle, from chip manufacturing to operation, can significantly degrade air quality through emissions of criteria air pollutants such as fine particulate matter, substantially impacting public health. This paper introduces a principled methodology to model lifecycle pollutant emissions for data centers and computing tasks, quantifying the public health impacts. Our findings reveal that training a large AI model comparable to the Llama-3.1 scale can produce air pollutants equivalent to more than 10,000 round trips by car between Los Angeles and New York City. The growing demand for AI is projected to push the total annual public health burden of U.S. data centers to more than $20 billion by 2028, rivaling that of California's on-road emissions. Further, the public health costs are disproportionately felt in disadvantaged communities.
Large Language Models (LLMs) hold promise in addressing complex medical problems. However, while most prior studies focus on improving accuracy and reasoning abilities, a significant bottleneck in developing effective healthcare agents lies in the readability of LLM-generated responses, specifically, their ability to answer public health questions clearly and simply for people without medical backgrounds. In this work, we introduce RephQA, a benchmark for evaluating the readability of LLMs in public health question answering (QA). It contains 533 expert-reviewed QA pairs from 27 sources across 13 topics, and includes a proxy multiple-choice task to assess informativeness, along with two readability metrics: Flesch-Kincaid grade level and professional score. Evaluation of 25 LLMs reveals that most fail to meet readability standards, highlighting a gap between reasoning and effective communication. To address this, we explore four readability-enhancing strategies: standard prompting, chain-of-thought prompting, Group Relative Policy Optimization (GRPO), and a token-adapted GRPO variant. Token-adapted GRPO achieves the best results, advancing the development of more practical and user-friendly public health LLMs.
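One of the benchmark's readability metrics, the Flesch-Kincaid grade level, is a standard published formula and can be sketched directly; the syllable counter below is a crude vowel-run heuristic (an assumption of this sketch, not the benchmark's exact implementation):

```python
# Hedged sketch: the Flesch-Kincaid grade-level formula.
# FKGL = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59
import re

def count_syllables(word):
    """Approximate syllables as runs of consecutive vowels (crude heuristic)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def fk_grade(text):
    """Estimate the U.S. school grade level needed to read `text`."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * (n_words / sentences) + 11.8 * (syllables / n_words) - 15.59

print(fk_grade("The cat sat. The dog ran."))  # short words, short sentences: low grade
```

Longer sentences and polysyllabic vocabulary push the score up, which is why plain-language public health answers should target a low grade level.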
For a long time, public health events, such as disease incidence or vaccination activity, have been monitored to keep track of the health status of the population, making it possible to evaluate the effect of public health initiatives and to decide where resources for improving public health are best spent. This thesis investigates the use of web data mining for public health monitoring and makes contributions in two areas: new approaches for predicting public health events from web-mined data, and novel applications of web-mined data for public health monitoring.
Public health is the most recent of the biomedical sciences to be seduced by the trendy moniker "precision." Advocates for "precision public health" (PPH) call for a data-driven, computational approach to public health, leveraging swaths of genomic "big data" to inform public health decision-making. Yet, like precision medicine, PPH oversells the value of genomic data to determine health outcomes, but on a population-level. A large historical literature has shown that over-emphasizing heredity tends to disproportionately harm underserved minorities and disadvantaged communities. By comparing and contrasting PPH with an earlier attempt at using big data and genetics, in the Progressive era (1890-1920), we highlight some potential risks of a genotype-driven preventive public health. We conclude by suggesting that such risks may be avoided by prioritizing data integration across many levels of analysis, from the molecular to the social.
Time elapsed until an event of interest is often modeled using the survival analysis methodology, which estimates a survival score based on the input features. There is a resurgence of interest in developing more accurate prediction models for time-to-event prediction in personalized healthcare using modern tools such as neural networks. Higher-quality features and more frequent observations improve the predictions for a patient; however, the impact of including a patient's geographic location-based public health statistics on individual predictions has not been studied. This paper proposes a complementary improvement to survival analysis models by incorporating public health statistics in the input features. We show that including geographic location-based public health information results in a statistically significant improvement in the concordance index evaluated on the Surveillance, Epidemiology, and End Results (SEER) dataset containing nationwide cancer incidence data. The improvement holds for both the standard Cox proportional hazards model and the state-of-the-art Deep Survival Machines model. Our results indicate the utility of geographic location-based public health features in survival analysis models.
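The metric the paper reports its improvement on, the concordance index, has a simple definition: among comparable patient pairs, the fraction where the patient predicted to be at higher risk fails first. A hedged brute-force sketch (function and variable names are assumptions, and real implementations handle ties in event times more carefully):

```python
# Hedged sketch: brute-force concordance index (c-index) for right-censored
# survival data. O(n^2); library implementations are faster and more complete.
def concordance_index(times, events, risk_scores):
    """times: observed follow-up times; events: 1 = event occurred,
    0 = censored; risk_scores: higher score = higher predicted risk."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i's event precedes j's observed time.
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1          # correctly ranked pair
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5        # tied scores count half
    return concordant / comparable
```

A c-index of 1.0 means perfect risk ranking, 0.5 is chance level, so even small gains from extra features such as regional public health statistics can be meaningful.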
The COVID-19 pandemic has highlighted the dire necessity to improve public health literacy for societal resilience. YouTube, the largest video-sharing social media platform, provides a vast repository of user-generated health information in a multimedia-rich format which may be easier for the public to understand and use if major concerns about content quality and accuracy are addressed. This study develops an automated solution to identify, retrieve, and shortlist medically relevant and understandable YouTube videos that domain experts can subsequently review and recommend for disseminating and educating the public on the COVID-19 pandemic and similar public health outbreaks. Our approach leverages domain knowledge from human experts and machine learning and natural language processing methods to provide a scalable, replicable, and generalizable approach that can also be applied to enhance the management of many health conditions.
YouTube has rapidly emerged as a predominant platform for content consumption, effectively displacing conventional media such as television and news outlets. A part of the enormous video stream uploaded to this platform includes health-related content, both from official public health organizations and from any individual or group that can make an account. The quality of information available on YouTube is a critical point of public health safety, especially concerning major interventions such as vaccination. This study differentiates itself from previous audits of YouTube videos on this topic by conducting a systematic daily collection of posted videos mentioning vaccination over a duration of 3 months. We show that the competition for the public's attention is between public health messaging by institutions and individual educators on one side, and commentators on society and politics on the other, with the latter contributing the most videos expressing stances against vaccination. Videos opposing vaccination are more likely to mention politicians and publication media such as podcasts, reports, and news analyses; videos in favor, on the other hand, are more likely to cite medical and scientific sources.
Mobile health has the potential to revolutionize health care delivery and patient engagement. In this work, we discuss how integrating Artificial Intelligence into digital health applications, focused on supply chain, patient management, and capacity building, among other use cases, can improve health system and public health performance. We present an Artificial Intelligence and Reinforcement Learning platform that allows the delivery of adaptive interventions whose impact can be optimized through experimentation and real-time monitoring. The system can integrate multiple data sources and digital health applications. The flexibility of this platform to connect to various mobile health applications and digital devices and to send personalized recommendations based on past data and predictions can significantly improve the impact of digital tools on health system outcomes. We specifically discuss the potential for resource-poor settings, where the impact of this approach on health outcomes could be most decisive. This framework is, however, similarly applicable to improving efficiency in health systems where scarcity is not an issue.
In a recent series of high-impact public health publications, the c-index was used as a measure of prediction to assess the public health relevance of a risk factor. I demonstrate that the c-index is inferior to the classical epidemiologic measures most commonly employed for risk prediction and public health assessment, such as disease incidence, relative risk (RR), and population-attributable risk (PAR). I recommend using the latter measures when assessing the public health relevance of a risk factor.
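The classical measures the author recommends can be computed directly from a cohort 2x2 table and an exposure prevalence. A hedged worked example with made-up illustrative numbers (the table values and prevalence are not from the paper):

```python
# Hedged illustration: relative risk (RR) and population-attributable risk
# (PAR, Levin's formula) from a hypothetical 2x2 cohort table.
def relative_risk(a, b, c, d):
    """RR from a 2x2 table: a/b = exposed cases/non-cases,
    c/d = unexposed cases/non-cases."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

def population_attributable_risk(prevalence, rr):
    """Levin's formula: PAR = p(RR - 1) / (1 + p(RR - 1)), the fraction of
    population disease burden attributable to the exposure."""
    return prevalence * (rr - 1) / (1 + prevalence * (rr - 1))

rr = relative_risk(30, 70, 10, 90)           # exposed risk 0.30 vs unexposed 0.10
par = population_attributable_risk(0.2, rr)  # assume 20% of population exposed
print(round(rr, 2), round(par, 3))           # → 3.0 0.286
```

Unlike the c-index, these measures directly express how much disease burden an exposure accounts for in the population, which is the author's point about public health relevance.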
Electronic Health Record (EHR) has become an essential tool in the healthcare ecosystem, providing authorized clinicians with patients' health-related information for better treatment. While most developed countries are taking advantage of EHRs to improve their healthcare system, it remains challenging in developing countries to support clinical decision-making and public health using a computerized patient healthcare information system. This paper proposes a novel EHR architecture suitable for developing countries--an architecture that fosters inclusion and provides solutions tailored to all social classes and socioeconomic statuses. Our architecture foresees an internet-free (offline) solution to allow medical transactions between healthcare organizations, and the storage of EHRs in geographically underserved and rural areas. Moreover, we discuss how artificial intelligence can leverage anonymous health-related information to enable better public health policy and surveillance.
The rapid spread of health misinformation on online social networks (OSNs) during global crises such as the COVID-19 pandemic poses challenges to public health, social stability, and institutional trust. Centrality metrics have long been pivotal in understanding the dynamics of information flow, particularly in the context of health misinformation. However, the increasing complexity and dynamism of online networks, especially during crises, highlight the limitations of these traditional approaches. This study introduces and compares three novel centrality metrics: dynamic influence centrality (DIC), health misinformation vulnerability centrality (MVC), and propagation centrality (PC). These metrics incorporate temporal dynamics, susceptibility, and multilayered network interactions. Using the FibVID dataset, we compared traditional and novel metrics to identify influential nodes, propagation pathways, and misinformation influencers. Traditional metrics identified 29 influential nodes, while the new metrics uncovered 24 unique nodes, resulting in 42 combined nodes, an increase of 44.83%. Baseline interventions reduced health misinformation by 50%, while interventions informed by the new metrics achieved greater reductions.
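The novel metrics above (DIC, MVC, PC) are specific to the study, but the traditional, static baseline they are compared against can be sketched. A hedged example of identifying influential nodes with plain degree centrality on a toy undirected graph (the graph and names are illustrative, not from the FibVID dataset):

```python
# Hedged sketch: a static degree-centrality baseline for finding influential
# nodes, the kind of traditional metric the study's dynamic metrics extend.
def degree_centrality(adjacency):
    """Degree centrality: neighbor count normalized by (n - 1)."""
    n = len(adjacency)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adjacency.items()}

def top_influencers(adjacency, k=2):
    """Return the k nodes with the highest degree centrality."""
    scores = degree_centrality(adjacency)
    return sorted(scores, key=scores.get, reverse=True)[:k]

graph = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a", "d"}, "d": {"a", "c"}}
print(top_influencers(graph))  # → ['a', 'c']
```

A static score like this ignores when edges are active and how susceptible each node is, which is precisely the gap the study's temporal and vulnerability-aware metrics target.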
Interactive Health (IH) research increasingly engages patients through participatory and user-centred approaches. However, patients' lived experiences are typically treated more as data to be analysed than as knowledge in their own right. In this paper, I argue that 'patient voice' in the field of IH is both an inclusion issue and an epistemic one. More specifically, it concerns how experiential accounts are recognised and circulated. I examine how methodological conventions, authorship norms, review criteria, and publication formats tend to position patients as participants rather than as authors of evidence. Looking to patient-partnered practices in medical publishing, including The BMJ, JAMA, and British Journal of Sports Medicine, I outline a possible infrastructural pathway for supporting patient-authored or patient-led experiential contributions within the field. I present this as a design probe to surface assumptions and trade-offs. I end this paper by inviting the IH community to reflect on how its knowledge infrastructures might accommodate experiential evidence alongside established research forms.