Using state-level opioid overdose mortality data from 1999-2016, we simulated four time-varying treatment scenarios, which correspond to real-world policy dynamics (ramp up, ramp down, temporary and inconsistent). We then evaluated seven commonly used policy evaluation methods: two-way fixed effects event study, debiased autoregressive model, augmented synthetic control, difference-in-differences with staggered adoption, event study with heterogeneous treatment, two-stage differences-in-differences and differences-in-differences imputation. Statistical performance was assessed by comparing bias, standard errors, coverage, and root mean squared error over 1,000 simulations. Results: Our findings indicate that estimator performance varied across policy scenarios. In settings where policy effectiveness diminished over time, synthetic control methods recovered effects with lower bias but higher variance. Difference-in-differences approaches, while offering reasonable coverage under some scenarios, struggled when effects were non-monotonic. Autoregressive methods, although demonstrating lower variability, underestimated uncertainty. Overall, a clear bias-variance tradeoff emerged, undersc
The opioid overdose epidemic remains a critical public health crisis, particularly in the United States, leading to significant mortality and societal costs. Social media platforms like Reddit provide vast amounts of unstructured data that offer insights into public perceptions, discussions, and experiences related to opioid use. This study leverages Natural Language Processing (NLP), specifically Opioid Named Entity Recognition (ONER-2025), to extract actionable information from these platforms. Our research makes four key contributions. First, we created a unique, manually annotated dataset sourced from Reddit, where users share self-reported experiences of opioid use via different administration routes. This dataset contains 331,285 tokens and includes eight major opioid entity categories. Second, we detail our annotation process and guidelines while discussing the challenges of labeling the ONER-2025 dataset. Third, we analyze key linguistic challenges, including slang, ambiguity, fragmented sentences, and emotionally charged language, in opioid discussions. Fourth, we propose a real-time monitoring system to process streaming data from social media, healthcare records, and eme
Understanding the prevalence of misinformation in health topics online can inform public health policies and interventions. However, measuring such misinformation at scale remains a challenge, particularly for high-stakes but understudied topics like opioid-use disorder (OUD)--a leading cause of death in the U.S. We present the first large-scale study of OUD-related myths on YouTube, a widely-used platform for health information. With clinical experts, we validate 8 pervasive myths and release an expert-labeled video dataset. To scale labeling, we introduce MythTriage, an efficient triage pipeline that uses a lightweight model for routine cases and defers harder ones to a high-performing, but costlier, large language model (LLM). MythTriage achieves up to 0.86 macro F1-score while estimated to reduce annotation time and financial cost by over 76% compared to experts and full LLM labeling. We analyze 2.9K search results and 343K recommendations, uncovering how myths persist on YouTube and offering actionable insights for public health and platform moderation.
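The triage idea described above (a cheap model handles routine cases and defers harder ones to a costlier model) can be illustrated with a minimal Python sketch. This is not the MythTriage implementation: the confidence-threshold deferral rule, the function names, and the toy keyword models are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   light_model(text) -> (label, confidence in [0, 1])
#   heavy_model(text) -> label
LightModel = Callable[[str], Tuple[str, float]]
HeavyModel = Callable[[str], str]


def triage(items: List[str], light_model: LightModel,
           heavy_model: HeavyModel, threshold: float = 0.9):
    """Keep the light model's label when it is confident; otherwise defer."""
    results = []
    for item in items:
        label, confidence = light_model(item)
        if confidence >= threshold:
            results.append((item, label, "light"))
        else:
            # Low confidence: pay for the costlier model on this item only.
            results.append((item, heavy_model(item), "heavy"))
    return results


# Toy stand-ins so the sketch runs end to end; real models would replace these.
def toy_light(text: str) -> Tuple[str, float]:
    if "miracle cure" in text:
        return ("myth", 0.95)      # confident case
    return ("not_myth", 0.55)      # uncertain case, will be deferred


def toy_heavy(text: str) -> str:
    return "myth" if "just willpower" in text else "not_myth"


videos = [
    "this miracle cure ends addiction overnight",
    "recovery is just willpower, no medication needed",
    "a clinician explains buprenorphine dosing",
]
routed = triage(videos, toy_light, toy_heavy)
deferral_rate = sum(1 for _, _, who in routed if who == "heavy") / len(routed)
```

In a sketch like this, the threshold controls the cost-quality tradeoff: raising it defers more items to the costlier model, lowering it saves cost at the risk of keeping more of the light model's errors.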
Despite numerous applications for fine-grained corpus analysis, researchers continue to rely on manual labeling, which does not scale, or statistical tools like topic modeling, which are difficult to control. We propose that LLMs have the potential to scale the nuanced analyses that researchers typically conduct manually to large text corpora. To this effect, inspired by qualitative research methods, we develop HICode, a two-part pipeline that first inductively generates labels directly from analysis data and then hierarchically clusters them to surface emergent themes. We validate this approach across three diverse datasets by measuring alignment with human-constructed themes and demonstrating its robustness through automated and human evaluations. Finally, we conduct a case study of litigation documents related to the ongoing opioid crisis in the U.S., revealing aggressive marketing strategies employed by pharmaceutical companies and demonstrating HICode's potential for facilitating nuanced analyses in large-scale data.
Approximately one-third of adults search the internet for health information before visiting an emergency department (ED), with 75% encountering inaccurate content. This study examines how such searches influence patient care. We conducted an observational study of ED visits over a 12-month period, surveying 214 of 576 patients about pre-ED internet use. Data on demographics, comorbidities, acuity, orders, prescriptions, and dispositions were extracted. Patients who searched were typically younger, healthier, and more educated. Most used a general search engine to ask symptom-related questions. Compared to non-searchers, they were less likely to receive lab tests (RR 0.78, p=0.053), imaging (RR 0.75, p=0.094), medications (RR 0.67, p=0.038), or admission (RR 0.68, p=0.175). They were more likely to leave against medical advice (RR 1.67, p=0.067) and receive opioids (RR 1.56, p=0.151). Findings suggest inaccurate health information may contribute to mismatched expectations and altered care delivery.
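For readers unfamiliar with the relative risk (RR) figures quoted above, the following Python sketch shows how an RR and an approximate 95% confidence interval are computed from a 2x2 table, using the standard log-transform (Wald) interval. The counts are purely illustrative and are not taken from the study; the function name is hypothetical, and the study may have used a different interval or test.

```python
import math


def relative_risk(events_exposed, n_exposed, events_unexposed, n_unexposed, z=1.96):
    """Relative risk with an approximate confidence interval on the log scale."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    # Standard error of log(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed
        + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Illustrative counts only (NOT study data): 20 of 50 "searchers" and
# 30 of 50 "non-searchers" receive some outcome, e.g. a lab test.
rr, lower, upper = relative_risk(20, 50, 30, 50)
print(f"RR = {rr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

An RR below 1 means the outcome was less frequent among searchers than non-searchers. When the interval includes 1, the difference is not statistically significant at the 5% level, which is the situation for most of the p-values reported above.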
Background: Aspiration, the inhalation of foreign material into the lungs, significantly impacts surgical patient morbidity and mortality. This study develops a machine learning (ML) model to predict postoperative aspiration, enabling timely preventative interventions. Methods: From the MIMIC-IV database of over 400,000 hospital admissions, we identified 826 surgical patients (mean age: 62, 55.7% male) who experienced aspiration within seven days post-surgery, along with a matched non-aspiration cohort. Three ML models (XGBoost, Multilayer Perceptron, and Random Forest) were trained using pre-surgical hospitalization data to predict postoperative aspiration. To investigate causation, we estimated Average Treatment Effects (ATE) using Augmented Inverse Probability Weighting. Results: Our ML model achieved an AUROC of 0.86 and 77.3% sensitivity on a held-out test set. Maximum daily opioid dose, length of stay, and patient age emerged as the most important predictors. ATE analysis identified significant causative factors: opioids (0.25 +/- 0.06) and operative site (neck: 0.20 +/- 0.13, head: 0.19 +/- 0.13). Despite equal surgery rates across genders, men were 1.5 times more likely to
In many fields, populations of interest are hidden from data for a variety of reasons, though their magnitude remains important in determining resource allocation and appropriate policy. In public health and epidemiology, linkages or relationships between sources of data may exist due to intake structure of care providers, referrals, or other related health programming. These relationships often admit a tree structure, with the target population represented by the root, and paths from root-to-leaf representing pathways of care after a health event. In the Canadian province of British Columbia (BC), significant efforts have been made in creating an opioid overdose cohort, a tree-like linked data structure which tracks the movement of individuals along pathways of care after an overdose. In this application, the root node represents the target population, the total number of overdose events occurring in BC during the specified time period. We compare and contrast two methods of estimating the target population size - a weighted multiplier method based on back-calculating estimates from a number of paths and combining these estimates via a variance-minimizing weighted mean, and a full
Substance use disorders (SUDs) are a growing concern globally, necessitating enhanced understanding of the problem and its trends through data-driven research. Social media are unique and important sources of information about SUDs, particularly since the data in such sources are often generated by people with lived experiences. In this paper, we introduce Reddit-Impacts, a challenging Named Entity Recognition (NER) dataset curated from subreddits dedicated to discussions on prescription and illicit opioids, as well as medications for opioid use disorder. The dataset specifically concentrates on the lesser-studied, yet critically important, aspects of substance use--its clinical and social impacts. We collected data from chosen subreddits using the publicly available Application Programming Interface for Reddit. We manually annotated text spans representing clinical and social impacts reported by people who also reported personal nonmedical use of substances including but not limited to opioids, stimulants and benzodiazepines. Our objective is to create a resource that can enable the development of systems that can automatically detect clinical and social impacts of substance use f
Background: One of the key FDA-approved medications for Opioid Use Disorder (OUD) is buprenorphine. Despite its popularity, individuals often report various information needs regarding buprenorphine treatment on social media platforms like Reddit. However, the key challenge is to characterize these needs. In this study, we propose a theme-based framework to curate and analyze large-scale data from social media to characterize self-reported treatment information needs (TINs). Methods: We collected 15,253 posts from r/Suboxone, one of the largest Reddit sub-communities for buprenorphine products. Following the standard protocol, we first identified and defined five main themes from the data and then coded 6,000 posts based on these themes, where one post can be labeled with one to three applicable themes. Finally, we determined the most frequently appearing sub-themes (topics) for each theme by analyzing samples from each group. Results: Among the 6,000 posts, 40.3% contained a single theme, 36% two themes, and 13.9% three themes. The most frequent topics for each theme or theme combination came with several key findings - prevalent reporting of psychological and physical effects durin
Background: Electronic health records (EHRs) are a data source for opioid research. Opioid use disorder is known to be under-coded as a diagnosis, yet problematic opioid use can be documented in clinical notes. Objectives: Our goals were 1) to identify problematic opioid use from a full range of clinical notes; and 2) to compare the characteristics of patients identified as having problematic opioid use, exclusively documented in clinical notes, to those having documented ICD opioid use disorder diagnostic codes. Materials and Methods: We developed and applied a natural language processing (NLP) tool to the clinical notes of a patient cohort (n=222,371) from two Veteran Affairs service regions to identify patients with problematic opioid use. We also used a set of ICD diagnostic codes to identify patients with opioid use disorder from the same cohort. We compared the demographic and clinical characteristics of patients identified only through NLP, to those of patients identified through ICD codes. Results: NLP exclusively identified 57,331 patients; 6,997 patients had positive ICD code identifications. Patients exclusively identified through NLP were more likely to be women. Those
The opioid crisis remains one of the most daunting and complex public health problems in the United States. This study investigates the national epidemic by analyzing vulnerability profiles of three key factors: opioid-related mortality rates, opioid prescription dispensing rates, and disability rank ordered rates. This study utilizes county-level data, spanning the years 2014 through 2020, on the rates of opioid-related mortality, opioid prescription dispensing, and disability. To successfully estimate and predict trends in these opioid-related factors, we augment the Kalman Filter with a novel spatial component. To define opioid vulnerability profiles, we create heat maps of our filter's predicted rates across the nation's counties and identify the hotspots. In this context, hotspots are defined on a year-by-year basis as counties with rates in the top 5 percent nationally. Our spatial Kalman filter demonstrates strong predictive performance. From 2014 to 2018, these predictions highlight consistent spatiotemporal patterns across all three factors, with Appalachia distinguished as the nation's most vulnerable region. Starting in 2019, however, the dispensing rate profiles undergo
Opioid Use Disorder (OUD) has emerged as a significant global public health issue, with complex multifaceted conditions. Due to the lack of effective treatment options for various conditions, there is a pressing need for the discovery of new medications. In this study, we propose a deep generative model that combines a stochastic differential equation (SDE)-based diffusion modeling with the latent space of a pretrained autoencoder model. The molecular generator enables efficient generation of molecules that are effective on multiple targets, specifically the mu, kappa, and delta opioid receptors. Furthermore, we assess the ADMET (absorption, distribution, metabolism, excretion, and toxicity) properties of the generated molecules to identify drug-like compounds. To enhance the pharmacokinetic properties of some lead compounds, we employ a molecular optimization approach. We obtain a diverse set of drug-like molecules. We construct binding affinity predictors by integrating molecular fingerprints derived from autoencoder embeddings, transformer embeddings, and topological Laplacians with advanced machine learning algorithms. Further experimental studies are needed to evaluate the pha
The escalating drug addiction crisis in the United States underscores the urgent need for innovative therapeutic strategies. This study embarked on an innovative and rigorous strategy to unearth potential drug repurposing candidates for opioid and cocaine addiction treatment, bridging the gap between transcriptomic data analysis and drug discovery. We initiated our approach by conducting differential gene expression analysis on addiction-related transcriptomic data to identify key genes. We propose a novel topological differentiation to identify key genes from a protein-protein interaction (PPI) network derived from DEGs. This method utilizes persistent Laplacians to accurately single out pivotal nodes within the network, conducting this analysis in a multiscale manner to ensure high reliability. Through rigorous literature validation, pathway analysis, and data-availability scrutiny, we identified three pivotal molecular targets, mTOR, mGluR5, and NMDAR, for drug repurposing from DrugBank. We crafted machine learning models employing two natural language processing (NLP)-based embeddings and a traditional 2D fingerprint, which demonstrated robust predictive ability in gauging bind
Neurotensin (NT) exerts naloxone-insensitive antinociceptive action through its binding to both NTS1 and NTS2 receptors and NT analogs provide stronger pain relief than morphine on a molecular basis. Here, we examined the analgesic/adverse effect profile of a new NT(8-13) derivative denoted JMV2009, in which the Pro10 residue was substituted by a silicon-containing unnatural amino acid silaproline. We first report the synthesis and in vitro characterization (receptor-binding affinity, functional activity and stability) of JMV2009. We next examined its analgesic activity in a battery of acute, tonic and chronic pain models. We finally evaluated its ability to induce adverse effects associated with chronic opioid use, such as constipation and analgesic tolerance or related to NTS1 activation, like hypothermia. In in vitro assays, JMV2009 exhibited high binding affinity for both NTS1 and NTS2, improved proteolytic resistance as well as agonistic activities similar to NT, inducing sustained activation of p42/p44 MAPK and receptor internalization. Intrathecal injection of JMV2009 produced dose-dependent antinociceptive responses in the tail-flick test and almost completely abolished the
Public health organizations face the problem of dispensing treatments (i.e., vaccines, antibiotics, and others) to groups of affected populations through "points-of-dispensing" (PODs) during emergency situations, typically in the presence of complexities like demand stochasticity, heterogeneous utilities (e.g., for vaccine distribution, certain segments of the population may need to be prioritized), and limited storage. We formulate a hierarchical Markov decision process (MDP) model with two levels of decisions (and decision-makers): the upper-level decisions come from an inventory planner that "controls" a lower-level dynamic problem, which optimizes dispensing decisions that take into consideration the heterogeneous utility functions of the random set of PODs. We then derive structural properties of the MDP model and propose an approximate dynamic programming (ADP) algorithm that leverages structure in both the policy and the value space (state-dependent basestocks and concavity, respectively). The algorithm can be considered an actor-critic method; to our knowledge, this paper is the first to jointly exploit policy and value structure within an actor-critic framework. We prove th
The complex unfolding of the US opioid epidemic in the last 20 years has been the subject of a large body of medical and pharmacological research, and it has sparked a multidisciplinary discussion on how to implement interventions and policies to effectively control its impact on public health. This study leverages Reddit as the primary data source to investigate the opioid crisis. We aimed to find a large cohort of Reddit users interested in discussing the use of opioids, trace the temporal evolution of their interest, and extensively characterize patterns of the nonmedical consumption of opioids, with a focus on routes of administration and drug tampering. We used a semiautomatic information retrieval algorithm to identify subreddits discussing nonmedical opioid consumption, finding over 86,000 Reddit users potentially involved in firsthand opioid usage. We developed a methodology based on word embedding to select alternative colloquial and nonmedical terms referring to opioid substances, routes of administration, and drug-tampering methods. We modeled the preferences of adoption of substances and routes of administration, estimating their prevalence and temporal unfolding, obser
Understanding how best to estimate state-level policy effects is important, and several unanswered questions remain, particularly about the ability of statistical models to disentangle the effects of concurrently enacted policies. In practice, many policy evaluation studies do not attempt to control for effects of co-occurring policies, and this issue has not received extensive attention in the methodological literature to date. In this study, we utilized Monte Carlo simulations to assess the impact of co-occurring policies on the performance of commonly-used statistical models in state policy evaluations. Simulation conditions varied effect sizes of the co-occurring policies and length of time between policy enactment dates, among other factors. Outcome data (annual state-specific opioid mortality rate per 100,000) were obtained from 1999-2016 National Vital Statistics System (NVSS) Multiple Cause of Death mortality files, thus yielding longitudinal annual state-level data over 18 years from 50 states. When co-occurring policies are ignored (i.e., omitted from the analytic model), our results demonstrated that high relative bias (>85%) arises, particularly when policies are ena
Computational chemists typically assay drug candidates by virtually screening compounds against crystal structures of a protein despite the fact that some targets, like the $μ$ Opioid Receptor and other members of the GPCR family, traverse many non-crystallographic states. We discover new conformational states of $μ$OR with molecular dynamics simulation and then machine learn ligand-structure relationships to predict opioid ligand function. These artificial intelligence models identified a novel $μ$ opioid chemotype.
Based on the characteristics of drug addiction, this paper constructs an SEIR-based SUC model to describe and predict the spread of drug addiction. The model predicts that the number of drug addiction cases will continue to fluctuate with decreasing amplitude and eventually stabilize. To trace the source of heroin, we identified the most likely origins of drugs in Philadelphia, PA; Cuyahoga and Hamilton, OH; Jefferson, KY; Kanawha, WV; and Bedford, VA. Based on these findings, attention should focus on the spread of Oxycodone, Hydrocodone, Heroin, and Buprenorphine. In other words, drug transmission in the two states of Ohio and Pennsylvania requires particular attention. According to the propagation curve predicted by our model, transmission in KY is still in its early stage, while in VA and WV it is at the midpoint, and in OH and PA it is in its later stages. As a result, the number of drug addiction cases in KY, OH, and VA is projected to increase in three years. For the methodology, using principal component analysis, 22 variables in socio-economic data related to the continued use of opioid drugs were filtered, among which the 'Relationship' part deserves a highlight. Based on them, by using the
Opioid overdose rates have reached an epidemic level and state-level policy innovations have followed suit in an effort to prevent overdose deaths. State-level drug law is a set of policies that may reinforce or undermine each other, and analysts have a limited set of tools for handling the policy collinearity using statistical methods. This paper uses a machine learning method called hierarchical clustering to empirically generate "policy bundles" by grouping together states with similar sets of policies in force at a given time, for analysis in a 50-state, 10-year interrupted time series regression with drug overdose deaths as the dependent variable. Policy clusters were generated from 138 binomial variables observed by state and year from the Prescription Drug Abuse Policy System. Clustering reduced the policies to a set of 10 bundles. The approach allows for ranking of the relative effect of different bundles and is a tool to recommend those most likely to succeed. This study shows that a set of policies balancing Medication Assisted Treatment, Naloxone Access, Good Samaritan Laws, Prescription Drug Monitoring Programs and legalization of medical m