Antimicrobial resistance (AMR) is escalating and outpacing current antibiotic development. Thus, discovering antibiotics effective against emerging pathogens is becoming increasingly critical. However, existing approaches cannot rapidly identify effective molecules against novel pathogens or emerging drug-resistant strains. Here, we introduce ApexOracle, an artificial intelligence (AI) model that both predicts the antibacterial potency of existing compounds and designs de novo molecules active against strains it has never encountered. Departing from models that rely solely on molecular features, ApexOracle incorporates pathogen-specific context through the integration of molecular features captured via a foundational discrete diffusion language model and a dual-embedding framework that combines genomic- and literature-derived strain representations. Across diverse bacterial species and chemical modalities, ApexOracle consistently outperformed state-of-the-art approaches in activity prediction and demonstrated reliable transferability to novel pathogens with little or no antimicrobial data. Its unified representation-generation architecture further enables the in silico creation of
Epidemic spreading over population networks has been an important subject of research for several decades, especially during the Covid-19 pandemic. Most epidemic outbreaks are likely to give rise to multiple mutations as they spread through the population. In this paper, we study the evolution of a pathogen that can mutate continuously during epidemic spreading. We consider pathogens whose mutating parameter is the mortality mean-time, and study the evolution of this parameter over the spreading process. We use analytical methods to derive the dynamic equation of the epidemic and the conditions under which it spreads. We also use numerical simulations to study the pathogen flow in this case and to understand the mutation phenomena. We show that natural selection leads to less violent pathogens becoming predominant in the population. We discuss a wide range of network structures and show how different effects are manifested in each case. We also apply our theory in the context of the Covid-19 pandemic, using relevant epidemiological data collected for this outbreak, and provide explanations for the variant spreading processes observed throughout this pandemic.
This study aims to evaluate the accuracy of authorship attributions in scientific publications, focusing on the fairness and precision of individual contributions within academic works. The study analyzes 81,823 publications from the journal PLOS ONE, covering the period from January 2018 to June 2023. It examines the authorship attributions within these publications to determine the prevalence of inappropriate authorship. It also investigates the demographic and professional profiles of affected authors, exploring trends and potential factors contributing to inaccuracies in authorship. Surprisingly, 9.14% of articles feature at least one author with inappropriate authorship, affecting over 14,000 individuals (2.56% of the sample). Inappropriate authorship is more concentrated in Asia, Africa, and specific European countries such as Italy. Established researchers with significant publication records and those affiliated with companies or nonprofits show higher instances of potential monetary authorship. Our findings are based on contributions as declared by the authors, which implies a degree of trust in their transparency. However, this reliance on self-reporting may introduc
We analyse a model that describes the propagation of many pathogens within and between many species. A branching process approximation is used to compute the probability of disease outbreaks. Special cases of aquatic environments with two host species and one or two pathogens are considered both analytically and computationally.
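As a minimal illustration of the branching process approximation mentioned above, the following sketch treats only the simplest single-type case. This is an assumption made for illustration and not the multi-host, multi-pathogen model of the paper. Each infected individual is assumed to cause a Poisson-distributed number of new infections with mean `r0` (a hypothetical parameter name), so the extinction probability q is the smallest root of q = exp(r0 (q - 1)) and the outbreak probability is 1 - q.

```python
import math


def outbreak_probability(r0: float, tol: float = 1e-12, max_iter: int = 10_000) -> float:
    """Probability that a single initial case starts a major outbreak.

    Assumption (not from the paper): a single-type Galton-Watson process
    in which each case produces Poisson(r0) secondary cases.
    """
    if r0 < 0:
        raise ValueError("r0 must be non-negative")
    # Fixed-point iteration started at 0 converges monotonically to the
    # smallest non-negative root of q = exp(r0 * (q - 1)), which is the
    # extinction probability of the process.
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(r0 * (q - 1.0))
        converged = abs(q_next - q) < tol
        q = q_next
        if converged:
            break
    return 1.0 - q


if __name__ == "__main__":
    for r0 in (0.5, 1.5, 2.0, 4.0):
        print(f"r0 = {r0}: outbreak probability = {outbreak_probability(r0):.4f}")
```

In this simplified setting an outbreak is possible only when `r0` exceeds 1; a multi-type version would replace the scalar fixed-point equation with one equation per host-pathogen type.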
A persistent public health challenge is finding immunization schemes that are effective in combating highly mutable pathogens such as HIV and influenza viruses. To address this, we analyze a simplified model of affinity maturation, the Darwinian evolutionary process B cells undergo during immunization. The vaccination protocol dictates selection forces that steer affinity maturation to generate antibodies. We focus on determining the optimal selection forces exerted by a generic time-dependent vaccination protocol to maximize production of broadly neutralizing antibodies (bnAbs) that can protect against a broad spectrum of pathogen strains. The model lends itself to a path integral representation and operator approximations within a mean-field limit, providing guiding principles for optimizing time-dependent vaccine-induced selection forces to enhance bnAb generation. We compare our analytical mean-field results with the outcomes of stochastic simulations and discuss their similarities and differences.
Contributorship statements have been effective at recording granular author contributions in research articles and have been broadly used to understand how labor is divided across research teams. However, one major limitation of existing empirical studies is that two classification systems have been adopted, notably by their most important data source, the journals published by the Public Library of Science (PLoS). This research aims to address this limitation by developing a mapping scheme between the two systems and using it to understand whether authors assign contributions differently under the two systems. We use all research articles published in PLoS ONE between 2012 and 2020, divided into two five-year publication windows centered on the shift of the classification systems in 2016. Our results show that most tasks (except for writing- and resource-related tasks) are used similarly under the two systems. Moreover, notable differences in how researchers used the two systems are also examined and discussed. This research offers an important foundation for future empirical research on division of labor, by enabling a larger dataset that cross
As sequencing technologies become more affordable and genomic databases expand continuously, the reuse of publicly available sequencing data emerges as a powerful strategy for studying microbial pathogens. Indeed, raw sequencing reads generated for the study of a given organism often contain reads originating from the associated microbiota. This review explores how such off-target reads can be detected and used for the study of microbial pathogens. We present genomic data mining as a method to identify relevant sequencing runs from petabase-scale databases, highlighting recent methodological advances that allow efficient database querying. We then briefly outline methods designed to retrieve relevant data and associated metadata, and provide an overview of common downstream analysis pipelines. We discuss how such approaches have (i) expanded the known genetic diversity of microbial pathogens, (ii) enriched our understanding of their spatiotemporal distribution, and (iii) highlighted previously unrecognized ecological interactions involving microbial pathogens. However, these analyses often rely on the completeness and accuracy of accompanying metadata, which remain highly variable.
Host-pathogen interactions consist of an attack by the pathogen, frequently a defense by the host and possibly a counter-defense by the pathogen. Here, we present a game-theoretical approach to describing such interactions. We consider a game where the host and pathogen are players and they can choose between the strategies of defense (or counter-defense) and no response. Specifically, they may or may not produce a toxin and an enzyme degrading the toxin, respectively. We consider that the host and pathogen must also incur a cost for toxin or enzyme production. We highlight both the sequential and non-sequential versions of the game and determine the Nash equilibria. Further, we resolve a paradox occurring in that interplay. If the inactivating enzyme is very efficient, producing the toxin becomes useless, leading to the enzyme being no longer required. Then, production of the defense becomes useful again. In game theory, such situations can be described by a generalized matching pennies game. As a novel result, we find under which conditions the defense cycle leads to a steady state or to an oscillation. We obtain, for saturating dose-response kinetics and considering monotonic co
This study proposes a quantitative framework to enhance curriculum coherence through the systematic alignment of Course Learning Outcomes (CLOs) and Program Learning Outcomes (PLOs), contributing to continuous improvement in outcome-based education. Grounded in accreditation standards such as ABET and NCAAA, the model introduces mathematical tools that map exercises, assessment questions, teaching units (TUs), and student assessment components (SACs) to CLOs and PLOs. This dual-layer approach, which combines micro-level analysis of assessment elements with macro-level curriculum evaluation, enables detailed tracking of learning outcomes and helps identify misalignments between instructional delivery, assessment strategies, and program objectives. The framework incorporates alignment matrices, weighted relationships, and practical indicators to quantify coherence and evaluate course or program performance. Application of this model reveals gaps in outcome coverage and underscores the importance of realignment, especially when specific PLOs are underrepresented or CLOs are not adequately supported by assessments. The proposed model is practical, adaptable, and scalable, making it suitable f
Cooperation and competition between pathogens can alter the number of individuals affected by a co-infection. Nonetheless, the evolution of the pathogens' behavior has been overlooked. Here, we consider a co-evolutionary model in which the simultaneous spreading is described by a two-pathogen susceptible-infected-recovered model, with the pathogens interacting in either a synergistic or a competitive manner. At the end of each epidemic season, the pathogen species reproduce according to their fitness, which, in turn, depends on the payoff accumulated during the spreading season in a hawk-and-dove game. This co-evolutionary model displays a rich set of features. Specifically, the evolution of the pathogens' strategy induces abrupt transitions in the epidemic prevalence. Furthermore, we observe that the long-term dynamics result in a single surviving pathogen species, and that cooperative behavior of pathogens can emerge even under unfavorable conditions.
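To make the hawk-and-dove game concrete, the sketch below uses the textbook payoff matrix with a resource value `v` and a conflict cost `c`. These names and the specific payoffs are assumptions for illustration; the abstract does not state the payoff structure used in the paper. The code computes the expected payoff of each strategy against a population with a given hawk fraction, and the mixed equilibrium at which both strategies earn the same payoff.

```python
def hawk_dove_payoffs(v: float, c: float) -> dict:
    """Textbook hawk-dove payoffs to the row player (assumed, not from the paper)."""
    return {
        ("H", "H"): (v - c) / 2.0,  # two hawks fight: share the value, pay the cost
        ("H", "D"): v,              # hawk takes everything from a dove
        ("D", "H"): 0.0,            # dove retreats from a hawk
        ("D", "D"): v / 2.0,        # two doves share the value
    }


def expected_payoffs(p_hawk: float, v: float, c: float) -> tuple:
    """Expected payoff of a hawk and of a dove against a population
    in which a fraction p_hawk plays hawk."""
    pay = hawk_dove_payoffs(v, c)
    hawk = p_hawk * pay[("H", "H")] + (1.0 - p_hawk) * pay[("H", "D")]
    dove = p_hawk * pay[("D", "H")] + (1.0 - p_hawk) * pay[("D", "D")]
    return hawk, dove


def mixed_equilibrium(v: float, c: float) -> float:
    """Hawk fraction at which both strategies earn the same payoff.

    For c > v this is v / c; otherwise hawk dominates and the fraction is 1.
    """
    if v <= 0 or c <= 0:
        raise ValueError("v and c must be positive")
    return min(1.0, v / c)


if __name__ == "__main__":
    v, c = 2.0, 4.0
    p_star = mixed_equilibrium(v, c)
    print("equilibrium hawk fraction:", p_star)
    print("payoffs (hawk, dove) at equilibrium:", expected_payoffs(p_star, v, c))
```

In a co-evolutionary setting like the one described, payoffs of this kind would be accumulated over a spreading season and then used as fitness when the pathogen species reproduce; how exactly that coupling is defined is left to the paper.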
During the recent pandemic, a rise in COVID-19 cases was followed by a decline in influenza. In the absence of cross-immunity, a potential explanation for the observed pattern is behavioral: non-pharmaceutical interventions (NPIs) designed and promoted for one disease also reduce the spread of others. We study short-term and long-term dynamics of two pathogens where NPIs targeting one pathogen indirectly influence the spread of another, a phenomenon we term behavioral spillover. We examine how the perceived risk of and response to one disease substantially alter the spread of other pathogens, revealing how waves of different pathogens emerge over time as a result of behavioral interdependencies and human response. Our analysis identifies the parameter space where two diseases co-exist, and where shifts in prevalence occur. Our findings are consistent with observations from the COVID-19 pandemic, where NPIs contributed to significant declines in infections such as influenza, pneumonia, and Lyme disease.
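The idea of behavioral spillover can be sketched with two independent SIR epidemics whose transmission rates are both scaled down by a response to the prevalence of the first pathogen. Everything specific here is an assumption for illustration: the response function 1 / (1 + k * I1), the parameter values, and the simple Euler integration are not taken from the paper.

```python
def simulate(k: float, days: float = 300.0, dt: float = 0.1,
             beta1: float = 0.5, gamma1: float = 0.2,
             beta2: float = 0.4, gamma2: float = 0.25,
             i0: float = 1e-4) -> tuple:
    """Euler simulation of two SIR epidemics without cross-immunity.

    k is a hypothetical spillover strength: the NPI response to the
    prevalence of pathogen 1 scales the transmission of both pathogens
    by 1 / (1 + k * I1). Returns the prevalence time series (I1, I2).
    """
    s1, i1 = 1.0 - i0, i0
    s2, i2 = 1.0 - i0, i0
    hist1, hist2 = [i1], [i2]
    for _ in range(round(days / dt)):
        npi = 1.0 / (1.0 + k * i1)      # behavioral response driven by pathogen 1
        new1 = npi * beta1 * s1 * i1    # incidence of pathogen 1
        new2 = npi * beta2 * s2 * i2    # incidence of pathogen 2 (spillover)
        s1 -= dt * new1
        i1 += dt * (new1 - gamma1 * i1)
        s2 -= dt * new2
        i2 += dt * (new2 - gamma2 * i2)
        hist1.append(i1)
        hist2.append(i2)
    return hist1, hist2


if __name__ == "__main__":
    _, no_spillover = simulate(k=0.0)
    _, with_spillover = simulate(k=20.0)
    print("peak prevalence of pathogen 2 without response:", max(no_spillover))
    print("peak prevalence of pathogen 2 with response:   ", max(with_spillover))
```

Comparing the two runs shows the qualitative effect described in the abstract: a response aimed at one pathogen lowers the peak of the other even though the two do not interact immunologically.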
To optimize strategies for curbing the transmission of airborne pathogens, the efficacy of three key controls (face masks, ventilation, and physical distancing) must be well understood. In this study we used the Quadrature-based model of Respiratory Aerosol and Droplets to quantify the reduction in exposure to airborne pathogens from various combinations of controls. For each combination of controls, we simulated thousands of scenarios that represent the tremendous variability in factors governing airborne transmission and the efficacy of mitigation strategies. While the efficacy of any individual control was highly variable among scenarios, combining universal mask-wearing with distancing of 1 m or more reduced the median exposure by more than 99% relative to a close, unmasked conversation, with further reductions if ventilation is also enhanced. The large reductions in exposure to airborne pathogens translated to large reductions in the risk of initial infection in a new host. These findings suggest that layering controls is highly effective for reducing transmission of airborne pathogens and will be critical for curbing outbreaks of novel viruses in the future.
Despite being similar in structure, functioning, and size, viral pathogens enjoy very different, mostly well-defined ways of life. They occupy their hosts for a few days (influenza), for a few weeks (measles), or even lifelong (HCV), which manifests in acute or chronic infections. The various transmission routes (airborne, via direct contact, etc.), degrees of infectiousness (referring to the load required for transmission), antigenic variation/immune escape, and virulence define further pathogenic lifestyles. To survive, pathogens must infect new hosts; their success determines their fitness. Infection happens with a certain likelihood during contact between hosts, where contact can also be mediated by vectors. Besides structural aspects of the host-contact network, three parameters/concepts appear to be key: the contact rate and the infectiousness during contact, which encode the mode of transmission, and, third, the immunity of susceptible hosts. From here, what can be concluded about the evolutionary strategies of viral pathogens? This is the biological question addressed in this paper. The answer extends earlier results (Lange & Ferguson 2009, PLoS Comput Biol 5 (10): e1000536) and mak
Pathogen identification is pivotal in diagnosing, treating, and preventing diseases, and is crucial for controlling infections and safeguarding public health. Traditional alignment-based methods, though widely used, are computationally intensive and reliant on extensive reference databases, often failing to detect novel pathogens due to their low sensitivity and specificity. Similarly, conventional machine learning techniques, while promising, require large annotated datasets and extensive feature engineering and are prone to overfitting. Addressing these challenges, we introduce PathoLM, a cutting-edge pathogen language model optimized for the identification of pathogenicity in bacterial and viral sequences. Leveraging the strengths of pre-trained DNA models such as the Nucleotide Transformer, PathoLM requires minimal data for fine-tuning, thereby enhancing pathogen detection capabilities. It effectively captures a broader genomic context, significantly improving the identification of novel and divergent pathogens. We developed a comprehensive data set comprising approximately 30 species of viruses and bacteria, including ESKAPEE pathogens, seven notably virulent bacterial strains resistan
The goal of this study is to develop a computational model of the progression of changes in mitochondrial phenotype resulting from infection with pathogenic mycobacteria. This ultimately will enable a large-scale virulence screen of mutant bacterial libraries. Mycobacterium tuberculosis (Mtb) is an intracellular pathogen, but only a small number of its genes have been studied for roles in intracellular host cell survival and replication. Mitochondria are the powerhouse of the host cell and play critical roles in cell survival when attacked by certain pathogens. When Mtb bacteria invade host cells, they induce changes in mitochondrial morphology, making mitochondria a novel target for image processing and machine learning to determine virulence associations of genes in Mtb and potentially other related intracellular pathogens. By hypothesizing mitochondria as an instance of a dynamic and interconnected graph, we demonstrate a statistical approach for quantitatively recognizing novel mitochondrial phenotypes induced by invading pathogens.
DNA, encoding genetic instructions for almost all living organisms, fuels groundbreaking advances in genomics and synthetic biology. Recently, DNA Foundation Models have achieved success in designing synthetic functional DNA sequences, even whole genomes, but their susceptibility to jailbreaking remains underexplored, raising concerns about the generation of harmful sequences such as pathogens or toxin-producing genes. In this paper, we introduce GeneBreaker, the first framework to systematically evaluate jailbreak vulnerabilities of DNA foundation models. GeneBreaker employs (1) an LLM agent with customized bioinformatic tools to design high-homology, non-pathogenic jailbreaking prompts, (2) beam search guided by PathoLM and log-probability heuristics to steer generation toward pathogen-like sequences, and (3) a BLAST-based evaluation pipeline against a curated Human Pathogen Database (JailbreakDNABench) to detect successful jailbreaks. Evaluated on our JailbreakDNABench, GeneBreaker consistently jailbreaks the latest Evo series models across 6 viral categories (up to 60% Attack Success Rate for Evo2-40B). Further case studies on SARS-CoV-2 spike protein and HIV-1
This study aims to screen the antibacterial activity of lactic acid bacteria (LAB) isolated from home-made fermented vegetables against common foodborne pathogens. The antagonistic properties of these isolates against Escherichia coli, Staphylococcus aureus, Yersinia enterocolitica and Bacillus cereus were examined using the agar well diffusion method. Four LAB, namely MF6, MF10, MF13, and MF15, identified as Lactobacillus animalis, Lactobacillus rhamnosus, Lactobacillus fermentum and Lactobacillus reuteri, respectively, were effective against all selected pathogenic strains. Amongst the four isolates, MF6 exhibited the highest antibacterial activity against all the indicator pathogens tested except Y. enterocolitica. Its activity was maximum against E. coli, with a Zone of Inhibition (ZOI) ranging from 18.7 to 21.3 mm, and least for Y. enterocolitica (10 ± 1.1 mm). Isolate MF13 also showed antimicrobial activity against all tested pathogens, with the highest activity against Y. enterocolitica (14 ± 1.7 mm) and the least against E. coli (8 ± 1.4 mm), in direct contrast to isolate MF6. Isolate MF15 showed greater activity against E. coli (12 ± 0.8 mm) and least against S. aureus (8
Human pathogens transmitted through environmental pathways are subject to stress and pressures outside of the host. These pressures may cause pathogen pathovars to diverge in their environmental persistence and their infectivity on an evolutionary time-scale. On a shorter time-scale, a single-genotype pathogen population may display wide variation in persistence times and exhibit biphasic decay. Using an infectious disease transmission modeling framework, we demonstrate in both cases that fitness-preserving trade-offs have implications for the dynamics of associated epidemics: less infectious, more persistent pathogens cause epidemics to progress more slowly than more infectious, less persistent (labile) pathogens, even when the overall risk is the same. Using identifiability analysis, we show that the usual disease surveillance data does not sufficiently inform these underlying pathogen population dynamics, even with basic environmental monitoring. These results suggest directions for future microbial research and environmental monitoring. In particular, determining the relative infectivity of persistent pathogen subpopulations and the rates of phenotypic conversion will help asce
PLOS and Mozilla conducted a month-long pilot study in which professional developers performed code reviews on software associated with papers published in PLOS Computational Biology. While the developers felt the reviews were limited by (a) lack of familiarity with the domain and (b) lack of two-way contact with authors, the scientists appreciated the reviews, and both sides were enthusiastic about repeating the experiment.
We analyzed the longitudinal activity of nearly 7,000 editors at the mega-journal PLOS ONE over the 10-year period 2006-2015. Using the article-editor associations, we develop editor-specific measures of power, activity, article acceptance time, citation impact, and editorial remuneration (an analogue to self-citation). We observe remarkably high levels of power inequality among the PLOS ONE editors, with the top-10 editors responsible for 3,366 articles, corresponding to 2.4% of the 141,986 articles we analyzed. Such high inequality levels suggest the presence of unintended incentives, which may reinforce unethical behavior in the form of decision-level biases at the editorial level. Our results indicate that editors may become apathetic in judging the quality of articles and susceptible to modes of power-driven misconduct. We used the longitudinal dimension of editor activity to develop two panel regression models which test and verify the presence of editor-level bias. In the first model we analyzed the citation impact of articles, and in the second model we modeled the decision time between an article being submitted and ultimately accepted by the editor. We focused on two va