Investigating the genomic background of CRISPR-Cas genomes for CRISPR-based antimicrobials
arXiv2022-02-15
CRISPR-Cas systems are an adaptive immune system that protects prokaryotes against foreign genetic elements. Genetic templates acquired during past infection events enable DNA-interacting enzymes to recognize foreign DNA for destruction. Due to the programmability and specificity of these genetic templates, CRISPR-Cas systems are potential alternative antibiotics that can be engineered to self-target antimicrobial resistance genes on the chromosome or plasmid. However, several fundamental questions remain before these tools can be repurposed against drug-resistant bacteria. For endogenous CRISPR-Cas self-targeting, antimicrobial resistance genes and functional CRISPR-Cas systems have to co-occur in the target cell. Furthermore, these tools have to outplay DNA repair pathways that respond to the nuclease activities of Cas proteins, even for exogenous CRISPR-Cas delivery. Here, we conduct a comprehensive survey of CRISPR-Cas genomes. First, we address the co-occurrence of CRISPR-Cas systems and antimicrobial resistance genes in the CRISPR-Cas genomes. We show that the average number of these genes varies greatly by the CRISPR-Cas type, and some CRISPR-Cas types (IE and IIIA) have over 20 genes per ge
Heterogeneous diversity of spacers within CRISPR
arXiv2010-08-16
Clustered regularly interspaced short palindromic repeats (CRISPR) in bacterial and archaeal DNA have recently been shown to be a new type of anti-viral immune system in these organisms. Here we study the diversity of spacers in CRISPR under selective pressure. We propose a population dynamics model that explains the biological observation that the leader-proximal end of CRISPR is more diversified and the leader-distal end of CRISPR is more conserved. This result is shown to be in agreement with recent experiments. Our results show that the CRISPR spacer structure is influenced by and provides a record of the viral challenges that bacteria face.
DeepFM-Crispr: Prediction of CRISPR On-Target Effects via Deep Learning
arXiv2024-09-09
Since the advent of CRISPR-Cas9, a groundbreaking gene-editing technology that enables precise genomic modifications via a short RNA guide sequence, there has been a marked increase in the accessibility and application of this technology across various fields. The success of CRISPR-Cas9 has spurred further investment and led to the discovery of additional CRISPR systems, including CRISPR-Cas13. Distinct from Cas9, which targets DNA, Cas13 targets RNA, offering unique advantages for gene modulation. We focus on Cas13d, a variant known for its collateral activity where it non-specifically cleaves adjacent RNA molecules upon activation, a feature critical to its function. We introduce DeepFM-Crispr, a novel deep learning model developed to predict the on-target efficiency and evaluate the off-target effects of Cas13d. This model harnesses a large language model to generate comprehensive representations rich in evolutionary and structural data, thereby enhancing predictions of RNA secondary structures and overall sgRNA efficacy. A transformer-based architecture processes these inputs to produce a predictive efficacy score. Comparative experiments show that DeepFM-Crispr not only surpas
Funding CRISPR: Understanding the role of government and philanthropic institutions in supporting academic research within the CRISPR innovation system
arXiv2020-09-24
CRISPR/Cas has the potential to revolutionize medicine, agriculture, and biology. Understanding the trajectory of CRISPR research, how it is influenced and who pays for it, is an essential research policy question. We use a combination of methods to map, via quantitative content analysis of CRISPR papers, the research funding profile of major government agencies and philanthropic organizations, and the networks involved in supporting key stages of high-influence research, namely basic biological research and technological development. The results of the content analysis show how the research supported by the main US government agencies focuses both on the study of CRISPR as a biological phenomenon and on its technological development and use as a biomedical research tool. US philanthropic organizations, with the exception of HHMI, tend, by contrast, to specialize in funding CRISPR as a genome editing technology. We present a model of co-funding networks at the two most prominent institutions for CRISPR/Cas research, the University of California and the Harvard/MIT/Broad Institute, to illuminate how philanthropic organizations have articulated with government agencies to co-finance the
Proofreading mechanism of Class 2 CRISPR-Cas systems
arXiv2020-04-20
CRISPR systems experience off-target effects that interfere with the ability to accurately perform genetic edits. While empirical models predict off-target effects in specific platforms, there is a gap for a wide-ranging mechanistic model of CRISPR systems based on evidence from the essential processes of target recognition. In this work, we present a first-principles model supported by many experiments performed in vivo and in vitro. First, we observe and model the conformational changes and discrete stepped DNA unwinding events of the SpCas9-CRISPR target recognition process in a cell-free transcription-translation system using truncated guide RNAs, confirming structural and FRET microscopy experiments. Second, we implement an energy landscape to describe single mismatch effects observed in vivo for SpCas9 and Cas12a CRISPR systems. From the mismatch analysis results we uncover that CRISPR class 2 systems maintain their kinetic proofreading mechanism by utilizing intermittent energetic barriers and we show how to leverage the landscape to improve target specificity.
Mathematical Modeling of CRISPR-CAS system effects on biofilm formation
arXiv2016-03-16
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR), linked with CRISPR associated (CAS) genes, play a profound role in the interactions between phage and their bacterial hosts. It is now well understood that CRISPR-CAS systems can confer adaptive immunity against bacteriophage infections. However, the possibility of failure of CRISPR immunity may lead to a productive infection by the phage (cell lysis) or lysogeny. Recently, CRISPR-CAS genes have been implicated in changes to group behaviour, including biofilm formation, of the bacterium Pseudomonas aeruginosa when lysogenized. For lysogens with a CRISPR system, another recent experimental study suggests that bacteriophage re-infection of previously lysogenized bacteria may lead to cell death. Thus CRISPR immunity can have complex effects on phage-host-lysogen interactions, particularly in a biofilm. In this contribution, we develop and analyse a series of models to elucidate and disentangle these interactions. From a therapeutic standpoint, CRISPR immunity increases biofilm resistance to phage therapy. Our models predict that lysogens may be able to displace CRISPR-immune bacteria in a biofilm, and thus suggest str
Exponential family measurement error models for single-cell CRISPR screens
arXiv2022-01-06
CRISPR genome engineering and single-cell RNA sequencing have accelerated biological discovery. Single-cell CRISPR screens unite these two technologies, linking genetic perturbations in individual cells to changes in gene expression and illuminating regulatory networks underlying diseases. Despite their promise, single-cell CRISPR screens present substantial statistical challenges. We demonstrate through theoretical and real data analyses that a standard method for estimation and inference in single-cell CRISPR screens -- "thresholded regression" -- exhibits attenuation bias and a bias-variance tradeoff as a function of an intrinsic, challenging-to-select tuning parameter. To overcome these difficulties, we introduce GLM-EIV ("GLM-based errors-in-variables"), a new method for single-cell CRISPR screen analysis. GLM-EIV extends the classical errors-in-variables model to responses and noisy predictors that are exponential family-distributed and potentially impacted by the same set of confounding variables. We develop a computational infrastructure to deploy GLM-EIV across hundreds of processors on clouds (e.g., Microsoft Azure) and high-performance clusters. Leveraging this infrastru
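The attenuation bias attributed to thresholded regression in the abstract above can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions that are not taken from the paper: a Gaussian gRNA signal instead of exponential-family counts, no confounders, and a plain difference in means as the estimator. All names and parameter values (`simulate`, `beta`, `threshold`, and so on) are hypothetical, and GLM-EIV itself is not implemented here.

```python
import random
from statistics import mean


def simulate(n=20000, beta=1.0, prob_perturbed=0.3, seed=0):
    """Toy data: true perturbation, noisy gRNA signal, gene expression."""
    rng = random.Random(seed)
    perturbed, grna_signal, expression = [], [], []
    for _ in range(n):
        p = 1 if rng.random() < prob_perturbed else 0
        perturbed.append(p)
        # Noisy proxy for the perturbation (assumed Gaussian for simplicity).
        grna_signal.append(2.0 * p + rng.gauss(0.0, 1.0))
        # Expression shifts by beta when the cell is truly perturbed.
        expression.append(beta * p + rng.gauss(0.0, 1.0))
    return perturbed, grna_signal, expression


def diff_in_means(labels, values):
    """Effect estimate: mean of group 1 minus mean of group 0."""
    treated = [v for lab, v in zip(labels, values) if lab == 1]
    control = [v for lab, v in zip(labels, values) if lab == 0]
    return mean(treated) - mean(control)


threshold = 1.0  # hypothetical tuning parameter
perturbed, grna_signal, expression = simulate()

# "Thresholded" approach: impute the perturbation from the noisy signal.
imputed = [1 if g >= threshold else 0 for g in grna_signal]

oracle_estimate = diff_in_means(perturbed, expression)
thresholded_estimate = diff_in_means(imputed, expression)

print(f"oracle estimate:      {oracle_estimate:.3f}")
print(f"thresholded estimate: {thresholded_estimate:.3f}")
```

Because some cells are misclassified by the threshold, the two imputed groups are mixtures of perturbed and unperturbed cells, so the thresholded estimate is pulled toward zero relative to the true effect. Changing `threshold` changes how the misclassification is split between the groups, which is the tuning difficulty the abstract mentions.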
How the other half lives: CRISPR-Cas's influence on bacteriophages
arXiv2017-11-24
CRISPR-Cas is a genetic adaptive immune system unique to prokaryotic cells used to combat phage and plasmid threats. The host cell adapts by incorporating DNA sequences from invading phages or plasmids into its CRISPR locus as spacers. These spacers are expressed as mobile surveillance RNAs that direct CRISPR-associated (Cas) proteins to protect against subsequent attack by the same phages or plasmids. The threat from mobile genetic elements inevitably shapes the CRISPR loci of archaea and bacteria, and simultaneously the CRISPR-Cas immune system drives evolution of these invaders. Here we highlight our recent work, as well as that of others, that seeks to understand phage mechanisms of CRISPR-Cas evasion and conditions for population coexistence of phages with CRISPR-protected prokaryotes.
CRISPR: Ensemble Model
arXiv2024-03-05
Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) is a gene editing technology that has revolutionized the fields of biology and medicine. However, one of the challenges of using CRISPR is predicting the on-target efficacy and off-target sensitivity of single-guide RNAs (sgRNAs). This is because most existing methods are trained on separate datasets with different genes and cells, which limits their generalizability. In this paper, we propose a novel ensemble learning method for sgRNA design that is accurate and generalizable. Our method combines the predictions of multiple machine learning models to produce a single, more robust prediction. This approach allows us to learn from a wider range of data, which improves the generalizability of our model. We evaluated our method on a benchmark dataset of sgRNA designs and found that it outperformed existing methods in terms of both accuracy and generalizability. Our results suggest that our method can be used to design sgRNAs with high sensitivity and specificity, even for new genes or cells. This could have important implications for the clinical use of CRISPR, as it would allow researchers to design more effective and
Optimal number of spacers in CRISPR arrays
arXiv2017-05-30
We estimate the number of spacers in a CRISPR array of a bacterium that maximizes its protection against a viral attack. The optimality follows from a competition between two trends: too few distinct spacers make the bacteria vulnerable to an attack by a virus with mutated corresponding protospacers, while an excessive variety of spacers dilutes the number of the CRISPR complexes armed with the most recent and thus most effective spacers. We first evaluate the optimal number of spacers in a simple scenario of an infection by a single viral species and later consider a more general case of multiple viral species. We find that depending on such parameters as the concentration of CRISPR-CAS interference complexes and their preference to arm with more recently acquired spacers, the rate of viral mutation, and the number of viral species, the predicted optimal array length lies within a range quite reasonable from the viewpoint of recent experiments.
The independent loss model with ordered insertions for the evolution of CRISPR spacers
arXiv2017-03-01
Today, the CRISPR (clustered regularly interspaced short palindromic repeats) region within bacterial and archaeal genomes is known to encode an adaptive immune system. We rely on previous results on the evolution of the CRISPR arrays, which led to the ordered independent loss model, introduced by Kupczok and Bollback (2013). When focusing on the spacers (between the repeats), new elements enter a CRISPR array at rate $θ$ at the leader end of the array, while all spacers present are lost at rate $ρ$ along the phylogeny relating the sample. Within this model, we compute the distribution of distances of spacers which are present in all arrays in samples of size $n=2$ and $n=3$. We use these results to estimate the loss rate $ρ$ from spacer array data.
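The insertion and loss dynamics described in the abstract above can be sketched as a small event-driven simulation. This is a minimal illustration for a single lineage only, not the paper's method: it ignores the phylogeny relating a sample and does not compute the spacer distance distributions. The function name `simulate_array` and the parameter values are hypothetical; `theta` and `rho` stand for the rates $θ$ and $ρ$.

```python
import random


def simulate_array(theta, rho, t_end, seed=0):
    """Simulate one CRISPR array under ordered insertion, independent loss.

    New spacers enter at the leader end (index 0) at rate theta; every
    spacer currently present is lost independently at rate rho.
    Returns the final array and the time-averaged array length.
    """
    rng = random.Random(seed)
    array = []          # index 0 is the leader end (newest spacer)
    next_id = 0         # spacer ids increase with insertion time
    t = 0.0
    length_time = 0.0   # integral of array length over time

    while True:
        total_rate = theta + rho * len(array)
        dt = rng.expovariate(total_rate)
        if t + dt >= t_end:
            length_time += len(array) * (t_end - t)
            break
        length_time += len(array) * dt
        t += dt
        if rng.random() < theta / total_rate:
            array.insert(0, next_id)                 # insertion at leader end
            next_id += 1
        else:
            del array[rng.randrange(len(array))]     # loss of a random spacer

    return array, length_time / t_end


array, mean_length = simulate_array(theta=5.0, rho=1.0, t_end=2000.0)
print("final array (leader end first):", array)
print("time-averaged length:", round(mean_length, 2))
```

Since losses never reorder the array, the surviving spacers stay sorted by insertion time, with the newest at the leader end. For a single lineage the array length behaves like a birth-death process whose stationary distribution is Poisson with mean $θ/ρ$, so the time-averaged length in this run should be close to 5.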
CRISPR-GPT for Agentic Automation of Gene-editing Experiments
arXiv2024-04-27
The introduction of genome engineering technology has transformed biomedical research, making it possible to make precise changes to genetic information. However, creating an efficient gene-editing system requires a deep understanding of CRISPR technology, and the complex experimental systems under investigation. While Large Language Models (LLMs) have shown promise in various tasks, they often lack specific knowledge and struggle to accurately solve biological design problems. In this work, we introduce CRISPR-GPT, an LLM agent augmented with domain knowledge and external tools to automate and enhance the design process of CRISPR-based gene-editing experiments. CRISPR-GPT leverages the reasoning ability of LLMs to facilitate the process of selecting CRISPR systems, designing guide RNAs, recommending cellular delivery methods, drafting protocols, and designing validation experiments to confirm editing outcomes. We showcase the potential of CRISPR-GPT for assisting non-expert researchers with gene-editing experiments from scratch and validate the agent's effectiveness in a real-world use case. Furthermore, we explore the ethical and regulatory considerations associated with automate
Nonclassical phase diagram for virus bacterial co-evolution mediated by CRISPR
arXiv2017-08-16
CRISPR is a newly discovered prokaryotic immune system. Bacteria and archaea with this system incorporate genetic material from invading viruses into their genomes, providing protection against future infection by similar viruses. The conditions for coexistence of prokaryotes and viruses are an interesting problem in evolutionary biology. In this work, we show an intriguing phase diagram of the virus extinction probability, which is more complex than that of the classical predator-prey model. As the CRISPR incorporates genetic material, viruses are under pressure to evolve to escape the recognition by CRISPR. When bacteria have a small rate of deleting spacers, a new parameter region in which bacteria and viruses can coexist arises, and it leads to a more complex coexistence pattern for bacteria and viruses. For example, when the virus mutation rate is low, the virus extinction probability changes non-monotonically with the bacterial exposure rate. The virus and bacteria co-evolution not only alters the virus extinction probability, but also changes the bacterial population structure. Additionally, we show that recombination is a successful strategy for viruses to escape from CRISPR re
Guide-Guard: Off-Target Predicting in CRISPR Applications
arXiv2026-02-18
With the introduction of cyber-physical genome sequencing and editing technologies, such as CRISPR, researchers can more easily access tools to investigate and create remedies for a variety of topics in genetics and health science (e.g. agriculture and medicine). As the field advances and grows, new concerns present themselves in the ability to predict the off-target behavior. In this work, we explore the underlying biological and chemical model from a data-driven perspective. Additionally, we present a machine-learning-based solution named Guide-Guard to predict the behavior of the system given a gRNA in the CRISPR gene-editing process with 84% accuracy. This solution can be trained on multiple different genes at the same time while retaining accuracy.
pgMAP: a pipeline to enable guide RNA read mapping from dual-targeting CRISPR screens
arXiv2023-06-01
We developed pgMAP, an analysis pipeline to map gRNA sequencing reads from dual-targeting CRISPR screens. pgMAP output includes a dual gRNA read counts table and quality control metrics including the proportion of correctly-paired reads and CRISPR library sequencing coverage across all time points and samples. pgMAP is implemented using Snakemake and is available open-source under the MIT license at https://github.com/fredhutch/pgmap_pipeline.
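The two outputs named in the abstract above, a dual gRNA read counts table and the proportion of correctly paired reads, can be illustrated with a toy tally. This is a sketch under stated assumptions, not pgMAP's implementation: it assumes each read pair has already been assigned to gRNA identifiers (the alignment step is omitted), and the function name `count_paired_guides` and all gRNA identifiers are hypothetical.

```python
def count_paired_guides(read_pairs, library_pairs):
    """Tally reads per library gRNA pair and the correctly-paired fraction.

    read_pairs:    list of (gRNA_1, gRNA_2) identifiers, one tuple per read pair
    library_pairs: list of (gRNA_1, gRNA_2) pairs designed into the library
    """
    library = set(library_pairs)
    counts = {pair: 0 for pair in library_pairs}  # keep zero-count pairs visible
    n_correct = 0
    for pair in read_pairs:
        if pair in library:
            counts[pair] += 1
            n_correct += 1
    fraction_correct = n_correct / len(read_pairs) if read_pairs else 0.0
    return counts, fraction_correct


# Hypothetical two-construct library and four sequenced read pairs.
library_pairs = [("gA1", "gB1"), ("gA2", "gB2")]
read_pairs = [
    ("gA1", "gB1"),
    ("gA1", "gB1"),
    ("gA2", "gB2"),
    ("gA1", "gB2"),  # recombined pair that is not in the library
]

counts, fraction_correct = count_paired_guides(read_pairs, library_pairs)
print(counts)            # {('gA1', 'gB1'): 2, ('gA2', 'gB2'): 1}
print(fraction_correct)  # 0.75
```

Keeping library pairs with zero reads in the table makes dropouts visible, which matters when coverage is compared across time points and samples.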
Physical Model of the Immune Response of Bacteria Against Bacteriophage Through the Adaptive CRISPR-Cas Immune System
arXiv2014-01-20
Bacteria and archaea have evolved an adaptive, heritable immune system that recognizes and protects against viruses or plasmids. This system, known as the CRISPR-Cas system, allows the host to recognize and incorporate short foreign DNA or RNA sequences, called 'spacers', into its CRISPR system. Spacers in the CRISPR system provide a record of the history of bacteria and phage coevolution. We use a physical model to study the dynamics of this coevolution as it evolves stochastically over time. We focus on the impact of mutation and recombination on bacteria and phage evolution and evasion. We discuss the effect of different spacer deletion mechanisms on the coevolutionary dynamics. We make predictions about bacteria and phage population growth, spacer diversity within the CRISPR locus, and spacer protection against the phage population.
The physicist's guide to one of biotechnology's hottest new topics: CRISPR-Cas
arXiv2017-12-28
Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated proteins (Cas) constitute a multi-functional, constantly evolving immune system in bacteria and archaea cells. A heritable, molecular memory is generated of phage, plasmids, or other mobile genetic elements that attempt to attack the cell. This memory is used to recognize and interfere with subsequent invasions from the same genetic elements. This versatile prokaryotic tool has also been used to advance applications in biotechnology. Here we review a large body of CRISPR-Cas research to explore themes of evolution and selection, population dynamics, horizontal gene transfer, specific and cross-reactive interactions, cost and regulation, non-immunological CRISPR functions that boost host cell robustness, as well as applicable mechanisms for efficient and specific genetic engineering. We offer future directions that can be addressed by the physics community. Physical understanding of the CRISPR-Cas system will advance uses in biotechnology, such as developing cell lines and animal models, cell labeling and information storage, combatting antibiotic resistance, and human therapeutics.
A small PAM optimises target recognition in the CRISPR-Cas immune system
arXiv2021-01-05
CRISPR-Cas is an adaptive immune mechanism that has been harnessed for a variety of genetic engineering applications: the Cas9 protein recognises a 2-5nt DNA motif, known as the PAM, and a programmable crRNA binds a target DNA sequence that is then cleaved. While off-target activity is undesirable, it occurs because cross-reactivity was beneficial in the immune system on which the machinery is based. Here, a stochastic model of the target recognition reaction was derived to study the specificity of the innate immune mechanism in bacteria. CRISPR systems with Cas9 proteins that recognised PAMs of varying lengths were tested on self and phage DNA. The model showed that the energy associated with PAM binding impacted mismatch tolerance, cleavage probability, and cleavage time. Small PAMs allowed the CRISPR to balance catching mutant phages, avoiding self-targeting, and quickly dissociating from critically non-matching sequences. Additionally, the results revealed a lower tolerance to mismatches in the PAM and a PAM-proximal region known as the seed, as seen in experiments. This work illustrates the role that the Cas9 protein has in dictating the specificity of DNA cleavage that can ai
Artificial Intelligence for CRISPR Guide RNA Design: Explainable Models and Off-Target Safety
arXiv2025-08-26
CRISPR-based genome editing has revolutionized biotechnology, yet optimizing guide RNA (gRNA) design for efficiency and safety remains a critical challenge. Recent advances (2020-2025) demonstrate that artificial intelligence (AI), especially deep learning, can markedly improve the prediction of gRNA on-target activity and identify off-target risks. In parallel, emerging explainable AI (XAI) techniques are beginning to illuminate the black-box nature of these models, offering insights into sequence features and genomic contexts that drive Cas enzyme performance. Here we review how state-of-the-art machine learning models are enhancing gRNA design for CRISPR systems, highlight strategies for interpreting model predictions, and discuss new developments in off-target prediction and safety assessment. We emphasize breakthroughs from top-tier journals that underscore an interdisciplinary convergence of AI and genome editing to enable more efficient, specific, and clinically viable CRISPR applications.
CRISPR SWAPnDROP -- A multifunctional system for genome editing and large-scale interspecies gene transfer
arXiv2021-11-23
The need for diverse chromosomal modifications in biotechnology, synthetic biology and basic research requires the development of new technologies. With CRISPR SWAPnDROP, we extend the limits of genome editing to large-scale in-vivo DNA transfer between bacterial species. Its modular platform approach facilitates species-specific adaptation to confer genome editing in various species. In this study, we show the implementation of the CRISPR SWAPnDROP concept for the model organism Escherichia coli and the currently fastest growing and biotechnologically relevant organism Vibrio natriegens. We demonstrate the excision, transfer and integration of 151 kb chromosomal DNA between E. coli strains and from E. coli to V. natriegens without size-limiting intermediate DNA extraction. With the transfer of the E. coli MG1655 wild-type lac operon, we establish a functional lactose and galactose degradation pathway in V. natriegens to extend its biotechnological spectrum. We also transfer the E. coli DH5alpha lac operon and make V. natriegens capable of alpha-complementation - a step towards an ultra-fast cloning strain. Furthermore, CRISPR SWAPnDROP is designed to be the Swiss Army knife of geno