To build effective therapeutics, biologists iteratively mutate antibody sequences to improve binding and stability. Proposed mutations can be informed by previous measurements or by learning from large antibody databases to predict only typical antibodies. Unfortunately, the space of typical antibodies is enormous to search, and experiments often fail to find suitable antibodies on a budget. We introduce Clone-informed Bayesian Optimization (CloneBO), a Bayesian optimization procedure that efficiently optimizes antibodies in the lab by teaching a generative model how our immune system optimizes antibodies. Our immune system makes antibodies by iteratively evolving specific portions of their sequences to bind their target strongly and stably, resulting in a set of related, evolving sequences known as a clonal family. We train a large language model, CloneLM, on hundreds of thousands of clonal families and use it to design sequences with mutations that are most likely to optimize an antibody within the human immune system. We propose to guide our designs to fit previous measurements with a twisted sequential Monte Carlo procedure. We show that CloneBO optimizes antibodies substantial
Affinity maturation is the Darwinian process by which antibodies improve antigen binding through somatic hypermutation and selection. The adaptive landscape, which defines the set of antibody-specific mutations that improve functional characteristics like antigen binding, has been explored in only a handful of antibodies. Identifying the sites of adaptive mutations in a given antibody sequence, and how these sites vary across the antibody repertoire, can inform the design of therapeutic antibodies. We develop a parameter-free population genetic framework that leverages the statistics of convergent affinity maturation in B cell lineages sharing similar naive sequences, called public clonotypes, to identify beneficial mutations. Applying this framework to more than 10,000 public clonotypes represented by multiple lineages across 20 healthy individuals, we identify widespread signatures of clonotype-dependent selection of individual mutations. We estimate the prevalence and typical fitness effects of mutations across the V gene at the single-site level, uncovering a general tradeoff between prevalence and fitness effect. These inferred landscapes broadly reproduce the statistics of co
The complementarity-determining regions of antibodies are loop structures that are key to their interactions with antigens, and of high importance to the design of novel biologics. Since the 1980s, categorizing the diversity of CDR structures into canonical clusters has enabled the identification of key structural motifs of antibodies. However, existing approaches have limited coverage and cannot be readily incorporated into protein foundation models. Here we introduce ImmunoGlobulin LOOp Tokenizer, Igloo, a multimodal antibody loop tokenizer that encodes backbone dihedral angles and sequence. Igloo is trained using a contrastive learning objective to map loops with similar backbone dihedral angles closer together in latent space. Igloo can efficiently retrieve the closest matching loop structures from a structural antibody database, outperforming existing methods on identifying similar H3 loops by 5.9\%. Igloo assigns tokens to all loops, addressing the limited coverage issue of canonical clusters, while retaining the ability to recover canonical loop conformations. To demonstrate the versatility of Igloo tokens, we show that they can be incorporated into protein language models w
Therapeutic antibodies require not only high-affinity target engagement, but also favorable manufacturability, stability, and safety profiles for clinical effectiveness. These properties are collectively called `developability'. To enable a computational framework for optimizing antibody sequences for favorable developability, we introduce a guided discrete diffusion model trained on natural paired heavy- and light-chain sequences from the Observed Antibody Space (OAS) and quantitative developability measurements for 246 clinical-stage antibodies. To steer generation toward biophysically viable candidates, we integrate a Soft Value-based Decoding in Diffusion (SVDD) Module that biases sampling without compromising naturalness. In unconstrained sampling, our model reproduces global features of both the natural repertoire and approved therapeutics, and under SVDD guidance we achieve significant enrichment in predicted developability scores over unguided baselines. When combined with high-throughput developability assays, this framework enables an iterative, ML-driven pipeline for designing antibodies that satisfy binding and biophysical criteria in tandem.
Since the approval of the first antibody drug in 1986, a total of 162 antibodies have been approved for a wide range of therapeutic areas, including cancer, autoimmune, infectious, or cardiovascular diseases. Despite advances in biotechnology that accelerated the development of antibody drugs, the drug discovery process for this modality remains lengthy and costly, requiring multiple rounds of optimizations before a drug candidate can progress to preclinical and clinical trials. This multi-optimization problem involves increasing the affinity of the antibody to the target antigen while refining additional biophysical properties that are essential to drug development such as solubility, thermostability or aggregation propensity. Additionally, antibodies that resemble natural human antibodies are particularly desirable, as they are likely to offer improved profiles in terms of safety, efficacy, and reduced immunogenicity, further supporting their therapeutic potential. In this article, we explore the use of energy-based generative models to optimize a candidate monoclonal antibody. We identify tradeoffs when optimizing for multiple properties, concentrating on solubility, humanness a
Antibodies recognizing complexes of the chemokine platelet factor 4 (PF4-CXCL4) and polyanions (P) opsonize PF4-coated bacteria, thereby mediating bacterial host defense. A subset of these antibodies may activate platelets after binding to PF4-heparin complexes, causing the prothrombotic adverse drug reaction heparin-induced thrombocytopenia (HIT). In autoimmune-HIT, anti-PF4-P-antibodies activate platelets in the absence of heparin. Here we show that antibodies with binding forces of approximately 60-100 pN activate platelets in the presence of polyanions, while a subset of antibodies from autoimmune-HIT patients with binding forces greater than 100 pN binds to PF4 alone in the absence of polyanions. These antibodies with high binding forces cluster PF4 molecules, forming antigenic complexes that allow binding of polyanion-dependent anti-PF4-P-antibodies. The resulting immunocomplexes induce massive platelet activation in the absence of heparin. Antibody-mediated changes in endogenous proteins that trigger binding of otherwise non-pathogenic (or cofactor-dependent) antibodies may also be relevant in other antibody-mediated autoimmune disorders.
This study presents theoretical results for the physico-chemical properties of systems of molecules modeling bi-specific antibodies, such as dual-variable-domain monoclonal antibodies (DVD-Ig) and Fabs-In-Tandem Immunoglobulin (FIT-Ig). These molecules are representatives of the engineered proteins that combine the function and specificity of two monoclonal antibodies. Individual molecules are depicted here as an assembly of nine (or, in the case of FIT-Ig, eleven) hard spheres, organized to resemble a Y-shaped object. The effects of the increased size, asymmetry, and flexibility of individual molecules on measurable properties of such systems of molecules are investigated. We examined the liquid-liquid phase separation, the second virial coefficient $B_2$, and viscosity under various experimental conditions. The calculations are compared with the data for regular monoclonal antibodies and discussed in view of the experimental results for DVD-Ig solutions available in the literature.
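As a point of reference for the second virial coefficient $B_2$ mentioned above, the following minimal sketch evaluates $B_2 = -2\pi \int_0^\infty (e^{-u(r)/kT} - 1)\, r^2\, dr$ for a single sphere with an optional square-well attraction. It is not the multi-sphere antibody model of the study: the single-sphere geometry, the function names, and the parameters `well_depth_kT` and `well_width` are illustrative assumptions.

```python
import math

def mayer_f(r, sigma, well_depth_kT=0.0, well_width=1.5):
    """Mayer function f(r) = exp(-u(r)/kT) - 1 for a square-well sphere.

    Assumed (hypothetical) potential: hard core of diameter sigma,
    attractive well of depth well_depth_kT (in units of kT) that
    extends out to well_width * sigma.
    """
    if r < sigma:
        return -1.0  # hard core: u = +infinity, so exp(-u/kT) = 0
    if r < well_width * sigma:
        return math.exp(well_depth_kT) - 1.0  # inside the well: u = -depth
    return 0.0  # beyond the well: no interaction

def second_virial(sigma, well_depth_kT=0.0, well_width=1.5, n=200_000):
    """B2 = -2*pi * integral of f(r) r^2 dr, using the midpoint rule."""
    r_max = well_width * sigma  # f(r) vanishes beyond this distance
    h = r_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * h
        total += mayer_f(r, sigma, well_depth_kT, well_width) * r * r
    return -2.0 * math.pi * total * h

b2_hard_sphere = second_virial(1.0)                       # pure repulsion
b2_attractive = second_virial(1.0, well_depth_kT=1.0)     # with attraction
```

With no attraction the result approaches the known hard-sphere value $2\pi\sigma^3/3$; switching on the well lowers $B_2$ and can make it negative, which is the usual signature of net attractive interactions in solution.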
We present a three-stage framework for training deep learning models specializing in antibody sequence-structure co-design. We first pre-train a language model using millions of antibody sequences. Then, we employ the learned representations to guide the training of a diffusion model for joint optimization over both sequence and structure of antibodies. During the final alignment stage, we optimize the model to favor antibodies with low repulsion and high attraction to the antigen binding site, enhancing the rationality and functionality of the designs. To mitigate conflicting energy preferences, we extend AbDPO (Antibody Direct Preference Optimization) to guide the model toward Pareto optimality under multiple energy-based alignment objectives. Furthermore, we adopt an iterative learning paradigm with temperature scaling, enabling the model to benefit from diverse online datasets without requiring additional data. In practice, our proposed methods achieve high stability and efficiency in producing a better Pareto front of antibody designs compared to top samples generated by baselines and previous alignment techniques. Through extensive experiments, we showcase the superior pe
The identification and validation of therapeutic antibodies is critical for developing effective treatments for many diseases. We present a computational approach for identifying antibodies targeting GFRAL-specific receptors, receptors implicated in appetite regulation. Using humanized Trianni mice, we conducted a longitudinal study with repeated blood sampling and splenic analysis. We applied the STAR computational method for antibody discovery on bulk antibody repertoire data sampled at key time points. By mapping the output from STAR to single-cell data taken at the last time point, we successfully identified a pool of antibodies, of which 50% demonstrated binding capabilities. We observed convergent selection, where responding sequences with identical amino acid complementarity determining regions 3 (CDR3) were found in different mice. We provide a catalog of 67 experimentally validated antibodies against GFRAL. The potential of these antibodies as antagonists or agonists against GFRAL suggests therapeutic solutions for conditions like cancer cachexia, anorexia, obesity, and diabetes. This study underscores the utility of integrating computational methods and experimental valid
Monoclonal antibodies (mAbs) represent one of the most prevalent FDA-approved modalities for treating autoimmune diseases, infectious diseases, and cancers. However, discovery and development of therapeutic antibodies remains a time-consuming and expensive process. Recent advancements in machine learning (ML) and artificial intelligence (AI) have shown significant promise in revolutionizing antibody discovery and optimization. In particular, models that predict antibody biological activity enable in-silico evaluation of binding and functional properties; such models can prioritize antibodies with the highest likelihoods of success in costly and time-intensive laboratory testing procedures. We here explore an AI model for predicting the binding and receptor blocking activity of antibodies against influenza A hemagglutinin (HA) antigens. Our present model is developed with the MAMMAL framework for biologics discovery to predict antibody-antigen interactions using only sequence information. To evaluate the model's performance, we tested it under various data split conditions to mimic real-world scenarios. Our models achieved an AUROC $\geq$ 0.91 for predicting the activity of existing
The development of therapeutic antibodies heavily relies on accurate predictions of how antigens will interact with antibodies. Existing computational methods in antibody design often overlook crucial conformational changes that antigens undergo during the binding process, significantly impacting the reliability of the resulting antibodies. To bridge this gap, we introduce dyAb, a flexible framework that incorporates AlphaFold2-driven predictions to model pre-binding antigen structures and specifically addresses the dynamic nature of antigen conformation changes. Our dyAb model leverages a unique combination of coarse-grained interface alignment and fine-grained flow matching techniques to simulate the interaction dynamics and structural evolution of the antigen-antibody complex, providing a realistic representation of the binding process. Extensive experiments show that dyAb significantly outperforms existing models in antibody design involving changing antigen conformations. These results highlight dyAb's potential to streamline the design process for therapeutic antibodies, promising more efficient development cycles and improved outcomes in clinical applications.
Antibody therapies have been employed to address some of today's most challenging diseases, but must meet many criteria during drug development before reaching a patient. Humanization is a sequence optimization strategy that addresses one critical risk called immunogenicity - a patient's immune response to the drug - by making an antibody more "human-like" in the absence of a predictive lab-based test for immunogenicity. However, existing humanization strategies generally yield very few humanized candidates, which may have degraded biophysical properties or decreased drug efficacy. Here, we re-frame humanization as a conditional generative modeling task, where humanizing mutations are sampled from a language model trained on human antibody data. We describe a sampling process that incorporates models of therapeutic attributes, such as antigen binding affinity, to obtain candidate sequences that have both reduced immunogenicity risk and maintained or improved therapeutic properties, allowing this algorithm to be readily embedded into an iterative antibody optimization campaign. We demonstrate in silico and in lab validation that in real therapeutic programs our generative humanizati
We review standard mediation assumptions as they apply to identifying antibody effects in a randomized vaccine trial and propose new study designs to allow identification of an estimand that was previously unidentifiable. For these mediation analyses, we partition the total ratio effect (one minus the vaccine effect) from a randomized vaccine trial into indirect (effects through antibodies) and direct effects (other effects). Identifying $λ$, the proportion of the total effect due to an indirect effect, depends on a cross-world quantity, the potential outcome among vaccinated individuals with antibody levels as if given placebo, or vice versa. We review assumptions for identifying $λ$ and show that there are two versions of $λ$, unless the effect of adding antibodies to the placebo arm is equal in magnitude to the effect of subtracting antibodies from the vaccine arm. We focus on the case when individuals in the placebo arm are unlikely to have the needed antibodies. In that case, if a standard assumption (given confounders, potential mediators and potential outcomes are independent) is true, only one version of $λ$ is identifiable, and if not, neither is identifiable. We propose al
Antibody binding properties for immunostaining applications are often characterized by dose-response curves, which describe the amount of bound antibodies as a function of the antibody concentration applied at the beginning of the experiment. A common model for the dose-response curve is the Langmuir isotherm, which assumes an equilibrium between the binding and unbinding of antibodies. However, for common immunostaining protocols, the equilibrium assumption is violated, and the dose-response behavior is governed by an accumulation of permanently bound antibodies. Assuming a constant antibody concentration, the resulting accumulation model can easily be solved analytically. However, in many experimental setups the overall amount of antibodies is fixed, such that antibody binding reduces the concentration of free antibodies. Solving the corresponding depletion accumulation model is more difficult and seems to be impossible for heterogeneous epitope landscapes. In this paper, we first solve the depletion-free accumulation model analytically for a homogeneous epitope landscape. From the obtained solution, we derive inequalities between the depletion-free accumulation model, the deplet
We study a simplified model of monoclonal antibodies confined in a patchy random porous medium. Antibodies are represented as Y-shaped particles composed of seven tangential hard spheres with attractive patches on the terminal beads, while the matrix consists of randomly distributed hard-sphere obstacles bearing adhesive sites. The model captures antibody behavior in crowded biological environments with strong short-range antibody-matrix attractions. The theoretical approach combines Wertheim's multidensity thermodynamic perturbation theory, the Flory-Stockmayer theory of polymerization, and scaled particle theory for fluids in porous media. We analyze thermodynamic properties, percolation thresholds, and phase behavior, and compare the selected results with new computer simulations. The interplay between antibody-antibody and antibody-matrix interactions produces a complex phase behavior, including re-entrant phase separation with a closed-loop coexistence region at higher temperatures and conventional liquid-gas separation at lower temperatures.
Antibodies play an essential role in the immune response to viral infections, vaccination, or antibody therapy. Nevertheless, they can be either protective or harmful during the immune response. Moreover, competition or cooperation between mixed antibodies can enhance or reduce this protective or harmful effect. Using the laws of chemical reactions, we propose a new approach to modeling the antigen-antibody complex activity. The resulting expression covers not only purely competitive or purely independent binding but also synergistic binding which, depending on the antibodies, can promote either neutralization or enhancement of viral activity. We then integrate this expression of viral activity in a within-host model and investigate the existence of steady-states and their asymptotic stability. We complete our study with numerical simulations to illustrate different scenarios: firstly, where both antibodies are neutralizing, and secondly, where one antibody is neutralizing and the other enhancing. The results indicate that efficient viral neutralization is associated with purely independent antibody binding, whereas strong viral activity enhancement is expected in the case of purel
Antibody-facilitated immune responses are central to the body's defense against pathogens, viruses, and other foreign invaders. The ability of antibodies to specifically bind and neutralize antigens is vital for maintaining immunity. Over the past few decades, bioengineering advancements have significantly accelerated therapeutic antibody development. These antibody-derived drugs have shown remarkable efficacy, particularly in treating cancer, SARS-CoV-2, autoimmune disorders, and infectious diseases. Traditionally, experimental methods for affinity measurement have been time-consuming and expensive. With the advent of artificial intelligence, in silico medicine has been revolutionized; recent developments in machine learning, particularly the use of large language models (LLMs) for representing antibodies, have opened up new avenues for AI-based design and improved affinity prediction. Herein, we present an advanced antibody-antigen binding affinity prediction model (LlamaAffinity), leveraging an open-source Llama 3 backbone and antibody sequence data sourced from the Observed Antibody Space (OAS) database. The proposed approach shows significant improvement over existing state-of
Antibody engineering is essential for developing therapeutics and advancing biomedical research. Traditional discovery methods often rely on time-consuming and resource-intensive experimental screening. To enhance and streamline this process, we introduce HelixDesign-Antibody, a production-grade, high-throughput platform built on the high-accuracy structure prediction model HelixFold3. The platform facilitates the large-scale generation of antibody candidate sequences and evaluates their interaction with antigens. Integrated high-performance computing (HPC) support enables high-throughput screening, addressing challenges such as fragmented toolchains and high computational demands. Validation on multiple antigens showcases the platform's ability to generate diverse and high-quality antibodies, confirming a scaling law where exploring larger sequence spaces increases the likelihood of identifying optimal binders. This platform provides a seamless, accessible solution for large-scale antibody design and is available via the antibody design page of the PaddleHelix platform.
The toxins associated with infectious diseases are potential targets for inhibitors intended for prophylactic or therapeutic use. Many antibodies have been generated for this purpose, and the objective of this study was to develop a simple mathematical model that may be used to evaluate the potential protective effect of antibodies. This model was used to evaluate the contributions of antibody affinity and concentration to reducing antibody-receptor complex formation and internalization. The model also enables prediction of the antibody kinetic constants and concentration required to provide a specified degree of protection. We hope that this model, once validated experimentally, will be a useful tool for in vitro selection of potentially protective antibodies for progression to in vivo evaluation.
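To illustrate how affinity and concentration trade off in a calculation of this kind, here is a minimal equilibrium sketch. It is not the model of the study, which also covers kinetics, receptor binding, and internalization. It assumes 1:1 antibody-toxin binding with dissociation constant `k_d` and antibody in large excess over toxin, so that free antibody is approximately equal to total antibody. The function names are hypothetical.

```python
def fraction_toxin_bound(antibody_conc, k_d):
    """Equilibrium fraction of toxin bound by antibody.

    Assumes 1:1 binding and antibody in large excess, so the bound
    fraction is [Ab] / ([Ab] + K_D). Both arguments share one unit (e.g. M).
    """
    if antibody_conc < 0 or k_d <= 0:
        raise ValueError("antibody_conc must be >= 0 and k_d must be > 0")
    return antibody_conc / (antibody_conc + k_d)

def antibody_conc_required(protection, k_d):
    """Antibody concentration needed to bind a given fraction of toxin.

    Inverts the expression above: [Ab] = K_D * p / (1 - p),
    where p is the target bound fraction, 0 <= p < 1.
    """
    if not 0.0 <= protection < 1.0:
        raise ValueError("protection must satisfy 0 <= protection < 1")
    return k_d * protection / (1.0 - protection)

# Example: a 1 nM antibody (K_D = 1e-9 M) at a concentration equal to K_D
half_bound = fraction_toxin_bound(1e-9, 1e-9)
# Concentration needed to bind 90% of the toxin with the same antibody
conc_for_90 = antibody_conc_required(0.9, 1e-9)
```

Under these assumptions, binding 90% of the toxin takes about nine times the dissociation constant, so a tenfold improvement in affinity lowers the required antibody concentration tenfold.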
Fast, hard-to-trace viral mutations take thousands of lives before the immune system can produce an inhibitory antibody. The recent outbreak of the novel coronavirus has infected and killed thousands of people worldwide. Rapid methods for finding peptides or antibody sequences that can inhibit the viral epitopes of COVID-19 will save thousands of lives. In this paper, we devise a machine learning (ML) model to predict possible inhibitory synthetic antibodies for the coronavirus. We collected 1933 virus-antibody sequences and their clinical patient neutralization responses and trained an ML model to predict the antibody response. Using graph featurization with a variety of ML methods, we screened thousands of hypothetical antibody sequences and found 8 stable antibodies that potentially inhibit COVID-19. We combined bioinformatics, structural biology, and molecular dynamics (MD) simulations to verify the stability of the candidate antibodies that can inhibit the coronavirus.