Cyclic peptides offer inherent advantages in pharmaceuticals. For example, cyclic peptides are more resistant to enzymatic hydrolysis than linear peptides and usually exhibit excellent stability and affinity. Although deep generative models have achieved great success in linear peptide design, several challenges hinder the development of computational methods for designing diverse types of cyclic peptides. These challenges include the scarcity of 3D structural data on target proteins and associated cyclic peptide ligands, the geometric constraints that cyclization imposes, and the involvement of non-canonical amino acids in cyclization. To address these challenges, we introduce CpSDE, which consists of two key components: AtomSDE, a generative structure prediction model based on a harmonic SDE, and ResRouter, a residue type predictor. Using a routed sampling algorithm that alternates between these two models to iteratively update sequences and structures, CpSDE generates cyclic peptides. By employing explicit all-atom and bond modeling, CpSDE overcomes existing data limitations and is proficient in designing a wide variety of cyclic peptides. Our e
Cyclic peptides represent a promising class of therapeutic compounds in modern drug discovery, often offering improved stability and binding affinity. However, the de novo design of cyclic peptides remains challenging because methods must identify pocket-adaptive cyclization patterns and linkage sites while simultaneously controlling drug-relevant properties. This challenge is particularly pronounced for recent generative models trained predominantly on linear peptide data, which may fail to capture cyclization-specific constraints. To address this limitation, we introduce APCyc, a target-aware de novo cyclic peptide generation framework that explicitly models cyclization and jointly optimizes multiple essential physicochemical properties. By using an expanded residue vocabulary and explicitly encoding cyclization-site and linkage-type information, APCyc learns cyclization-aware representations and leverages Bayesian posterior guidance to steer sampling toward cyclic peptides satisfying multiple property objectives. Experimental results demonstrate that our model learns target-dependent cyclization preferences and enables effective and controllable multi-property optimization for c
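Posterior guidance toward several property objectives is commonly written using Bayes' rule. The generic form below is an illustration, not APCyc's stated formulation, and it assumes K property models p(y_k | x) that are conditionally independent given the sample x. The normalizer log p(y_1, ..., y_K) does not depend on x, so it drops out of the gradient (score) used to steer sampling:

```latex
\log p(x \mid y_1,\dots,y_K) = \log p(x) + \sum_{k=1}^{K} \log p(y_k \mid x) - \log p(y_1,\dots,y_K)

\nabla_{x} \log p(x \mid y_1,\dots,y_K) = \nabla_{x} \log p(x) + \sum_{k=1}^{K} \nabla_{x} \log p(y_k \mid x)
```

In practice, guided samplers often scale the property term by a guidance weight. For discrete residue tokens, the first line applies directly as a reweighting of token probabilities.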
Addressing the growing need for organized data on tumor homing peptides (THPs), we present TumorHoPe2, a manually curated database offering extensive details on experimentally validated THPs. This represents a significant update to TumorHoPe, originally developed by our group in 2012. TumorHoPe2 now contains 1847 entries, representing 1297 unique tumor homing peptides, a substantial expansion from the 744 entries in its predecessor. For each peptide, the database provides critical information, including sequence, terminal or chemical modifications, corresponding cancer cell lines, and specific tumor types targeted. The database compiles data from two primary sources: phage display libraries, which are commonly used to identify peptide ligands targeting tumor-specific markers, and synthetic peptides, which are chemically modified to enhance properties such as stability, binding affinity, and specificity. Our dataset includes 594 chemically modified peptides, 255 with N-terminal and 195 with C-terminal modifications. These THPs have been validated against 172 cancer cell lines and demonstrate specificity for 37 distinct tumor types. To maximize utility for the research community, T
This study employed neutral PBS buffer combined with ammonium sulfate fractionation to isolate peptide-active fractions (PD-30, PD-50, PD-80) from Cuminum cyminum L. seeds (cumin) and systematically evaluated their antimicrobial, antioxidant, hypoglycemic, and anticancer activities. The results demonstrated that the PD-80 fraction exhibited potent antifungal activity against Candida albicans (inhibition zone diameter: 12.5 mm) and significant antioxidant capacity, with DPPH and ABTS radical scavenging rates of 72.4% and 78.9%, respectively. The PD-50 fraction showed the strongest antibacterial effect against Escherichia coli (inhibition zone diameter: 11.7 mm), while PD-30 displayed the highest inhibitory activity against PTP1B (IC50 = 18.39 µg/mL), indicating its potential for hypoglycemic applications. Through mass spectrometry and database alignment, 414 peptides were identified for the first time in cumin-derived PBS extracts, including 18 structurally novel monomers comprising 11 antimicrobial peptides, 7 anticancer peptides, and 6 hypoglycemic peptides. Notably, peptide CK12 shares sequence homology (59% similarity) with the HIV fusion inhibitor T20, suggesting potential antiviral activity. Th
Cuminum cyminum L. (cumin) is a medicinal and edible plant widely used in traditional Chinese medicine (TCM) for treating various ailments, including diarrhea, abdominal pain, inflammation, asthma, and diabetes. While previous research has primarily focused on its essential oils, studies on its protein-derived bioactive peptides remain limited. In this study, we employed an innovative extraction method to isolate peptides from cumin seeds for the first time and screened their biological activities, revealing significant antimicrobial, antioxidant, and hypoglycemic properties. Guided by bioactivity, we utilized advanced separation and structural identification techniques, including Matrix-Assisted Laser Desorption/Ionization Time-of-Flight Mass Spectrometry (MALDI-TOF/TOF MS/MS), to systematically purify and characterize cumin-derived peptides. A total of 479 unique peptide sequences were identified using Mascot software and the SwissProt/UniProt_Bos databases. Among these, 15 highly bioactive peptides were selected for further analysis based on bioactivity and toxicity predictions using PeptideRanker and ToxinPred. Structural characterization revealed key features, such as α-helice
Amphipathic peptides are considered promising antibiotics because of their ability to form pores in bacterial membranes. In two companion papers, we analyzed both experimentally and theoretically the mechanisms and consequences of the interaction of two types of amphipathic peptides (magainin and melittin) with lipid membranes. We studied this interaction at different peptide concentrations: low, high, and low followed by the addition of peptides at high concentration. Here we provide the theoretical description of the pore formation mechanisms. We predicted theoretically that two peptide molecules are enough to locally induce the formation of a small metastable pore that continuously connects the two membrane leaflets and allows peptide and lipid translocation between them. This mechanism (referred to as local) is expected to operate at low peptide concentrations. When peptides are applied at high concentration, their one-sided adsorption onto a closed membrane generates lateral pressure in the contacting lipid monolayer and lateral tension in the opposing monolayer. Our calculations predicted such asymmetric pressure/tension to greatly facilitate the formation of larg
Antimicrobial peptides (AMPs) have intrigued researchers for decades because of the contradiction between their high potential against resistant bacteria and the inability to find a structure-function relationship for the development of an effective and non-toxic agent. In the present study and the companion paper [Phys. Rev. E (2024)], we performed a comprehensive experimental and theoretical analysis of various aspects of AMP-membrane interactions and AMP-induced pore formation. Taking the well-known melittin and magainin as examples, we showed by patch-clamp and fluorescence measurements that these peptides, even at nanomolar concentrations, modify the membrane by making it permeable to protons (and, possibly, water) but not to ions, and protect the membrane from large pore formation after subsequent addition of 20-fold higher concentrations of AMPs. This protective effect is independent of the membrane side to which the peptide is added (one side or both) and is determined by the peptide-induced deformations of the membrane. Peptides create small, H+-permeable pores that continuously connect the opposing membrane leaflets, allowing translocation of peptides and lipids and thus preventin
Proteins are vital biological molecules found in every living organism, and their function is determined by the shape into which they fold. Peptides are, in essence, short protein fragments and are therefore ideal model systems for protein folding. The structure of a molecule is closely related to its vibrational absorption spectrum, which lies in the infrared (IR) range. However, in vivo IR spectroscopy is hindered by interference from the surrounding water. Therefore, peptides are preferably studied isolated from solution, in the gas phase. This chapter summarizes recent IR spectroscopy studies of gas-phase peptides. The collected works show that IR spectroscopy combined with quantum chemical calculations is a powerful tool for deducing molecular structure. Moreover, the wealth of experimental spectra makes it possible to evaluate different quantum chemical models, which can then be applied to larger proteins.
The primary structures of peptides derived from food proteins affect their taste. Connecting primary structure to taste, however, is difficult because the size of the peptide sequence space increases exponentially with peptide length, while experimentally labeled data on peptide taste remain scarce. We propose a method that coarse-grains the sequence space to reduce its size and systematically identifies the most common coarse-grained residue patterns found in known bitter and umami peptides. We select the optimal patterns by performing extensive out-of-sample tests. The optimal patterns represent bitter and umami peptides better than two baselines: peptides composed entirely of hydrophobic residues (for bitter) or entirely of negatively charged residues (for umami), and peptides with randomly chosen residues. Our method complements quantitative structure-activity relationship methods by offering generic, coarse-grained bitter and umami residue patterns that can aid in locating short bitter or umami segments in a protein and in designing new umami peptides.
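The sketch below makes the size argument concrete. With 20 canonical residues, a length-L sequence space has 20^L members, while a coarse alphabet of k classes has only k^L. The code also counts the most common coarse-grained patterns in a peptide set. The five-class grouping (`COARSE_CLASSES`) and its labels are a hypothetical assumption for illustration and are not the partition used in the study:

```python
from collections import Counter

# Hypothetical 5-class coarse-graining of the 20 canonical residues
# (illustrative assumption, not the study's partition).
COARSE_CLASSES = {
    "h": "AVLIMFWC",  # hydrophobic
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
    "p": "STNQY",     # polar, uncharged
    "g": "GP",        # special (flexible / helix-breaking)
}
RESIDUE_TO_CLASS = {aa: cls for cls, aas in COARSE_CLASSES.items() for aa in aas}


def coarse_grain(sequence: str) -> str:
    """Map each residue of a peptide to its coarse-grained class label."""
    return "".join(RESIDUE_TO_CLASS[aa] for aa in sequence.upper())


def space_size(alphabet_size: int, length: int) -> int:
    """Number of distinct sequences of a given length over an alphabet."""
    return alphabet_size ** length


def top_patterns(peptides, top_n=3):
    """Return the most common full-length coarse-grained patterns in a peptide set."""
    counts = Counter(coarse_grain(p) for p in peptides)
    return counts.most_common(top_n)


if __name__ == "__main__":
    print(space_size(20, 5), space_size(5, 5))  # 3200000 3125
    toy = ["LF", "IW", "VL", "DE", "EE", "GP"]  # hypothetical toy peptides
    print(top_patterns(toy, top_n=2))
```

For pentapeptides, the five-class alphabet shrinks the space about a thousandfold. This is what allows pattern frequencies to be estimated from small labeled datasets.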
Nuclear quantum effects (NQEs) arising from the light mass of hydrogen can influence the structure and stability of hydrogen-bonded biomolecules, yet their role in determining peptide and protein folding remains unclear. Experiments show that substituting H$_2$O with D$_2$O often stabilizes folded states, but the microscopic mechanism associated with this phenomenon remains unresolved. Through ab initio-level path-integral molecular dynamics simulations enabled by machine-learning interatomic potentials, we address the fundamental question of the role of NQEs in peptides by investigating both their overall impact and isotope substitution effects. Overall, NQEs systematically destabilize compact three-dimensional structures across peptide systems, independent of secondary structure type or side-chain interactions. Contrary to the conventional picture that places central importance on hydrogen bonds, we find that the dominant destabilization instead arises from the quantum C-H vibrations. In addition, we reveal microscopic insights into the stabilization of folded peptides upon H$_2$O to D$_2$O substitution, showing that the H/D isotope substitution of active peptide hydrogens, previo
This independent research investigates methods to improve the precision of cyclic peptide generation targeting the HIV gp120 trimer using AlphaFold. The study explores proximity-based hotspot mapping at the CD4 binding site, centroid distance penalization, generative loss tuning, and custom loss function development. These enhancements produced cyclic peptides that closely resemble the binding conformation of the CD4 attachment inhibitor BMS-818251. The proposed methodology demonstrates improved structural control and precision in cyclic peptide generation, advancing the applicability of AlphaFold in structure-based drug discovery.
Numerous peptides discovered over the past decades exhibit antimicrobial and anticancer activity, which makes peptides promising therapeutic candidates. However, some peptides suffer from low metabolic stability, high toxicity, and high hemolytic activity. This highlights the importance of evaluating the hemolytic tendency and toxicity of peptides before using them as therapeutics. Traditional methods for evaluating peptide toxicity can be time-consuming and costly. In this study, we extracted peptide data (Hemo-DB) from the Database of Antimicrobial Activity and Structure of Peptides (DBAASP) based on certain hemolytic-activity criteria, and we present a machine learning-based method for predicting the hemolytic tendency of peptides (i.e., hemolytic or non-hemolytic). Our model offers significant improvement on hemolysis prediction benchmarks. We also propose a reliable clustering-based train-test splitting method that ensures no peptide in the training set is more than 40% similar to any peptide in the test set. Using this split, we can obtain reliable estimates of expected model performance on unseen data distributions or newly discovered pep
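A similarity-aware split of this kind can be sketched as follows. The approach links every pair of peptides whose similarity exceeds the threshold, takes connected components (single-linkage clusters), and then assigns whole clusters to either train or test. Because any pair above the threshold lands in the same cluster, no train/test pair can exceed it.

This is a sketch under stated assumptions. The `similarity` function uses Python's `difflib.SequenceMatcher` ratio as a hypothetical stand-in for alignment-based sequence identity (tools such as CD-HIT or MMseqs2 are typical in practice). The greedy "smallest clusters to test" rule is also an illustrative choice, not necessarily the authors' procedure:

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; difflib ratio, NOT alignment-based identity."""
    x, y = sorted((a, b))  # fixed argument order makes the score symmetric
    return SequenceMatcher(None, x, y).ratio()


def similarity_clusters(seqs, threshold=0.4):
    """Single-linkage clusters: connected components of the graph of pairs above threshold."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps union-find trees shallow
            i = parent[i]
        return i

    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) > threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def cluster_split(seqs, test_fraction=0.2, threshold=0.4):
    """Send whole clusters to train or test, so no train/test pair exceeds the threshold."""
    clusters = sorted(similarity_clusters(seqs, threshold), key=len)
    target = test_fraction * len(seqs)
    train, test = [], []
    for members in clusters:
        # Fill the test set with the smallest clusters first; everything else goes to train.
        bucket = test if len(test) + len(members) <= target else train
        bucket.extend(seqs[i] for i in members)
    return train, test
```

Splitting by cluster rather than by individual peptide prevents near-duplicates from leaking across splits. The trade-off is that the achieved test fraction is only approximate.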
Molecular docking is a structure-based computational drug design technique for predicting the interaction between a small molecule (ligand) and a macromolecule (receptor). Over the past three decades, various docking programs have been developed, mostly for drug-like molecules. With the recent interest in peptides as therapeutic molecules, several peptide docking methods have also been developed. AutoDock CrankPep (ADCP) is a state-of-the-art peptide docking tool that predicts the interactions of peptides of up to 20 amino acids with a user-defined region of a macromolecule, i.e., focused docking. Recent advances in deep learning (DL) have shown remarkable success in docking linear peptides composed only of natural amino acids. Unlike ADCP, these methods provide a confidence level for their predictions. Here we explore whether ADCP and various DL methods (AlphaFold2 Monomer, AlphaFold2 Multimer, and OmegaFold), together with their prediction confidence metrics, can be used to discriminate native and non-native substrates for HIV and FIV proteases. We found that ADCP successfully predicts the interactions of native peptides but fails to discriminate non-native ones. This was expec
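One standard way to quantify how well a confidence metric separates native from non-native substrates is the ROC AUC. This is the probability that a randomly chosen native peptide scores higher than a randomly chosen non-native one, with ties counted as one half. The sketch below is an illustration of that metric, not the evaluation reported in the study. It assumes that higher scores mean "more native-like"; for score types where lower is better (for example, docking energies), negate the scores first:

```python
def roc_auc(native_scores, nonnative_scores):
    """ROC AUC via pairwise comparison (Mann-Whitney form); ties count as 1/2."""
    wins = 0.0
    for s_pos in native_scores:
        for s_neg in nonnative_scores:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(native_scores) * len(nonnative_scores))
```

An AUC near 1.0 indicates good discrimination. An AUC near 0.5 means the confidence metric carries no information about whether a substrate is native.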
The formation of protein precursors through the condensation of atomic carbon under the low-temperature conditions of the molecular phases of the interstellar medium opens alternative pathways for the origin of life. We perform peptide synthesis under conditions prevailing in space and provide a comprehensive analytic characterization of its products. Isotopic labeling with ¹³C allowed us to confirm the suggested pathway of peptide formation, which proceeds through the polymerization of aminoketene molecules formed in the C + CO + NH3 reaction. Here, we address the question of how the efficiency of peptide production is modified by the presence of water molecules. We demonstrate that although water slightly reduces the efficiency of aminoketene polymerization, it does not prevent the formation of peptides.
Computational and machine learning approaches to model the conformational landscape of macrocyclic peptides have the potential to enable rational design and optimization. However, accurate, fast, and scalable methods for modeling macrocycle geometries remain elusive. Recent deep learning approaches have significantly accelerated protein structure prediction and the generation of small-molecule conformational ensembles, yet similar progress has not been made for macrocyclic peptides due to their unique properties. Here, we introduce CREMP, a resource generated for the rapid development and evaluation of machine learning models for macrocyclic peptides. CREMP contains 36,198 unique macrocyclic peptides and their high-quality structural ensembles generated using the Conformer-Rotamer Ensemble Sampling Tool (CREST). Altogether, this new dataset contains nearly 31.3 million unique macrocycle geometries, each annotated with energies derived from semi-empirical extended tight-binding (xTB) DFT calculations. Additionally, we include 3,258 macrocycles with reported passive permeability data to couple conformational ensembles to experiment. We anticipate that this dataset will enable the dev
Peptides in biosystems are homochiral polymers of L-amino acids, but they racemize slowly through active isomerization kinetics. The chemical reactions in biosystems are, however, reversible, and the same kinetics that racemizes peptides at the water activity found in biosystems can ensure homochirality at a lower water activity. Here we show, by a thermodynamic analysis and comprehensive Molecular Dynamics simulations of peptide models, that the isomerization kinetics racemizes peptides at high water activity, in agreement with experimental observations of peptide aging, but enhances homochirality at lower water activity. The hydrophobic core of the peptide in an enzyme can ensure homochirality at low water activity; thus, the establishment of homochirality at the origin of life, the aging of peptides, and the death of biosystems might be strongly connected.
Cyclic peptides, characterized by geometric constraints absent in linear peptides, offer enhanced biochemical properties, presenting new opportunities to address unmet medical needs. However, designing target-specific cyclic peptides remains underexplored due to limited training data. To bridge the gap, we propose CP-Composer, a novel generative framework that enables zero-shot cyclic peptide generation via composable geometric constraints. Our approach decomposes complex cyclization patterns into unit constraints, which are incorporated into a diffusion model through geometric conditioning on nodes and edges. During training, the model learns from unit constraints and their random combinations in linear peptides, while at inference, novel constraint combinations required for cyclization are imposed as input. Experiments show that our model, despite being trained on linear peptides, is capable of generating diverse target-binding cyclic peptides, achieving success rates from 38% to 84% across different cyclization strategies.
Taste peptides have emerged as promising natural flavoring agents owing to their unique organoleptic properties, high safety profile, and potential health benefits. However, the de novo identification of taste peptides derived from animal, plant, or microbial sources remains a time-consuming and resource-intensive process, significantly impeding their widespread application in the food industry. Here, we present TastePepAI, a comprehensive artificial intelligence framework for customized taste peptide design and safety assessment. As the key element of this framework, a loss-supervised adaptive variational autoencoder (LA-VAE) is implemented to efficiently optimize the latent representation of sequences during training and to facilitate the generation of target peptides with desired taste profiles. Notably, our model incorporates a novel taste-avoidance mechanism that allows selective flavor exclusion. Subsequently, our in-house toxicity prediction algorithm (SpepToxPred) is integrated into the framework to perform rigorous safety evaluation of the generated peptides. Using this integrated platform, we successfully identified 73 peptides exhibiting sweet, salty, and umami
Peptide therapeutics, including macrocycles, peptide inhibitors, and bioactive linear peptides, play a crucial role in therapeutic development due to their unique physicochemical properties. However, predicting these properties remains challenging. While structure-based models primarily focus on local interactions, language models are capable of capturing global therapeutic properties of both modified and linear peptides. Protein language models such as ESM-2, though effective for natural peptides, cannot encode chemical modifications. Conversely, pre-trained chemical language models excel at representing small-molecule properties but are not optimized for peptides. To bridge this gap, we introduce PepDoRA, a unified peptide representation model. Leveraging Weight-Decomposed Low-Rank Adaptation (DoRA), PepDoRA efficiently fine-tunes ChemBERTa-77M-MLM on a masked language modeling objective to generate optimized embeddings for downstream property prediction tasks involving both modified and unmodified peptides. By tuning on a diverse and experimentally valid set of 100,000 modified, bioactive, and binding peptides, we show that PepDoRA embeddings capture functional properties
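DoRA decomposes a pretrained weight matrix into a magnitude and a direction, and it adapts the direction with a LoRA-style low-rank update. Following the notation of the DoRA paper, W' = m · (W0 + BA) / ||W0 + BA||_c, where ||·||_c is the column-wise norm. The NumPy sketch below shows only this recomposition. The matrix sizes are arbitrary illustrative values, and the sketch makes no claim about how PepDoRA wires DoRA into ChemBERTa's layers:

```python
import numpy as np


def dora_weight(w0, a, b, m):
    """DoRA recomposition: m * (W0 + B @ A) / ||W0 + B @ A||_c (column-wise norm)."""
    v = w0 + b @ a                                       # direction, updated by the low-rank term
    col_norm = np.linalg.norm(v, axis=0, keepdims=True)  # one norm per column, shape (1, k)
    return m * (v / col_norm)                            # rescale each unit column by its magnitude


rng = np.random.default_rng(0)
d, k, r = 6, 4, 2                              # illustrative sizes: W0 is d x k, rank r
w0 = rng.normal(size=(d, k))                   # frozen pretrained weight
a = rng.normal(size=(r, k)) * 0.01             # trainable A (r x k)
b = np.zeros((d, r))                           # trainable B (d x r), zero-initialized
m = np.linalg.norm(w0, axis=0, keepdims=True)  # trainable magnitude, initialized to ||W0||_c
```

Because B starts at zero and m starts at the column norms of W0, the adapted weight equals W0 at initialization. During training, only A, B, and m are updated, which keeps the number of trainable parameters small.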
Antimicrobial peptides have emerged as promising molecules to combat antimicrobial resistance. However, fragmented datasets, inconsistent annotations, and the lack of standardized benchmarks hinder computational approaches and slow down the discovery of new candidates. To address these challenges, we present the Expanded Standardized Collection for Antimicrobial Peptide Evaluation (ESCAPE), an experimental framework integrating over 80,000 peptides from 27 validated repositories. Our dataset separates antimicrobial peptides from negative sequences and incorporates their functional annotations into a biologically coherent multilabel hierarchy, capturing activities across antibacterial, antifungal, antiviral, and antiparasitic classes. Building on ESCAPE, we propose a transformer-based model that leverages sequence and structural information to predict multiple functional activities of peptides. Our method achieves up to a 2.56% relative average improvement in mean Average Precision over the second-best method adapted for this task, establishing a new state of the art in multilabel peptide classification. ESCAPE provides a comprehensive and reproducible evaluation framework to advance A
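To clarify the metric, mean Average Precision in a multilabel setting is typically computed as follows. For each label, rank the peptides by predicted score and average the precision at each rank where a true positive appears. Then average this per-label AP over all labels. A relative improvement is (new − baseline) / baseline.

The sketch below uses this common non-interpolated definition. It does not claim to reproduce ESCAPE's exact evaluation code. Returning 0.0 for labels with no positives is an assumed convention:

```python
def average_precision(scores, labels):
    """AP for one label: mean of precision@k over the ranks k of the positive items."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, precisions = 0, []
    for k, (_, y) in enumerate(ranked, start=1):
        if y:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(score_matrix, label_matrix):
    """mAP: average the per-label AP over the columns (labels) of a multilabel problem."""
    n_labels = len(score_matrix[0])
    aps = [
        average_precision([row[j] for row in score_matrix], [row[j] for row in label_matrix])
        for j in range(n_labels)
    ]
    return sum(aps) / n_labels


def relative_improvement(new, baseline):
    """Relative change of a metric with respect to a baseline, as a fraction."""
    return (new - baseline) / baseline
```

Because the improvement is relative, a 2.56% gain over a baseline mAP of 0.50 would correspond to an absolute mAP of about 0.513 (a hypothetical example, not a reported value).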