Found 20 results
The diverse metabolic pathways are fundamental to all living organisms, as they harvest energy, synthesize biomass components, produce molecules to interact with the microenvironment, and neutralize toxins. While discovery of new metabolites and pathways continues, the prediction of pathways for new metabolites can be challenging. It can take vast amounts of time to elucidate pathways for new metabolites; thus, according to HMDB only 60% of metabolites get assigned to pathways. Here, we present an approach to identify pathways based on metabolite structure. We extracted 201 features from SMILES annotations, and identified new metabolites from PubMed abstracts and HMDB. After applying clustering algorithms to both groups of features, we quantified correlations between metabolites, and found the clusters accurately linked 92% of known metabolites to their respective pathways. Thus, this approach could be valuable for predicting metabolic pathways for new metabolites.
Aqueous metabolites in terrestrial subsurface environments provide critical analog frameworks for assessing the habitability of Martian subsurface ice. On Earth, they play critical roles in sustaining microbial life within soils, permafrost, and groundwater environments, and their availability shapes microbial community compositions, activity, and adaptability to changes in environmental conditions, enabling communities to persist over millennial timescales. The counterpart to aqueous-soluble organics is the insoluble organic matter pool that makes up the largest portion of organic matter in natural samples and includes most types of organic signatures indicative of biological processes. Employing a range of sample preparation, molecular separation, detection, and imaging techniques enables the characterization of both labile (i.e., soluble and reactive) and recalcitrant (i.e., insoluble and non-reactive, including macromolecules) organic pools. Multiple orthogonal analytical modalities strengthen interpretations of signatures that we associate with biology as we know it and don't know it, by constraining possible abiotic sources, validating measurements across distinct techniques, and en
Water diffusion MRI is a very powerful tool for probing tissue microstructure, but disentangling the contribution of compartment-specific structural disorder from cellular restriction and inter-compartment exchange remains an open challenge. Here, we use diffusion MR spectroscopy (dMRS) of water and metabolites as a function of diffusion time in vivo in mouse gray matter (GM) to shed light on which of these concomitant mechanisms dominates the MR measurements, and with which specific signature. We report the diffusion time-dependence of water with excellent SNR conditions up to 500 ms. Water kurtosis decreases with increasing diffusion time, showing the concomitant influence of both structural disorder and exchange. Despite the excellent SNR, we were not able to clearly identify the nature of the structural disorder (i.e., 1D versus 2D/3D short-range disorder). Measurements of the diffusion time-dependence of intracellular metabolites (up to 500 ms) show opposite behavior to water, with metabolite kurtosis increasing as a function of diffusion time. We show that this is a signature of diffusion restricted in the intracellular space from which cellular microstructural features can be estim
The rapidly expanding field of metabolomics presents an invaluable resource for understanding the associations between metabolites and various diseases. However, the high dimensionality, presence of missing values, and measurement errors associated with metabolomics data can present challenges in developing reliable and reproducible methodologies for disease association studies. Therefore, there is a compelling need to develop robust statistical methods that can navigate these complexities to achieve reliable and reproducible disease association studies. In this paper, we focus on developing such a methodology with an emphasis on controlling the False Discovery Rate during the screening of mutual metabolomic signals for multiple disease outcomes. We illustrate the versatility and performance of this procedure in a variety of scenarios, dealing with missing data and measurement errors. As a specific application of this novel methodology, we target two of the most prevalent cancers among US women: breast cancer and colorectal cancer. By applying our method to the Women's Health Initiative data, we successfully identify metabolites that are associated with either or both of these cance
A new sequence for single-voxel diffusion-weighted 1H MRS (DWS), named DW-SPECIAL, is proposed to improve the detection and subsequent estimation of the diffusion properties of strongly J-coupled metabolites. It combines the semi-adiabatic SPECIAL sequence with a stimulated echo (STE) diffusion block. Acquisitions with DW-SPECIAL and STE-LASER, the current gold standard for rodent DWS experiments at high fields, were performed at 14.1T on phantoms and in vivo on the rat brain. The apparent diffusion coefficient and intra-stick diffusivity (Callaghan's model) were fitted and compared between the sequences for glutamate, glutamine (Gln), myo-inositol, taurine, total N-acetylaspartate, total choline, total creatine and the macromolecules. The shorter echo time achieved with DW-SPECIAL (18 ms against 33 ms with STE-LASER) substantially limited the metabolites' signal loss caused by J-evolution. In addition, DW-SPECIAL preserved the main advantages of STE-LASER: absence of cross-terms, diffusion time during a STE and limited sensitivity to B1 inhomogeneities. In vivo, compared to STE-LASER, DW-SPECIAL yielded the same spectral quality and reduced the Cramér-Rao lower bounds (CRLB) for J
The identification of metabolites from complex biological samples often involves matching experimental mass spectrometry data to signatures of compounds derived from massive chemical databases. However, misidentifications may result due to the complexity of potential chemical space that leads to databases containing compounds with nearly identical structures. Prior knowledge of compounds that may be enzymatically consumed or produced by an organism can help reduce misidentifications by restricting initial database searching to compounds that are likely to be present in a biological system. While databases such as UniProt allow for the identification of small molecules that may be consumed or generated by enzymes encoded in an organism's genome, currently no tool exists for identifying SMILES strings of metabolites associated with protein identifiers and expanding R-containing substructures to fully defined, biologically relevant chemical structures. Here we present Proteome2Metabolome (P2M), a tool that performs these tasks using external database querying behind a simple command line interface. Beyond mass spectrometry based applications, P2M can be generally used to identify biol
Nuclear Magnetic Resonance (NMR) spectra are widely used in metabolomics to obtain profiles of metabolites dissolved in biofluids such as cell supernatants. Methods for estimating metabolite concentrations from these spectra are presently confined to manual peak fitting and to binning procedures for integrating resonance peaks. Extensive information on the patterns of spectral resonance generated by human metabolites is now available in online databases. By incorporating this information into a Bayesian model we can deconvolve resonance peaks from a spectrum and obtain explicit concentration estimates for the corresponding metabolites. Spectral resonances that cannot be deconvolved in this way may also be of scientific interest so we model them jointly using wavelets. We describe a Markov chain Monte Carlo algorithm which allows us to sample from the joint posterior distribution of the model parameters, using specifically designed block updates to improve mixing. The strong prior on resonance patterns allows the algorithm to identify peaks corresponding to particular metabolites automatically, eliminating the need for manual peak assignment. We assess our method for peak alignment
Untargeted metabolomic studies are revealing large numbers of naturally occurring metabolites that cannot be characterized because their chemical structures and MS/MS spectra are not available in databases. Here we present iMet, a computational tool based on experimental tandem mass spectrometry that could potentially allow the annotation of metabolites not discovered previously. iMet uses MS/MS spectra to identify metabolites structurally similar to an unknown metabolite, and gives a net atomic addition or removal that converts the known metabolite into the unknown one. We validate the algorithm with 148 metabolites, and show that for 89% of them at least one of the top four matches identified by iMet enables the proper annotation of the unknown metabolite. iMet is freely available at http://imet.seeslab.net.
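The "net atomic addition or removal" that relates a known metabolite to an unknown one can be illustrated with a small formula comparison. This is only a minimal sketch of the idea, not iMet's algorithm (which works from MS/MS spectra). The helper names `parse_formula` and `net_atomic_change` are hypothetical, and the parser assumes simple formulas without brackets, charges, or isotopes.

```python
import re
from collections import Counter


def parse_formula(formula: str) -> Counter:
    """Count atoms in a simple molecular formula such as 'C6H12O6'.

    Assumption: no parentheses, charges, or isotope labels.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def net_atomic_change(known: str, unknown: str) -> dict:
    """Return the per-element difference (unknown minus known), omitting zeros."""
    known_counts = parse_formula(known)
    unknown_counts = parse_formula(unknown)
    elements = set(known_counts) | set(unknown_counts)
    diff = {e: unknown_counts[e] - known_counts[e] for e in elements}
    return {e: d for e, d in sorted(diff.items()) if d != 0}


# Glucose (C6H12O6) versus glucose 6-phosphate (C6H13O9P):
# the difference corresponds to a net addition of HPO3.
change = net_atomic_change("C6H12O6", "C6H13O9P")
print(change)
```

A positive value means atoms are added when going from the known to the unknown compound, and a negative value means they are removed.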
Microbial communities display extreme diversity. A variety of strains or species coexist even when limited by a single resource. It has been argued that metabolite secretion creates new niches and facilitates such diversity. Nonetheless, it is still a controversial topic why cells secrete even essential metabolites so often; in fact, even under isolation conditions, microbial cells secrete various metabolites, including those essential for their growth. First, we demonstrate that leaking essential metabolites can be advantageous. If the intracellular chemical reactions include multibody reactions like catalytic reactions, this advantageous leakage of essential metabolites is possible and indeed typical for most metabolic networks via "flux control" and "growth-dilution" mechanisms; the latter is a result of the balance between synthesis and growth-induced dilution with autocatalytic reactions. Counterintuitively, the mechanisms can work even when the supplied resource is scarce. Next, when such cells are crowded, the presence of another cell type, which consumes the leaked chemicals, is beneficial for both cell types, so that their coexistence enhances the growth of both. The latter
Motivation: NMR spectra are widely used in metabolomics to obtain metabolite profiles in complex biological mixtures. Common methods used to assign and estimate concentrations of metabolites involve either an expert manual peak fitting or extra pre-processing steps, such as peak alignment and binning. Peak fitting is very time consuming and is subject to human error. Conversely, alignment and binning can introduce artefacts and limit immediate biological interpretation of models. Results: We present the Bayesian AuTomated Metabolite Analyser for NMR spectra (BATMAN), an R package which deconvolutes peaks from 1-dimensional NMR spectra, automatically assigns them to specific metabolites from a target list and obtains concentration estimates. The Bayesian model incorporates information on characteristic peak patterns of metabolites and is able to account for shifts in the position of peaks commonly seen in NMR spectra of biological samples. It applies a Markov Chain Monte Carlo (MCMC) algorithm to sample from a joint posterior distribution of the model parameters and obtains concentration estimates with reduced error compared with conventional numerical integration and comparable to
The large-scale shape and function of metabolic networks are intriguing topics of systems biology. Such networks are on one hand commonly regarded as modular (i.e. built by a number of relatively independent subsystems), but on the other hand they are robust in a way not expected of a purely modular system. To address this question we carefully discuss the partition of metabolic networks into subnetworks. The practice of preprocessing such networks by removing the most abundant substrates, "currency metabolites," is formalized into a network-based algorithm. We study partitions for metabolic networks of many organisms and find cores of currency metabolites and modular peripheries of what we call "commodity metabolites." The networks are found to be more modular than random networks but far from perfectly divisible into modules. We argue that cross-modular edges are the key for the robustness of metabolism.
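The preprocessing step of removing the most abundant substrates ("currency metabolites") can be sketched with a simple degree-based heuristic. This is a simplified illustration under stated assumptions, not the network-based algorithm formalized in the paper: it ranks metabolites by how many reactions they take part in and drops the top `n_remove`. The function name, the cutoff parameter, and the toy reaction set are all hypothetical.

```python
from collections import Counter


def remove_currency_metabolites(reactions, n_remove):
    """Drop the n_remove most widely used metabolites from every reaction.

    reactions: dict mapping a reaction name to the set of metabolites it involves.
    Returns (currency, pruned): the removed metabolites and the reduced reactions.
    Assumption: "most abundant" is approximated by reaction-participation count.
    """
    degree = Counter(m for mets in reactions.values() for m in mets)
    # Sort by descending degree; break ties by name so the result is deterministic.
    ranked = sorted(degree, key=lambda m: (-degree[m], m))
    currency = set(ranked[:n_remove])
    pruned = {name: mets - currency for name, mets in reactions.items()}
    return currency, pruned


# Toy network (hypothetical, loosely glycolysis-like).
toy_reactions = {
    "r1": {"glucose", "ATP", "G6P", "ADP"},
    "r2": {"F6P", "ATP", "FBP", "ADP"},
    "r3": {"G6P", "F6P"},
    "r4": {"PEP", "ADP", "pyruvate", "ATP"},
}

currency, pruned = remove_currency_metabolites(toy_reactions, n_remove=2)
print(sorted(currency))
print({name: sorted(mets) for name, mets in pruned.items()})
```

In this toy case ATP and ADP appear in three reactions each and are removed, leaving the remaining "commodity" metabolites to define the modular structure.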
The pharmaceutical success of atorvastatin (ATV), a widely employed drug against the "bad" cholesterol (LDL) and cardiovascular diseases, traces back to its ability to scavenge free radicals. Unfortunately, information on its antioxidant properties is missing or unreliable. Here, we report detailed quantum chemical results for ATV and its ortho- and para-hydroxy metabolites (o-ATV, p-ATV) in the methanolic phase. They comprise global reactivity indices, bond order indices, and spin densities as well as all relevant enthalpies of reaction (bond dissociation BDE, ionization IP and electron attachment EA, proton detachment PDE and proton affinity PA, and electron transfer ETE). With these properties in hand, we can provide the first theoretical explanation of the experimental finding that, due to their free radical scavenging activity, ATV hydroxy metabolites, rather than the parent ATV, have a substantial inhibitory effect on LDL and the like. Surprisingly (because it is contrary to most cases currently known), we unambiguously found that HAT (direct hydrogen atom transfer) rather than SPLET (sequential proton loss electron transfer) or SET-PT (stepwise electron transfer proton tran
Drug metabolites usually have structures of split-ring resonators (SRRs), which might lead to negative permittivity and permeability in an electromagnetic field. As a result, in the UV-vis region, the latent fingermark images of drug addicts and non-drug users are inverted. The optical properties of latent fingermarks are quite different between drug addicts and non-drug users. This is a technical advantage for distinguishing them in crime scene investigation. In this paper, we calculate the permittivity and permeability of drug metabolites using a tight-binding model. The latent fingermarks of smokers and non-smokers are given as an example.
To understand the system-wide organization of metabolism, different lines of study have devised different categorizations of metabolites. The relationship and difference between categories can provide new insights for a more detailed description of the organization of metabolism. In this study, we investigate the relative organization of three categorizations of metabolites -- pathways, subcellular localizations and network clusters -- by block-model techniques borrowed from social-network studies and further characterize the categories from a topological point of view. The picture of the metabolism we obtain is that of peripheral modules, characterized both by being dense network clusters and localized to organelles, connected by a central, highly connected core. Pathways typically run through several network clusters and localizations, connecting them laterally. The strong overlap between organelles and network clusters suggests that these are natural "modules" -- relatively independent sub-systems. The different categorizations divide the core metabolism differently, suggesting that this, if possible, should not be treated as a module on par with the organelles. Although overlappin
Database search is a widely used approach for identifying metabolites from tandem mass spectra (MS/MS). In this strategy, an experimental spectrum is matched against a user-specified database of candidate metabolites, and candidates are ranked such that true metabolite-spectrum matches receive the highest scores. Machine-learning methods have been widely incorporated into database-search-based identification tools and have substantially improved performance. To further improve identification accuracy, we propose a new framework for generating negative training samples. The framework first uses autoencoders to learn latent representations of metabolite structures and MS/MS spectra, thereby recasting metabolite-spectrum matching as matching between latent vectors. It then uses a GAN to generate latent vectors of decoy metabolites and constructs decoy metabolite-spectrum matches as negative samples for training. Experimental results show that our tool, MS2MetGAN, achieves better overall performance than existing metabolite identification methods.
Genome-scale metabolic models (GEMs) are essential tools for systems biology and rational chassis design, but conventional top-down reconstruction depends heavily on sequence homology and often leaves unknown enzymes and metabolic dark matter unresolved. Direct reconstruction from metabolomics is also difficult because mapping observed metabolites to reactions is an ill-posed inverse problem with combinatorial ambiguity and possible spurious networks. Here we present MetaGEM, a bottom-up framework that uses enzymes as physical anchors to convert system-level network inference into enzyme-metabolite interaction prediction. MetaGEM uses a multimodal dual-tower architecture that combines protein evolutionary semantics from a protein language model with three-dimensional metabolite representations. It further introduces contrastive learning with hard negative mining to separate structurally similar metabolites and reduce false positive interactions. On a de-homologized benchmark, MetaGEM achieves state-of-the-art enzyme-metabolite prediction performance, with AUROC of 0.9701 and MCC of 0.8033, and remains robust under low sequence identity splits. In downstream reconstruction, MetaGEM
Numerous studies have shown that microbial metabolites, which represent the products of bacteria in the human gut, play a key role in shaping cancer risk and response to treatment. However, metabolite data typically contain a large proportion of missing values, which may result from either low abundance or technical challenges in data processing. Moreover, given the compositionality of microbiome data, where the observed abundances can only be interpreted on a relative scale, standard variable selection methods are not applicable. In this project, we propose a novel Bayesian regression method to address these challenges in the integration of metabolite and microbiome data. Key features of our proposed model include modeling the two different mechanisms of missingness for the metabolite data and adopting a Bayesian prior designed to address the compositional characteristics of microbiome data. We demonstrate on simulated data that our proposed model can accurately impute the unobserved true metabolite values and correctly select the relevant microbiome predictors. We further illustrate our method using real data from a study focused on understanding the interplay between the microbi
Accurately identifying metabolites (i.e., small molecules) from mass spectrometry data remains a core challenge in metabolomics, with broad applications in drug discovery, environmental analysis, and clinical research. We address the Molecule Retrieval task, which consists of recovering the chemical structure of a metabolite from its MS/MS spectrum given a set of candidate molecules. While the recent release of benchmark datasets such as MassSpecGym and Spectraverse has considerably accelerated the development of novel machine learning approaches, the complexity of data preprocessing pipelines and the lack of unified implementations make methods and results difficult to reproduce and compare. We make three contributions. First, we propose a unified framework encompassing recent approaches based on representation alignment and contrastive learning. Second, we introduce MSAlign, inspired by multimodal alignment in vision-language models, which learns a shared representation space by aligning two frozen foundation models (DreaMS for mass spectra and ChemBERTa for molecules) through lightweight MLP projections trained with a candidate-based contrastive objective. MSAlign is simple to impl
Correlation networks are commonly used to infer associations between microbes and metabolites. The resulting p-values are then corrected for multiple comparisons using existing methods such as the Benjamini and Hochberg procedure to control the false discovery rate (FDR). However, most existing methods for FDR control assume the p-values are weakly dependent. Consequently, they can have low power in recovering microbe-metabolite association networks that exhibit important topological features, such as the presence of densely associated modules. We propose a novel inference procedure that is both powerful for detecting significant associations in the microbe-metabolite network and capable of controlling the FDR. Power enhancement is achieved by modeling latent structures in the form of a bipartite stochastic block model. We develop a variational expectation-maximization algorithm to estimate the model parameters and incorporate the learned graph in the testing procedure. In addition to FDR control, this procedure provides a clustering of microbes and metabolites into modules, which is useful for interpretation. We demonstrate the merit of the proposed method in simulations and an ap
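For reference, the standard Benjamini and Hochberg step-up procedure mentioned in the abstract above can be sketched in a few lines. This is the classical baseline only, not the block-model-based procedure the authors propose. The function name and the example p-values are illustrative assumptions.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans marking which hypotheses are rejected.

    Classical step-up rule: find the largest rank k such that
    p_(k) <= (k / m) * alpha, then reject the k smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Illustrative p-values (made up for this sketch).
p_vals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
print(benjamini_hochberg(p_vals, alpha=0.05))
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected whenever a larger p-value meets its threshold at a higher rank.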
Understanding the pathways through which diet affects human metabolism is a central task in nutritional epidemiology. This article proposes novel methodology to identify food items associated with blood metabolites in two cohorts of healthcare professionals. We analyze 30 food intake variables that exhibit relationship structure through their correlations and nutritional attributes. The metabolic responses include 244 compounds measured by mass spectrometry, presenting substantial challenges that include missingness, left-censoring, and skewness. While existing methods can address such factors in low-dimensional settings, they are not designed for high-dimensional regression involving strongly correlated predictors and non-normal outcomes. To address these challenges, we propose a novel Bayesian variable selection framework for metabolite response variables based on a skew-normal censored mixture model. To exploit substantive information on the nutritional similarities among dietary factors, we employ a Markov random field prior that encourages joint selection of related predictors, while introducing a new, efficient strategy for its hyperparameter specification. Applying this meth