INTRODUCTION: Bioinformatics tools, techniques and resources are critical to biomarker discovery, assessment, validation, qualification, standardization and market acceptance into clinical practice. Huge scientific effort and economic investment over the past 20 years have resulted in thousands of new candidate biomarkers for diseases, yet relatively few biomarkers have entered clinical practice. Bioinformatics is central to all stages of biomarker development and implementation. AREAS COVERED: This review examines bioinformatics advances that bear on each stage of biomarker development and suggests bioinformatics strategies to assist biomarkers towards clinical practice. This paper focuses on the steps of clinical biomarker development with an emphasis on the review literature from 2000 to June 2011. The intent of this paper is to describe the present role of bioinformatics in biomarker development including the controversies associated with various developmental stages. EXPERT OPINION: The key message is that more effective biomarker development requires database input of higher quality, improved bioinformatics tools to identify more clearly the acceptable criteria for each development step, as well as more and better database linkages.
Genomics is a discipline in genetics that applies recombinant DNA technology, DNA sequencing methods and bioinformatics to sequence, assemble and analyze the function and structure of the genome, the complete set of DNA within a single cell of an organism. Bioinformatics is an interdisciplinary scientific field that develops methods for storing, retrieving, organizing and analyzing biological data. Advances in bioinformatics have in turn had a considerable impact on the development and improvement of genomics technologies such as shotgun sequencing and high-throughput sequencing methods. The various genomics technologies are used for DNA and genome sequencing, assembly and annotation, which have applications in medicine, agriculture, pharmaceuticals, biotechnology and research. These genomics technologies, aided by bioinformatics, have contributed to the successful completion of whole-organism genome analyses, from prokaryotes to eukaryotes. In fact, the assembly of the human genome is one of the greatest achievements of bioinformatics.
The National Institute of Dental & Craniofacial Research and the National Cancer Institute have recognized a need to advance basic, translational and clinical saliva research. The goal of the Salivaomics Knowledge Base (SKB) is to create a data management system and web resource constructed to support human salivaomics research. To maximize the utility of the SKB for retrieval, integration and analysis of data, we have developed the Saliva Ontology and SDxMart. This article reviews the informatics advances in saliva diagnostics made possible by the Saliva Ontology and SDxMart.
For more than 30 years, the fluorescence-based technique of flow cytometry (FCM) has been widely used by clinicians, immunologists, and cancer biologists to distinguish different cell types in mixed cell subpopulations, based on the expression of cellular markers. In both health research and treatment, this analytical method is used for a variety of tasks, in particular the diagnosis and monitoring of cancer. This technology is also used for cross-matching organs for transplantation, and for research involving stem cells, vaccine development, apoptosis and phagocytosis. In the last decade, advances in FCM instrumentation and reagent technologies have enabled simultaneous single cell measurement of surface and intracellular markers, including cellular-activation markers, intracellular cytokines, immunological signaling, and cytoplasmic and nuclear cell cycle and transcription factors, thus positioning FCM to play an even bigger role in health care and medical research. However, the rapid expansion of FCM applications has outpaced the development of tools for storage, analysis, and data representation. For example, a typical FCM experiment may involve measurement of up to 20 different characteristics per cell, for hundreds of thousands of cells per sample. The increase in the amount of data generated by FCM techniques poses unique informatics and statistical challenges. It is widely recognized that one basic challenge for FCM is to simplify the extraction of data and statistical information. To date, very few bioinformatic and statistical tools exist to manage, analyze, present, and disseminate FCM data. Current FCM data analysis methods involve the use of multiple applications, the output of which is often fragmented. There is a widespread demand for the development of integrated data analysis tools to organize, analyze, and exchange FCM data. 
Such development is lagging far behind the ability to collect and process samples via FCM, much to the detriment of health research. This special issue aims to summarize the current state of bioinformatics research in FCM, to present the most recent developments in analytical tools, and to open up the field to new researchers who can bring additional ideas and solutions to current bottlenecks. The issue includes several important contributions, which cover a wide range of approaches and techniques for FCM. These contributions are summarized as follows. Bashashati and Brinkman review state-of-the-art FCM data analysis approaches that can be used in a typical analysis pipeline going from quality assessment to sample classification. Their paper not only reviews current techniques and approaches but also points out potential pitfalls of these approaches and discusses strategies to overcome them. Much as with gene expression data, technical variation such as changes in the instrumentation channel voltages or changes in the specificity of the manufacturer of the antibodies can result in systematic biases. These biases need to be removed or at least minimized in order to allow proper data analysis and sample comparisons. Cichocki et al. present a novel normalization method to correct for time biases in large-scale flow cytometric analysis. They investigate two types of normalizing beads, broad spectrum and spectrum matched, and propose two alternative normalization procedures that are usable in the absence of normalizing beads. Once data have been properly normalized, a component of FCM analysis involves identifying immunophenotypically distinct sub-populations of cells within each patient; this is referred to as “gating” in the FCM community. Although gating has traditionally been done visually, automated approaches based on statistical modeling of the data are starting to emerge. Walther et al. 
present such an approach based on a nonparametric statistical model that aims to form cell subpopulations that can be delineated by the contours of high-density regions, much as in manual gating. Because their approach is nonparametric, it can reproduce non-convex subpopulations that are known to occur in FCM samples but which cannot be produced with current parametric model-based approaches. Much like Walther et al., Finak et al. present a framework for the identification of cell subpopulations in FCM data based on merging mixture components using the flowClust methodology. In this new approach, several parametric clusters can represent a single sub-population, and the approach can thus accommodate complicated FCM data distributions (e.g., non-convex sub-populations). Even though automated gating methods are becoming increasingly popular, the majority of FCM experiments are still being analyzed visually, usually by serial inspection of one or two dimensions at a time. In order to improve and validate automated gating, it is important to compare automated gates to manual gates obtained by an expert. Gosink et al. introduce a Bioconductor package called flowFlowJo that can import gates defined by the commercial package FlowJo and work with them in a manner consistent with the other flow packages in Bioconductor. This work facilitates examination of gating robustness, allows one to combine manual and automated gating, and can be used to perform exploratory data analysis on manual gates. Another major goal in clinical applications is the identification of biological changes (e.g., proportion of cells within a subpopulation) that correlate with a disease in order to predict the status (e.g., healthy/diseased) of a patient. Rogers and Holyst present flowFP, a Bioconductor package for fingerprinting flow cytometric data. 
flowFP provides tools to transform raw FCM data into a form suitable for direct input into conventional statistical analysis and empirical modeling software tools (e.g., supervised classification). Among other things, flowFP is based on a multivariate binning approach and thus can bypass the gating stage, which can be an advantage for complex flow data. In a similar clinical context, Eliot et al. investigate the use of tree-based methods for discovering associations between flow cytometry data and clinical endpoints. In particular, they compare a number of tree-based methods for their capability to select immunological predictors of CD4 reconstitution in HIV-infected subjects initiating anti-retroviral treatment. The authors show that tree-based methods can be successfully applied to flow cytometry data to better inform and discover associations that may not emerge in the context of a standard univariate analysis. Even though Bioconductor is a great platform for FCM, allowing computational statisticians and bioinformaticians to leverage the power of R and other contributed packages, it can remain difficult for biologists and clinicians to use. Lee et al. have developed iFlow, an open source, extensible graphical user interface (GUI) that sits on top of the Bioconductor backbone and enables basic analyses by means of convenient graphical menus and wizards. iFlow is easily extensible in order to quickly integrate novel methodological developments. Finally, Strain et al. introduce plateCore, a new package that extends the functionality of core FCM Bioconductor packages to enable automated negative control-based gating and facilitate the processing and analysis of plate-based data sets from high-throughput FCM screening experiments.
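The idea of fingerprinting by multivariate binning, which lets an analysis skip explicit gating, can be illustrated with a minimal Python sketch. It is not the flowFP implementation or its API. It assumes one common binning scheme: recursively split a reference set of events at the median of the highest-variance channel, then describe any sample by the fraction of its events that fall into each resulting bin. The function names `fit_bins`, `bin_index` and `fingerprint` are hypothetical, and the data are synthetic.

```python
import numpy as np


def fit_bins(events, depth):
    """Build a binning tree from reference events (rows = cells, columns = channels).

    Assumption of this sketch: each level splits at the median of the channel
    with the largest variance, giving 2**depth bins of roughly equal count.
    """
    if depth == 0:
        return None  # leaf: no further split
    if len(events) < 2:
        raise ValueError("too few events to split; use a smaller depth")
    dim = int(np.argmax(events.var(axis=0)))
    cut = float(np.median(events[:, dim]))
    left = events[events[:, dim] <= cut]
    right = events[events[:, dim] > cut]
    return (dim, cut, fit_bins(left, depth - 1), fit_bins(right, depth - 1))


def bin_index(tree, event):
    """Walk the tree and return the bin number (0 .. 2**depth - 1) for one event."""
    index = 0
    node = tree
    while node is not None:
        dim, cut, left, right = node
        go_right = event[dim] > cut
        index = 2 * index + int(go_right)
        node = right if go_right else left
    return index


def fingerprint(tree, events, depth):
    """Return the fraction of events in each bin: a fixed-length feature vector."""
    counts = np.zeros(2 ** depth)
    for event in events:
        counts[bin_index(tree, event)] += 1
    return counts / len(events)


# Synthetic example: 1024 cells measured on 3 hypothetical channels.
rng = np.random.default_rng(0)
reference = rng.normal(size=(1024, 3))
tree = fit_bins(reference, depth=3)
fp_reference = fingerprint(tree, reference, depth=3)
fp_shifted = fingerprint(tree, reference + 2.0, depth=3)
```

The fingerprint of the reference sample is flat by construction, so a sample whose distribution differs shows up as a departure from that flat profile. The resulting fixed-length vectors can be passed to a conventional classifier.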
Omics and bioinformatics are essential to understanding the molecular systems that underlie various plant functions. Recent game-changing sequencing technologies have revitalized sequencing approaches in genomics and have produced opportunities for various emerging analytical applications. Driven by technological advances, several new omics layers such as the interactome, epigenome and hormonome have emerged. Furthermore, in several plant species, the development of omics resources has progressed to address particular biological properties of individual species. Integration of knowledge from omics-based research is an emerging issue as researchers seek to identify significance, gain biological insights and promote translational research. From these perspectives, we provide this review of the emerging aspects of plant systems research based on omics and bioinformatics analyses together with their associated resources and technological advances.
Chemoinformatics integrates the principles of physical chemistry with computer-based and information science methodologies, commonly referred to as "in silico techniques", to address a wide range of descriptive and prescriptive chemistry issues, including applications to biology, drug discovery, and related molecular areas. The incorporation of machine learning is considered highly important in drug design, enabling the extraction of chemical data from enormous compound databases to develop drugs endowed with significant biological features. The present review discusses the field of chemoinformatics and proposes the use of virtual chemical libraries in virtual screening methods to increase the probability of discovering novel hit chemicals. The virtual libraries address the need to increase the quality of the compounds as well as to discover promising ones. The review also discusses various applications of bioinformatics in disease classification, diagnosis, and identification of multidrug-resistant organisms. The use of ensemble models and brute-force feature selection methodology has resulted in high accuracy rates for heart disease and COVID-19 diagnosis; the role of special formulations for targeting meningitis and Alzheimer's disease is also covered. Additionally, the review presents the correlation between genomic variations and disease states such as obesity and chronic progressive external ophthalmoplegia, the investigation of the antibacterial activity of pyrazole and benzimidazole-based compounds against resistant microorganisms, and applications of chemoinformatics for the prediction of drug properties and toxicity.
The food industry aims to develop novel protein-based emulsifiers from sustainable sources (e.g. plants, seaweed/microalgae, microbes, and insects) to satisfy consumer demand for clean-label products. Enzymatic hydrolysis releases peptides with enhanced surface properties compared with the parent alternative proteins. Traditionally, emulsifying peptides have been produced by a trial-and-error top-down approach, which incurs extensive screening costs. This review presents the recent advances in a novel and fundamentally orthogonal bottom-up strategy, facilitated by quantitative proteomics and bioinformatic functional prediction, to produce emulsifying peptides by targeted enzymatic hydrolysis based on in silico proteolysis. Moreover, new insights on the relation between interfacial properties of peptides and emulsifying activity, as well as the impact on stability of wet and dried emulsions, are discussed.
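As an illustration of what in silico proteolysis involves, the following Python sketch digests a protein sequence computationally. The review does not specify an enzyme; the sketch assumes trypsin with the commonly used rule of cleaving after K or R unless the next residue is P. The function name `tryptic_digest`, its parameters and the example sequence are hypothetical.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return peptides from a simulated tryptic digest of a one-letter sequence.

    Assumed rule: cleave C-terminal to K or R, except when followed by P.
    Peptides spanning up to `missed_cleavages` uncut sites are included.
    """
    # End position of every residue after which the enzyme is assumed to cut.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Fragment boundaries: sequence start, internal cut sites, sequence end.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]

    peptides = []
    for i in range(len(bounds) - 1):
        last = min(i + 2 + missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptide = sequence[bounds[i]:bounds[j]]
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


# Made-up sequence: the R at position 3 is followed by P, so it is not cut.
full = tryptic_digest("MKRPAAKLLR")
partial = tryptic_digest("MKRPAAKLLR", missed_cleavages=1)
```

In a bottom-up workflow of the kind described, the peptides listed by such a digest would then be scored by a functional predictor before any hydrolysis is run in the laboratory. That scoring step is outside this sketch.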
Over 100 years ago, William Bateson provided, through his observations of the transmission of alkaptonuria in first cousin offspring, evidence of the application of Mendelian genetics to certain human traits and diseases. His work was corroborated by Archibald Garrod (Garrod AE. The incidence of alkaptonuria: a study in chemical individuality. Lancet 1902;ii:1616-20) and William Farabee (Farabee WC. Inheritance of digital malformations in man. In: Papers of the Peabody Museum of American Archaeology and Ethnology. Cambridge, Mass: Harvard University, 1905; 65-78), who recorded the familial tendencies of inheritance of malformations of human hands and feet. These were the pioneers of the hunt for disease genes that would continue through the century and result in the discovery of hundreds of genes that can be associated with different diseases. Despite many ground-breaking discoveries during the last century, we are far from having a complete understanding of the intricate network of molecular processes involved in diseases, and we are still searching for cures for most complex diseases. In the last few years, new genome sequencing and other high-throughput experimental techniques have generated vast amounts of molecular and clinical data that contain crucial information with the potential to lead to the next major biomedical discoveries. The need to mine, visualize and integrate these data has motivated the development of several informatics approaches that can broadly be grouped in the research area of 'translational bioinformatics'. This review highlights the latest advances in the field of translational bioinformatics, focusing on computational techniques to search for and classify disease genes.
Proteomics is the study of proteins on a genome-wide scale. Within the wide field of functional omics, proteomics has become a useful tool. The completion of genome sequencing projects and the improvement of methods for protein characterization have driven the field forward. Presently, the use of proteomics is being extended to analyze different features of proteins, including their activities, structures and protein-protein interactions. Proteomics research is quite advanced in animals, yeast and bacteria, but it is still in its early stages in plant research because of the highly complex and dynamic nature of plant proteomes. In view of the advances in crop biotechnology, it is critical to understand the role of proteins during plant development and in the response to biotic and abiotic stimuli. In this review, we present several plant proteomic studies to illustrate the applications of proteomics in crop productivity. Advances in proteomics in recent years include protein isolation methods, mass spectrometry, protein-protein interactions and post-translational modifications (PTMs). We further discuss the strengths and weaknesses of proteomic technologies and the limitations of current techniques from the perspective of plant biology. We conclude that advances in protein interaction analysis and bioinformatics will have an increasing impact on understanding various functional aspects in plants, such as PTMs, subcellular localization and protein interactions.
All living organisms require metal ions for their energy production and metabolic and biosynthetic processes. Within cells, the metal ions involved in the formation of adducts interact with metabolites and macromolecules (proteins and nucleic acids). The proteins that require binding to one or more metal ions in order to be able to carry out their physiological function are called metalloproteins. About one third of all protein structures in the Protein Data Bank involve metalloproteins. Over the past few years there has been tremendous growth in the number of computational tools and techniques making use of 3D structural information to support the investigation of metalloproteins. This trend has been boosted by the successful applications of neural networks and machine/deep learning (ML/DL) approaches in molecular and structural biology at large. In this review, we discuss recent advances in the development and availability of resources dealing with metalloproteins from a structure-based perspective. We start by addressing tools for the prediction of metal-binding sites (MBSs) using structural information on apo-proteins. Then, we provide an overview of the methods for and lessons learned from the structural comparison of MBSs in a fold-independent manner. We then move to describing databases of metalloprotein/MBS structures. Finally, we summarize recent ML/DL applications enhancing the functional interpretation of metalloprotein structures.
With the global human population growing rapidly, agricultural production must increase to meet crop demand. Improving crops through breeding is a sustainable approach to increase yield and yield stability without intensifying the use of fertilisers and pesticides. Current advances in genomics and bioinformatics provide opportunities for accelerating crop improvement. The rise of third generation sequencing technologies is helping overcome challenges in plant genome assembly caused by polyploidy and frequent repetitive elements. As a result, high-quality crop reference genomes are increasingly available, benefitting downstream analyses such as variant calling and association mapping that identify breeding targets in the genome. Machine learning also helps identify genomic regions of agronomic value by facilitating functional annotation of genomes and enabling real-time high-throughput phenotyping of agronomic traits in the glasshouse and in the field. Furthermore, crop databases that integrate the growing volume of genotype and phenotype data provide a valuable resource for breeders and an opportunity for data mining approaches to uncover novel trait-associated candidate genes. As knowledge of crop genetics expands, genomic selection and genome editing hold promise for breeding disease-resistant and stress-tolerant crops with high yields.
Extracting inherently valuable knowledge from omics big data remains a daunting problem in bioinformatics and computational biology. Deep learning, an emerging branch of machine learning, has exhibited unprecedented performance in quite a few applications in academia and industry. We highlight the differences and similarities among widely used deep learning models by discussing their basic structures and reviewing their diverse applications and disadvantages. We anticipate that this work can serve as a meaningful perspective for the further development of deep learning theory, algorithms and applications in bioinformatics and computational biology.
The translation of genomics data into actionable knowledge for use in healthcare is transforming the clinical landscape in an unprecedented way. Exciting and innovative models that bridge the gap between clinical and academic research are set to open up the field of translational bioinformatics for rapid growth in a digital era.
Big Data in life science is scattered across hundreds of unstructured data sets, biological databases and thousands of scientific journals. Modern crop research relies on high-throughput technologies that generate large quantities of high-dimensional data. The challenge for Applied Bioinformatics is to capture, model, integrate, analyze, visualize and make these data accessible in a FAIR way. This, in turn, translates directly to the improvement of our understanding of crop biology, and in practical terms results in the development of new elite genotypes and improvement of plant cultivation strategies.