Motivation: Developing high-performing bioinformatics models typically requires repeated cycles of hypothesis formulation, architectural redesign, and empirical validation, making progress slow, labor-intensive, and difficult to reproduce. Although recent LLM-based assistants can automate isolated steps, they lack performance-grounded reasoning and stability-aware mechanisms required for reliable, iterative model improvement in bioinformatics workflows. Results: We introduce MARBLE, an execution-stable autonomous model refinement framework for bioinformatics models. MARBLE couples literature-aware reference selection with structured, debate-driven architectural reasoning among role-specialized agents, followed by autonomous execution, evaluation, and memory updates explicitly grounded in empirical performance. Across spatial transcriptomics domain segmentation, drug-target interaction prediction, and drug response prediction, MARBLE consistently achieves sustained performance improvements over strong baselines across multiple refinement cycles, while maintaining high execution robustness and low regression rates. Framework-level analyses demonstrate that structured debate, balanced
This paper introduces BioAgent Bench, a benchmark dataset and an evaluation suite designed for measuring the performance and robustness of AI agents in common bioinformatics tasks. The benchmark contains curated end-to-end tasks (e.g., RNA-seq, variant calling, metagenomics) with prompts that specify concrete output artifacts to support automated assessment, including stress testing under controlled perturbations. We evaluate frontier closed-source and open-weight models across multiple agent harnesses, and use an LLM-based grader to score pipeline progress and outcome validity. We find that frontier agents can complete multi-step bioinformatics pipelines without elaborate custom scaffolding, often producing the requested final artifacts reliably. However, robustness tests reveal failure modes under controlled perturbations (corrupted inputs, decoy files, and prompt bloat), indicating that correct high-level pipeline construction does not guarantee reliable step-level reasoning. Finally, because bioinformatics workflows may involve sensitive patient data, proprietary references, or unpublished IP, closed-source models can be unsuitable under strict privacy constraints; in such sett
Bioinformatics tools are essential for complex computational biology tasks, yet their integration with emerging AI-agent frameworks is hindered by incompatible interfaces, heterogeneous input-output formats, and inconsistent parameter conventions. The Model Context Protocol (MCP) provides a standardized framework for tool-AI communication, but manually converting hundreds of existing and rapidly growing specialized bioinformatics tools into MCP-compliant servers is labor-intensive and unsustainable. Here, we present BioinfoMCP, a unified platform comprising two components: BioinfoMCP Converter, which automatically generates robust MCP servers from tool documentation using large language models, and BioinfoMCP Benchmark, which systematically validates the reliability and versatility of converted tools across diverse computational tasks. We present a platform of 38 MCP-converted bioinformatics tools, extensively validated to show that 94.7% successfully executed complex workflows across three widely used AI-agent platforms. By removing technical barriers to AI automation, BioinfoMCP enables natural-language interaction with sophisticated bioinformatics analyses without requiring exte
Creating end-to-end bioinformatics workflows requires diverse domain expertise, which poses challenges for both junior and senior researchers as it demands a deep understanding of both genomics concepts and computational techniques. While large language models (LLMs) provide some assistance, they often fall short in providing the nuanced guidance needed to execute complex bioinformatics tasks, and require expensive computing resources to achieve high performance. We thus propose a multi-agent system built on small language models, fine-tuned on bioinformatics data, and enhanced with retrieval augmented generation (RAG). Our system, BioAgents, enables local operation and personalization using proprietary data. We observe performance comparable to human experts on conceptual genomics tasks, and suggest next steps to enhance code generation capabilities.
Bioinformatics web servers are critical resources in modern biomedical research, facilitating interactive exploration of datasets through custom-built interfaces with rich visualization capabilities. However, this human-centric design limits machine readability for large language models (LLMs) and deep research agents. We address this gap by adapting the Model Context Protocol (MCP) to bioinformatics web server backends - a standardized, machine-actionable layer that explicitly associates webservice endpoints with scientific concepts and detailed metadata. Our implementations across widely-used databases (GEO, STRING, UCSC Cell Browser) demonstrate enhanced exploration capabilities through MCP-enabled LLMs. To accelerate adoption, we propose MCPmed, a community effort supplemented by lightweight breadcrumbs for services not yet fully MCP-enabled and templates for setting up new servers. This structured transition significantly enhances automation, reproducibility, and interoperability, preparing bioinformatics web services for next-generation research agents.
The integration of bioinformatics predictions and experimental validation plays a pivotal role in advancing biological research, from understanding molecular mechanisms to developing therapeutic strategies. Bioinformatics tools and methods offer powerful means for predicting gene functions, protein interactions, and regulatory networks, but these predictions must be validated through experimental approaches to ensure their biological relevance. This review explores the various methods and technologies used for experimental validation, including gene expression analysis, protein-protein interaction verification, and pathway validation. We also discuss the challenges involved in translating computational predictions to experimental settings and highlight the importance of collaboration between bioinformatics and experimental research. Finally, emerging technologies, such as CRISPR gene editing, next-generation sequencing, and artificial intelligence, are shaping the future of bioinformatics validation and driving more accurate and efficient biological discoveries.
Large Language Models (LLMs) are revolutionizing bioinformatics, enabling advanced analysis of DNA, RNA, proteins, and single-cell data. This survey provides a systematic review of recent advancements, focusing on genomic sequence modeling, RNA structure prediction, protein function inference, and single-cell transcriptomics. Meanwhile, we also discuss several key challenges, including data scarcity, computational complexity, and cross-omics integration, and explore future directions such as multimodal learning, hybrid AI models, and clinical applications. By offering a comprehensive perspective, this paper underscores the transformative potential of LLMs in driving innovations in bioinformatics and precision medicine.
The year 2023 marked a significant surge in the exploration of applying large language model (LLM) chatbots, notably ChatGPT, across various disciplines. We surveyed the applications of ChatGPT in bioinformatics and biomedical informatics throughout the year, covering omics, genetics, biomedical text mining, drug discovery, biomedical image understanding, bioinformatics programming, and bioinformatics education. Our survey delineates the current strengths and limitations of this chatbot in bioinformatics and offers insights into potential avenues for future developments.
Study reproducibility is essential to corroborate, build on, and learn from the results of scientific research but is notoriously challenging in bioinformatics, which often involves large data sets and complex analytic workflows involving many different tools. Additionally many biologists aren't trained in how to effectively record their bioinformatics analysis steps to ensure reproducibility, so critical information is often missing. Software tools used in bioinformatics can automate provenance tracking of the results they generate, removing most barriers to bioinformatics reproducibility. Here we present an implementation of that idea, Provenance Replay, a tool for generating new executable code from results generated with the QIIME 2 bioinformatics platform, and discuss considerations for bioinformatics developers who wish to implement similar functionality in their software.
Denoising diffusion models have emerged as one of the most powerful generative models in recent years. They have achieved remarkable success in many fields, such as computer vision, natural language processing (NLP), and bioinformatics. Although there are a few excellent reviews on diffusion models and their applications in computer vision and NLP, there is a lack of an overview of their applications in bioinformatics. This review aims to provide a rather thorough overview of the applications of diffusion models in bioinformatics to aid their further development in bioinformatics and computational biology. We start with an introduction of the key concepts and theoretical foundations of three cornerstone diffusion modeling frameworks (denoising diffusion probabilistic models, noise-conditioned scoring networks, and stochastic differential equations), followed by a comprehensive description of diffusion models employed in the different domains of bioinformatics, including cryo-EM data enhancement, single-cell data analysis, protein design and generation, drug and small molecule design, and protein-ligand interaction. The review is concluded with a summary of the potential new develop
Large language models (LLMs) such as ChatGPT have gained considerable interest across diverse research communities. Their notable ability for text completion and generation has inaugurated a novel paradigm for language-interfaced problem solving. However, the potential and efficacy of these models in bioinformatics remain incompletely explored. In this work, we study the performance LLMs on a wide spectrum of crucial bioinformatics tasks. These tasks include the identification of potential coding regions, extraction of named entities for genes and proteins, detection of antimicrobial and anti-cancer peptides, molecular optimization, and resolution of educational bioinformatics problems. Our findings indicate that, given appropriate prompts, LLMs like GPT variants can successfully handle most of these tasks. In addition, we provide a thorough analysis of their limitations in the context of complicated bioinformatics tasks. In conclusion, we believe that this work can provide new perspectives and motivate future research in the field of LLMs applications, AI for Science and bioinformatics.
Bioinformatics has witnessed a paradigm shift with the increasing integration of artificial intelligence (AI), particularly through the adoption of foundation models (FMs). These AI techniques have rapidly advanced, addressing historical challenges in bioinformatics such as the scarcity of annotated data and the presence of data noise. FMs are particularly adept at handling large-scale, unlabeled data, a common scenario in biological contexts due to the time-consuming and costly nature of experimentally determining labeled data. This characteristic has allowed FMs to excel and achieve notable results in various downstream validation tasks, demonstrating their ability to represent diverse biological entities effectively. Undoubtedly, FMs have ushered in a new era in computational biology, especially in the realm of deep learning. The primary goal of this survey is to conduct a systematic investigation and summary of FMs in bioinformatics, tracing their evolution, current research status, and the methodologies employed. Central to our focus is the application of FMs to specific biological problems, aiming to guide the research community in choosing appropriate FMs for their research
Background, enhancing interoperability of bioinformatics knowledge bases is a high priority requirement to maximize data reusability, and thus increase their utility such as the return on investment for biomedical research. A knowledge base may provide useful information for life scientists and other knowledge bases, but it only acquires exchange value once the knowledge base is (re)used, and without interoperability the utility lies dormant. Results, in this article, we discuss several approaches to boost interoperability depending on the interoperable parts. The findings are driven by several real-world scenario examples that were mostly implemented by Bgee, a well-established gene expression database. To better justify the findings are transferable, for each Bgee interoperability experience, we also highlight similar implementations by major bioinformatics knowledge bases. Moreover, we discuss ten general main lessons learnt. These lessons can be applied in the context of any bioinformatics knowledge base to foster data reusability. Conclusions, this work provides pragmatic methods and transferable skills to promote reusability of bioinformatics knowledge bases by focusing on in
While many good textbooks are available on Protein Structure, Molecular Simulations, Thermodynamics and Bioinformatics methods in general, there is no good introductory level book for the field of Structural Bioinformatics. This book aims to give an introduction into Structural Bioinformatics, which is where the previous topics meet to explore three dimensional protein structures through computational analysis. We provide an overview of existing computational techniques, to validate, simulate, predict and analyse protein structures. More importantly, it will aim to provide practical knowledge about how and when to use such techniques. We will consider proteins from three major vantage points: Protein structure quantification, Protein structure prediction, and Protein simulation & dynamics. Structural bioinformatics involves a variety of computational methods, all of which require input data. Typical inputs include protein structures and sequences, which are usually retrieved from a public or private database. This chapter introduces several key resources that make such data available, as well as a handful of tools that derive additional information from experimentally determine
Pre-trained large language models (LLMs) have significantly improved code generation. As these models scale up, there is an increasing need for the output to handle more intricate tasks and to be appropriately specialized to particular domains. Here, we target bioinformatics due to the amount of domain knowledge, algorithms, and data operations this discipline requires. We present BioCoder, a benchmark developed to evaluate LLMs in generating bioinformatics-specific code. BioCoder spans much of the field, covering cross-file dependencies, class declarations, and global variables. It incorporates 1,026 Python functions and 1,243 Java methods extracted from GitHub, along with 253 examples from the Rosalind Project, all pertaining to bioinformatics. Using topic modeling, we show that the overall coverage of the included code is representative of the full spectrum of bioinformatics calculations. BioCoder incorporates a fuzz-testing framework for evaluation. We have applied it to evaluate various models including InCoder, CodeGen, CodeGen2, SantaCoder, StarCoder, StarCoder+, InstructCodeT5+, GPT-3.5, and GPT- 4. Furthermore, we fine-tuned one model (StarCoder), demonstrating that our tr
Artificial intelligence (AI), particularly machine learning and deep learning models, has significantly impacted bioinformatics research by offering powerful tools for analyzing complex biological data. However, the lack of interpretability and transparency of these models presents challenges in leveraging these models for deeper biological insights and for generating testable hypotheses. Explainable AI (XAI) has emerged as a promising solution to enhance the transparency and interpretability of AI models in bioinformatics. This review provides a comprehensive analysis of various XAI techniques and their applications across various bioinformatics domains including DNA, RNA, and protein sequence analysis, structural analysis, gene expression and genome analysis, and bioimaging analysis. We introduce the most pertinent machine learning and XAI methods, then discuss their diverse applications and address the current limitations of available XAI tools. By offering insights into XAI's potential and challenges, this review aims to facilitate its practical implementation in bioinformatics research and help researchers navigate the landscape of XAI tools.
In biomedical research, validation of a new scientific discovery is tied to the reproducibility of its experimental results. However, in genomics, the definition and implementation of reproducibility still remain imprecise. Here, we argue that genomic reproducibility, defined as the ability of bioinformatics tools to maintain consistent genomics results across technical replicates, is key to generating scientific knowledge and enabling medical applications. We first discuss different concepts of reproducibility and then focus on reproducibility in the context of genomics, aiming to establish clear definitions of relevant terms. We then focus on the role of bioinformatics tools and their impact on genomic reproducibility and assess methods of evaluating bioinformatics tools in terms of genomic reproducibility. Lastly, we suggest best practices for enhancing genomic reproducibility, with an emphasis on assessing the performance of bioinformatics tools through rigorous testing across multiple technical replicates.
Bioinformatics platforms have significantly changed clinical diagnostics by facilitating the analysis of genomic data, thereby advancing personalized medicine and improving patient care. This study examines the integration, usage patterns, challenges, and impact of the Galaxy platform within clinical diagnostics laboratories. We employed a convergent parallel mixed-methods design, collecting quantitative survey data and qualitative insights from structured interviews with fifteen participants across various clinical roles. The findings indicate a wide adoption of Galaxy, with participants expressing high satisfaction due to its user-friendly interface and notable improvements in workflow efficiency and diagnostic accuracy. Challenges such as data security and training needs were also identified, highlighting the platform's role in simplifying complex data analysis tasks. This study contributes to understanding the transformative potential of Galaxy in clinical practice and offers recommendations for optimizing its integration and functionality. These insights are crucial for advancing clinical diagnostics and enhancing patient outcomes.
There is a wide range of available biological databases developed by bioinformatics experts, employing different methods to extract biological data. In this paper, we investigate and evaluate the performance of some of these methods in terms of their ability to efficiently access bioinformatics databases using web-based interfaces. These methods retrieve bioinformatics information using structured and semi-structured data tools, which are able to retrieve data from remote database servers. This study distinguishes each of these approaches and contrasts these tools. We used Sequence Retrieval System (SRS) and Entrez search tools for structured data, while Perl and BioPerl search programs were used for semi-structured data to retrieve complex queries including a combination of text and numeric information. The study concludes that the use of semi-structured data tools for accessing bioinformatics databases is a viable alternative to the structured tools, though each method is shown to have certain inherent advantages and disadvantages.
Documentation is one of the most neglected activities in Software Engineering, although it is an important method of assuring quality and understanding. Bioinformatics software is generally written by researchers from fields other than Computer Science who usually do not provide documentation. Documenting bioinformatics software may ease its adoption in multidisciplinary teams and expand its impact on the community. In this paper, we highlight how one can document software that is already finished, using reverse engineering and thinking of the end-user.