Artificial intelligence has advanced rapidly in biomedicine through large-scale multimodal data integration, enabling increasingly accurate prediction of clinical outcomes and patient stratification. These systems, however, remain fundamentally observational: they learn statistical associations from historical data and operate within previously observed biological and clinical states, limiting their ability to generalize to novel therapies or unobserved interventions. We argue that AI in biomedicine is undergoing a structural transition. As biomedical decision-making increasingly depends on reasoning about intervention rather than extrapolation from past observations, predictive architectures become structurally insufficient. Systems that learn from historical data cannot, by construction, represent how biological systems evolve under perturbation, and therefore cannot reliably support decision-making in the presence of novel interventions. We introduce a conceptual framework distinguishing observational and interventional intelligence and define disease-level models as systems that explicitly represent the state, dynamics, and intervention response of biological processes. These m
Recent development in Retrieval-Augmented Large Language Models (LLMs) have shown great promise in biomedical applications. How ever, a critical gap persists in reliably evaluating their curation ability the process by which models select and integrate relevant references while filtering out noise. To address this, we introduce the benchmark for Curation of Retrieval-Augmented LLMs in Biomedicine (CRAB), the first multilingual benchmark tailored for evaluating the biomedical curation of retrieval-augmented LLMs, available in English, French, German and Chinese. By incorporating a novel citation-based evaluation metric, CRAB quantifies the curation performance of retrieval-augmented LLMs in biomedicine. Experimental results reveal significant discrepancies in the curation performance of mainstream LLMs, underscoring the urgent need to improve it in the domain of biomedicine. Our dataset is available at https://huggingface.co/datasets/zhm0/CRAB.
Neuromorphic engineering has matured over the past four decades and is currently experiencing explosive growth with the potential to transform biomedical engineering and neurotechnologies. Participants at the Neuromorphic Principles in Biomedicine and Healthcare (NPBH) Workshop (October 2024) -- representing a broad cross-section of the community, including early-career and established scholars, engineers, scientists, clinicians, industry, and funders -- convened to discuss the state of the field, current and future challenges, and strategies for advancing neuromorphic research and development for biomedical applications. Publicly approved recordings with transcripts (https://2024.neuro-med.org/program/session-video-and-transcripts) and slides (https://2024.neuro-med.org/program/session-slides) can be found at the workshop website.
Magnetotactic bacteria synthesize linear chains of magnetite nanoparticles within their bodies, which allow the bacteria to navigate the Earth's magnetic field in search of the best habitat. Biogenic magnetite particles, called magnetosomes, are very promising for use in biomedicine. Magnetosome chains have also been found in ancient fossils and sediments. The study of magnetofossils provides valuable information about the Earth's biological past. The presence of biogenic magnetite in ancient rock samples can be detected by measuring ferromagnetic resonance spectra, first-order magnetization reversal curves, or quasi-static hysteresis loops. Theoretical analyses of these experiments generally assume that magnetosomes are spherical nanoparticles, although the shape of some types of magnetosomes is close to spheroidal one. In this work, simple formulas for describing the magneto-dipole interaction of oriented spheroids are obtained and quasi-static hysteresis loops of randomly oriented magnetosome chain assembly consisting of elongated spheroids are calculated.
Large language models (LLMs) in biomedicine face a fundamental conflict between static parameter knowledge and the dynamic nature of clinical evidence. Retrieval-Augmented Generation (RAG) addresses this by grounding generation in external data, yet it introduces new complexities in latency and architecture. This survey synthesizes the biomedical RAG landscape (2020-2025), classifying systems into naive, advanced, and modular paradigms. Beyond a technological taxonomy, we formalize the biomedical RAG trilemma, identifying the inherent trade-offs between reasoning depth, inference latency, and data privacy that constrain current clinical deployment. We analyze how recent agentic workflows enhance diagnostic reasoning but risk prohibitive latency, and how privacy constraints dictate the choice between powerful cloud-based models and local deployment. Finally, we outline the alignment gap in multimodal RAG and propose future directions for self-correcting, verifiable clinical agents.
Large Language Models (LLMs) show promise in biomedicine but lack true causal understanding, relying instead on correlations. This paper envisions causal LLM agents that integrate multimodal data (text, images, genomics, etc.) and perform intervention-based reasoning to infer cause-and-effect. Addressing this requires overcoming key challenges: designing safe, controllable agentic frameworks; developing rigorous benchmarks for causal evaluation; integrating heterogeneous data sources; and synergistically combining LLMs with structured knowledge (KGs) and formal causal inference tools. Such agents could unlock transformative opportunities, including accelerating drug discovery through automated hypothesis generation and simulation, enabling personalized medicine through patient-specific causal models. This research agenda aims to foster interdisciplinary efforts, bridging causal concepts and foundation models to develop reliable AI partners for biomedical progress.
Traditional Chinese Medicine diagnosis and treatment principles, established through centuries of trial-and-error clinical practice, directly maps patient-specific symptom patterns to personalised herbal therapies. These empirical holistic mapping principles offer valuable strategies to address remaining challenges of reductionism methodologies in modern biomedicine. However, the lack of a quantitative framework and molecular-level evidence has limited their interpretability and reliability. Here, we present an AI framework trained on ancient and classical TCM formula records to quantify the symptom pattern-herbal therapy mappings. Interestingly, we find that empirical TCM diagnosis and treatment are consistent with the encoding-decoding processes in the AI model. This enables us to construct an interpretable TCM embedding space (TCM-ES) using the model's quantitative representation of TCM principles. Validated through broad and extensive TCM patient data, the TCM-ES offers universal quantification of the TCM practice and therapeutic efficacy. We further map biomedical entities into the TCM-ES through correspondence alignment. We find that the principal directions of the TCM-ES are
Foundation models, first introduced in 2021, refer to large-scale pretrained models (e.g., large language models (LLMs) and vision-language models (VLMs)) that learn from extensive unlabeled datasets through unsupervised methods, enabling them to excel in diverse downstream tasks. These models, like GPT, can be adapted to various applications such as question answering and visual understanding, outperforming task-specific AI models and earning their name due to broad applicability across fields. The development of biomedical foundation models marks a significant milestone in the use of artificial intelligence (AI) to understand complex biological phenomena and advance medical research and practice. This survey explores the potential of foundation models in diverse domains within biomedical fields, including computational biology, drug discovery and development, clinical informatics, medical imaging, and public health. The purpose of this survey is to inspire ongoing research in the application of foundation models to health science.
Recent breakthroughs in large language models (LLMs) offer unprecedented natural language understanding and generation capabilities. However, existing surveys on LLMs in biomedicine often focus on specific applications or model architectures, lacking a comprehensive analysis that integrates the latest advancements across various biomedical domains. This review, based on an analysis of 484 publications sourced from databases including PubMed, Web of Science, and arXiv, provides an in-depth examination of the current landscape, applications, challenges, and prospects of LLMs in biomedicine, distinguishing itself by focusing on the practical implications of these models in real-world biomedical contexts. Firstly, we explore the capabilities of LLMs in zero-shot learning across a broad spectrum of biomedical tasks, including diagnostic assistance, drug discovery, and personalized medicine, among others, with insights drawn from 137 key studies. Then, we discuss adaptation strategies of LLMs, including fine-tuning methods for both uni-modal and multi-modal LLMs to enhance their performance in specialized biomedical contexts where zero-shot fails to achieve, such as medical question answ
Research resources (RRs) such as data, software, and tools are essential pillars of scientific research. The field of biomedicine, a critical scientific discipline, is witnessing a surge in research publications resulting in the accumulation of a substantial number of RRs. However, these resources are dispersed among various biomedical articles and can be challenging to locate and reuse due to their transient nature. In this paper, we report our recent progress in biomedical data curation - building a large research resource database for biomedicine (RRD-Bio), based on a collection of 40 million papers from two large biomedical literature databases, PubMed and PubMed Central. The database contains 2,555,116 RRs, each identified by a location on the Internet (URL) and descriptive information (Context). We made the RRD-Bio database publicly available (\url{https://zenodo.org/records/10526493}) to enhance the visibility of biomedical research resources, the ability to preserve important resources and the reproducibility of biomedical research.
ChatGPT has drawn considerable attention from both the general public and domain experts with its remarkable text generation capabilities. This has subsequently led to the emergence of diverse applications in the field of biomedicine and health. In this work, we examine the diverse applications of large language models (LLMs), such as ChatGPT, in biomedicine and health. Specifically we explore the areas of biomedical information retrieval, question answering, medical text summarization, information extraction, and medical education, and investigate whether LLMs possess the transformative power to revolutionize these tasks or whether the distinct complexities of biomedical domain presents unique challenges. Following an extensive literature survey, we find that significant advances have been made in the field of text generation tasks, surpassing the previous state-of-the-art methods. For other applications, the advances have been modest. Overall, LLMs have not yet revolutionized biomedicine, but recent rapid progress indicates that such methods hold great potential to provide valuable means for accelerating discovery and improving health. We also find that the use of LLMs, like Chat
The capabilities of AI for biomedicine span a wide spectrum, from the atomic level, where it solves partial differential equations for quantum systems, to the molecular level, predicting chemical or protein structures, and further extending to societal predictions like infectious disease outbreaks. Recent advancements in large language models, exemplified by models like ChatGPT, have showcased significant prowess in natural language tasks, such as translating languages, constructing chatbots, and answering questions. When we consider biomedical data, we observe a resemblance to natural language in terms of sequences: biomedical literature and health records presented as text, biological sequences or sequencing data arranged in sequences, or sensor data like brain signals as time series. The question arises: Can we harness the potential of recent large language models to drive biomedical knowledge discoveries? In this survey, we will explore the application of large language models to three crucial categories of biomedical data: 1) textual data, 2) biological sequences, and 3) brain signals. Furthermore, we will delve into large language model challenges in biomedical research, incl
Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains and are moving towards more specialized areas. Recent advanced proprietary models such as GPT-4 and Gemini have achieved significant advancements in biomedicine, which have also raised privacy and security challenges. The construction of specialized generalists hinges largely on high-quality datasets, enhanced by techniques like supervised fine-tuning and reinforcement learning from human or AI feedback, and direct preference optimization. However, these leading technologies (e.g., preference learning) are still significantly limited in the open source community due to the scarcity of specialized data. In this paper, we present the UltraMedical collections, which consist of high-quality manual and synthetic datasets in the biomedicine domain, featuring preference annotations across multiple advanced LLMs. By utilizing these datasets, we fine-tune a suite of specialized medical models based on Llama-3 series, demonstrating breathtaking capabilities across various medical benchmarks. Moreover, we develop powerful reward models skilled in biomedical and general reward benchmark, enhancing fur
Foundation models (FMs) have exhibited remarkable performance across a wide range of downstream tasks in many domains. Nevertheless, general-purpose FMs often face challenges when confronted with domain-specific problems, due to their limited access to the proprietary training data in a particular domain. In biomedicine, there are various biological modalities, such as molecules, proteins, and cells, which are encoded by the language of life and exhibit significant modality gaps with human natural language. In this paper, we introduce BioMedGPT, an open multimodal generative pre-trained transformer (GPT) for biomedicine, to bridge the gap between the language of life and human natural language. BioMedGPT allows users to easily ``communicate'' with diverse biological modalities through free text, which is the first of its kind. BioMedGPT aligns different biological modalities with natural language via a large generative language model, namely, BioMedGPT-LM. We publish BioMedGPT-10B, which unifies the feature spaces of molecules, proteins, and natural language via encoding and alignment. Through fine-tuning, BioMedGPT-10B outperforms or is on par with human and significantly larger g
Multi-modal learning has significantly advanced generative AI, especially in vision-language modeling. Innovations like GPT-4V and open-source projects such as LLaVA have enabled robust conversational agents capable of zero-shot task completions. However, applying these technologies in the biomedical field presents unique challenges. Recent initiatives like LLaVA-Med have started to adapt instruction-tuning for biomedical contexts using large datasets such as PMC-15M. Our research offers three key contributions: (i) we present a new instruct dataset enriched with medical image-text pairs from Claude3-Opus and LLaMA3 70B, (ii) we propose a novel image encoding strategy using hierarchical representations to improve fine-grained biomedical visual comprehension, and (iii) we develop the Llama3-Med model, which achieves state-of-the-art zero-shot performance on biomedical visual question answering benchmarks, with an average performance improvement of over 10% compared to previous methods. These advancements provide more accurate and reliable tools for medical professionals, bridging gaps in current multi-modal conversational assistants and promoting further innovations in medical AI.
Artificial intelligence (AI) systems have the potential to revolutionize clinical practices, including improving diagnostic accuracy and surgical decision-making, while also reducing costs and manpower. However, it is important to recognize that these systems may perpetuate social inequities or demonstrate biases, such as those based on race or gender. Such biases can occur before, during, or after the development of AI models, making it critical to understand and address potential biases to enable the accurate and reliable application of AI models in clinical settings. To mitigate bias concerns during model development, we surveyed recent publications on different debiasing methods in the fields of biomedical natural language processing (NLP) or computer vision (CV). Then we discussed the methods that have been applied in the biomedical domain to address bias. We performed our literature search on PubMed, ACM digital library, and IEEE Xplore of relevant articles published between January 2018 and December 2023 using multiple combinations of keywords. We then filtered the result of 10,041 articles automatically with loose constraints, and manually inspected the abstracts of the rem
With the rapid development of multimodal large language models (MLLMs), especially their capabilities in visual chat through refer and ground functionalities, their significance is increasingly recognized. However, the biomedical field currently exhibits a substantial gap in this area, primarily due to the absence of a dedicated refer and ground dataset for biomedical images. To address this challenge, we devised the Med-GRIT-270k dataset. It comprises 270k question-and-answer pairs and spans eight distinct medical imaging modalities. Most importantly, it is the first dedicated to the biomedical domain and integrating refer and ground conversations. The key idea is to sample large-scale biomedical image-mask pairs from medical segmentation datasets and generate instruction datasets from text using chatGPT. Additionally, we introduce a Refer-and-Ground Multimodal Large Language Model for Biomedicine (BiRD) by using this dataset and multi-task instruction learning. Extensive experiments have corroborated the efficacy of the Med-GRIT-270k dataset and the multi-modal, fine-grained interactive capabilities of the BiRD model. This holds significant reference value for the exploration and
Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains; however, these models encounter issues such as generating inaccurate information or hallucinations. Retrieval-augmented generation provided a solution for these models to update knowledge and enhance their performance. In contrast to previous retrieval-augmented LMs, which utilize specialized cross-attention mechanisms to help LLM encode retrieved text, BiomedRAG adopts a simpler approach by directly inputting the retrieved chunk-based documents into the LLM. This straightforward design is easily applicable to existing retrieval and language models, effectively bypassing noise information in retrieved documents, particularly in noise-intensive tasks. Moreover, we demonstrate the potential for utilizing the LLM to supervise the retrieval model in the biomedical domain, enabling it to retrieve the document that assists the LM in improving its predictions. Our experiments reveal that with the tuned scorer,\textsc{ BiomedRAG} attains superior performance across 5 biomedical NLP tasks, encompassing information extraction (triple extraction, relation e
Computational models and simulations are not just appealing because of their intrinsic characteristics across spatiotemporal scales, scalability, and predictive power, but also because the set of problems in cancer biomedicine that can be addressed computationally exceeds the set of those amenable to analytical solutions. Agent-based models and simulations are especially interesting candidates among computational modelling strategies in cancer research due to their capabilities to replicate realistic local and global interaction dynamics at a convenient and relevant scale. Yet, the absence of methods to validate the consistency of the results across scales can hinder adoption by turning fine-tuned models into black boxes. This review compiles relevant literature to explore strategies to leverage high-fidelity simulations of multi-scale, or multi-level, cancer models with a focus on validation approached as simulation calibration. We argue that simulation calibration goes beyond parameter optimization by embedding informative priors to generate plausible parameter configurations across multiple dimensions.
Graph representation learning (GRL) has emerged as a pivotal field that has contributed significantly to breakthroughs in various fields, including biomedicine. The objective of this survey is to review the latest advancements in GRL methods and their applications in the biomedical field. We also highlight key challenges currently faced by GRL and outline potential directions for future research.