The rapidly evolving integration of LLMs into decision-support tools is driving a significant transformation across large-scale systems. As in other medical fields, LLMs such as GPT-4 are attracting increasing interest in radiation oncology. An attempt to assess GPT-4's performance in radiation oncology was made with a dedicated 100-question examination on the highly specialized topic of radiation oncology physics, which showed GPT-4 outperforming other LLMs. GPT-4's performance in the broader field of clinical radiation oncology was further benchmarked on the ACR Radiation Oncology In-Training (TXIT) exam, where GPT-4 achieved an accuracy of 74.57%. Its performance in relabeling structure names in accordance with the AAPM TG-263 report has also been benchmarked, with accuracies above 96%. Such studies shed light on the potential of LLMs in radiation oncology. Although interest in the potential and limitations of LLMs in general healthcare applications continues to rise [5], the capabilities and limitations of LLMs in radiation oncology decision support have not yet been fully explored.
Summary: VTX is molecular visualization software capable of handling most molecular structure and molecular dynamics trajectory file formats. It features a real-time, high-performance molecular graphics engine based on modern OpenGL and optimized for the visualization of massive molecular systems and molecular dynamics trajectories. VTX includes multiple interactive camera and user-interaction features, notably free-fly navigation and a fully modular graphical user interface designed for increased usability. It allows the production of high-resolution images with custom backgrounds for presentations and posters. VTX's design focuses on performance and usability for research, teaching, and educational purposes. Availability and implementation: VTX is open source and free for non-commercial use. Builds for Windows and Ubuntu Linux are available at http://vtx.drugdesign.fr. The source code is available at https://github.com/VTX-Molecular-Visualization. Supplementary Information: A video displaying free-fly navigation in a whole-cell model is available
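Free-fly navigation, unlike orbiting a fixed pivot, lets the camera move anywhere in the scene along its own view direction. A minimal sketch of one common way to implement it, a yaw/pitch camera; the class and method names are hypothetical and the code is not taken from VTX's source.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FreeFlyCamera:
    """Hypothetical free-fly camera: a position plus yaw/pitch angles in radians."""
    position: list = field(default_factory=lambda: [0.0, 0.0, 0.0])
    yaw: float = 0.0    # rotation about the world up (y) axis; positive turns left
    pitch: float = 0.0  # elevation of the view direction; positive looks up

    def forward(self):
        """Unit view vector; yaw = pitch = 0 looks down -z, the usual OpenGL convention."""
        cp = math.cos(self.pitch)
        return [-math.sin(self.yaw) * cp, math.sin(self.pitch), -math.cos(self.yaw) * cp]

    def rotate(self, d_yaw, d_pitch):
        """Apply mouse-look deltas, clamping pitch just short of straight up or down."""
        self.yaw += d_yaw
        limit = math.pi / 2 - 1e-3
        self.pitch = max(-limit, min(limit, self.pitch + d_pitch))

    def move(self, distance):
        """Fly along the current view direction; no orbit target constrains the motion."""
        f = self.forward()
        self.position = [p + distance * c for p, c in zip(self.position, f)]
```

Clamping the pitch avoids the view flipping over when looking straight up or down; keyboard strafing would add a similar move along the camera's right vector.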
Mathematical oncology is an interdisciplinary research field where the mathematical sciences meet cancer research. Its position at the intersection of these two fields makes mathematical oncology highly dynamic, as practicing researchers are incentivised to adapt quickly to both technical and medical research advances. Determining the scope of mathematical oncology is therefore not straightforward; however, it matters for funding allocation, education, scientific communication, and community organisation. To address this issue, we conduct a bibliometric analysis of mathematical oncology. We compare our results with those for the broader field of mathematical biology and position our findings within theoretical science-of-science frameworks. Based on article metadata and citation flows, our results provide evidence that mathematical oncology has evolved significantly since the 1960s, marked by increased interactions with other disciplines, geographical expansion, larger research teams, and greater diversity in studied topics. The latter finding contributes to the greater discussion on which models different research communities consider to be valuable
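Two of the trends reported above, more interaction with other disciplines and larger research teams, can be measured directly from article metadata and citation records. A minimal sketch, assuming each paper record carries the hypothetical fields `year`, `n_authors`, `field`, and `cited_fields`; the study's actual indicators and data model are not described here.

```python
from collections import defaultdict


def decade_stats(papers):
    """Per decade: mean team size and the share of references pointing outside the citing field.

    papers: iterable of dicts with hypothetical keys 'year', 'n_authors', 'field',
    and 'cited_fields' (one field label per reference).
    """
    refs_total = defaultdict(int)
    refs_external = defaultdict(int)
    authors = defaultdict(list)
    for p in papers:
        decade = (p["year"] // 10) * 10
        authors[decade].append(p["n_authors"])
        for f in p["cited_fields"]:
            refs_total[decade] += 1
            if f != p["field"]:
                refs_external[decade] += 1  # citation flow into another discipline
    stats = {}
    for decade in sorted(authors):
        total = refs_total[decade]
        stats[decade] = {
            "mean_team_size": sum(authors[decade]) / len(authors[decade]),
            "external_citation_share": refs_external[decade] / total if total else 0.0,
        }
    return stats
```

A rising external citation share across decades is one simple proxy for increased interdisciplinarity; diversity indices over cited fields are a common refinement.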
Personalized oncology aims to tailor treatment strategies to the unique molecular and clinical profiles of individual patients, moving beyond the traditional paradigm of treating the disease rather than the patient. Achieving this vision requires the integration and interpretation of vast, heterogeneous biomedical data within a meaningful scientific framework. Knowledge graphs, structured according to biomedical ontologies, offer a powerful approach to contextualize and interconnect diverse datasets, enabling more precise and informed clinical decision-making. We present ECKO (Explainable Clinical Knowledge for Oncology), a comprehensive knowledge graph that integrates 33 biomedical ontologies and aggregates data from multiple studies to create a unified resource optimized for data-driven clinical applications in oncology. Designed to support personalized drug recommendations, ECKO facilitates the identification of optimal therapeutic options by linking patient-specific molecular data to relevant pharmacological knowledge. It provides transparent, interpretable explanations for drug recommendations, fostering greater trust and understanding among clinicians and researchers. This resource r
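One way a knowledge graph can attach an explanation to a drug recommendation is to return the chain of relations linking a patient to the drug. The sketch below uses breadth-first search over a toy graph; every node name and relation label is a hypothetical placeholder, and shortest-path search is an illustrative choice, not ECKO's documented method.

```python
from collections import deque

# Toy knowledge graph as (subject, relation, object) triples; all entities are hypothetical.
EDGES = [
    ("Patient1", "has_variant", "VariantA"),
    ("VariantA", "located_in", "GeneA"),
    ("GeneA", "participates_in", "PathwayP"),
    ("DrugX", "inhibits", "GeneA"),
    ("DrugY", "targets", "PathwayQ"),
]


def build_adjacency(edges):
    """Index edges in both directions; reversed edges get an '^-1' suffix on the relation."""
    adj = {}
    for s, rel, o in edges:
        adj.setdefault(s, []).append((rel, o))
        adj.setdefault(o, []).append((f"{rel}^-1", s))
    return adj


def explain(adj, start, goal):
    """Breadth-first search: shortest list of (node, relation, node) hops, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for rel, nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, rel, nxt)]))
    return None


def recommend(adj, patient, drugs):
    """Map each drug reachable from the patient to its explanation path."""
    out = {}
    for d in drugs:
        path = explain(adj, patient, d)
        if path is not None:
            out[d] = path
    return out
```

Here DrugX is returned with the path Patient1 has_variant VariantA, located_in GeneA, inhibited by DrugX, while DrugY is not reachable and is omitted. A real system would also need to rank candidate paths, since reachability alone says nothing about efficacy.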
AI-assisted molecular property prediction has become a promising technique in early-stage drug discovery and materials design in recent years. However, because wet-lab experiments are costly and complex, annotations for real-world molecules are usually scarce, leaving limited labeled data for effective supervised model learning. In light of this, few-shot molecular property prediction (FSMPP) has emerged as an expressive paradigm that enables learning from only a few labeled examples. Despite rapidly growing attention, existing FSMPP studies remain fragmented, lacking a coherent framework to capture methodological advances and domain-specific challenges. In this work, we present the first comprehensive and systematic survey of few-shot molecular property prediction. We begin by analyzing the few-shot phenomenon in molecular datasets and highlighting two core challenges: (1) cross-property generalization under distribution shifts, where each task, corresponding to a single property, may follow a different data distribution or even be inherently weakly related to others from a biochemical perspective, requiring the model to transfer knowledge across heterogeneous predi
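A widely used family of few-shot methods is metric-based, with prototypical networks as a typical example: each class of a task (say, active versus inactive for one property) is represented by the mean embedding of its few labeled support molecules, and a query molecule is assigned to the nearest prototype. The sketch below skips the learned molecular encoder and works on raw feature vectors, which are hypothetical toy values; it illustrates one approach, not the survey's taxonomy.

```python
import math


def prototypes(support):
    """support: dict label -> list of feature vectors (hypothetical molecular embeddings)."""
    protos = {}
    for label, vecs in support.items():
        dim = len(vecs[0])
        protos[label] = [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]
    return protos


def classify(query, protos):
    """Nearest prototype by squared Euclidean distance, plus softmax over negative distances."""
    dists = {lab: sum((q - p) ** 2 for q, p in zip(query, proto)) for lab, proto in protos.items()}
    m = min(dists.values())  # shift by the minimum for numerical stability
    weights = {lab: math.exp(-(d - m)) for lab, d in dists.items()}
    z = sum(weights.values())
    probs = {lab: w / z for lab, w in weights.items()}
    return min(dists, key=dists.get), probs
```

With support `{"active": [[1.0, 0.0], [0.9, 0.1]], "inactive": [[0.0, 1.0], [0.1, 0.9]]}` (a 2-way, 2-shot episode), the query `[0.8, 0.2]` is labeled `"active"`. In training, the encoder is optimized across many such episodes so that prototypes generalize to unseen properties.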
Molecular communication (MC) provides a foundational framework for information transmission in the Internet of Bio-Nano Things (IoBNT), where efficiency and reliability are crucial. However, the inherent limitations of molecular channels, such as low transmission rates, noise, and intersymbol interference (ISI), restrict their ability to support complex data transmission. This paper proposes an end-to-end semantic learning framework designed to optimize task-oriented molecular communication, with a focus on biomedical diagnostic tasks under resource-constrained conditions. The proposed framework employs a deep encoder-decoder architecture to efficiently extract, quantize, and decode semantic features, prioritizing task-relevant semantic information to enhance diagnostic classification performance. Additionally, a probabilistic channel network is introduced to approximate molecular propagation dynamics, enabling gradient-based optimization for end-to-end learning. Experimental results demonstrate that the proposed semantic framework improves diagnostic accuracy by at least 25% compared to conventional JPEG compression with LDPC coding methods under resource-constrained communication sce
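ISI arises because diffusing molecules arrive over a long tail of times, so molecules from earlier symbols spill into later slots. A minimal sketch using the standard closed-form hitting probability for a fully absorbing spherical receiver of radius r in unbounded 3-D free diffusion, F(t) = (r/d) erfc((d - r)/sqrt(4Dt)), with on-off keying. The paper's channel is a learned probabilistic network, so this analytic model is only a stand-in, and all parameter values in the usage are hypothetical.

```python
import math


def hit_prob(t, d, r, D):
    """Probability that a molecule released at t = 0, at distance d from the receiver centre,
    has been absorbed by a sphere of radius r by time t (free 3-D diffusion, coefficient D)."""
    if t <= 0:
        return 0.0
    return (r / d) * math.erfc((d - r) / math.sqrt(4 * D * t))


def slot_probs(n_slots, T, d, r, D):
    """Probability of absorption during slot k, i.e. between k*T and (k+1)*T after release."""
    return [hit_prob((k + 1) * T, d, r, D) - hit_prob(k * T, d, r, D) for k in range(n_slots)]


def expected_counts(bits, N, T, d, r, D):
    """Expected absorbed molecules per slot for on-off keying (N molecules for a 1, none for a 0).
    Slot i sums contributions from the current symbol and every earlier one (the ISI terms)."""
    p = slot_probs(len(bits), T, d, r, D)
    return [sum(N * bits[j] * p[i - j] for j in range(i + 1)) for i in range(len(bits))]
```

For example, with d = 10 µm, r = 5 µm, D = 1e-9 m²/s, and T = 50 ms, `expected_counts([1, 0, 0], 1000, ...)` is nonzero in all three slots even though only the first symbol was a 1; that residual is the ISI a receiver, or a learned decoder, has to cope with.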
Foundation models are reshaping computational pathology by enabling transfer learning, where models pre-trained on vast datasets can be adapted for downstream diagnostic, prognostic, and therapeutic response tasks. Despite these advances, foundation models are still limited in their ability to encode entire gigapixel whole-slide images without additional training, and they often lack complementary multimodal data. Here, we introduce Threads, a slide-level foundation model capable of generating universal representations of whole-slide images of any size. Threads was pre-trained using a multimodal learning approach on a diverse cohort of 47,171 hematoxylin and eosin (H&E)-stained tissue sections paired with corresponding genomic and transcriptomic profiles, the largest such paired dataset used for foundation model development to date. This unique training paradigm enables Threads to capture the tissue's underlying molecular composition, yielding powerful representations applicable to a wide array of downstream tasks. In extensive benchmarking across 54 oncology tasks, including clinical subtyping, grading, mutation prediction, immunohistochemistry status determination, trea
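Producing one fixed-length representation for a whole-slide image of any size requires pooling a variable number of patch embeddings. A minimal sketch of attention-based multiple-instance pooling, one widely used slide-level aggregator; the text does not specify Threads' architecture, so this is a generic illustration, and the weights `V` and `w` would normally be learned rather than set by hand.

```python
import math


def attention_pool(patches, V, w):
    """Attention-based MIL pooling (non-gated variant).

    patches: list of N patch embeddings, each of length d.
    V: k x d weight matrix; w: length-k weight vector.
    Returns (slide_embedding of length d, attention weights of length N).
    """
    scores = []
    for h in patches:
        hidden = [math.tanh(sum(v_ij * h_j for v_ij, h_j in zip(row, h))) for row in V]
        scores.append(sum(w_k * a_k for w_k, a_k in zip(w, hidden)))
    m = max(scores)  # softmax with max-shift for numerical stability
    exp_s = [math.exp(s - m) for s in scores]
    z = sum(exp_s)
    attn = [e / z for e in exp_s]
    d = len(patches[0])
    # Attention-weighted average: output size depends only on d, never on N.
    slide = [sum(a * h[j] for a, h in zip(attn, patches)) for j in range(d)]
    return slide, attn
```

Because the output is a weighted average, a slide with 3 patches and one with 500 both yield a length-d vector, and the attention weights indicate which regions drove the representation.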
We present the Radiation Oncology NLP Database (ROND), the first dedicated Natural Language Processing (NLP) dataset for radiation oncology, an important medical specialty that has received limited attention from the NLP community. With the advent of Artificial General Intelligence (AGI), there is an increasing need for specialized datasets and benchmarks to facilitate research and development. ROND is specifically designed to address this gap in the domain of radiation oncology, a field that offers many opportunities for NLP exploration. It encompasses various NLP tasks, including Logic Reasoning, Text Classification, Named Entity Recognition (NER), Question Answering (QA), Text Summarization, and Patient-Clinician Conversations, each with a distinct focus on radiation oncology concepts and application cases. In addition, we have developed an instruction-tuning dataset of over 20k instruction pairs (based on ROND) and trained a large language model, CancerChat. This serves to demonstrate the potential of instruction-tuning large language models within a highly specialized medical domain. The evaluation results in this study could serve as baseline results for
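Instruction-tuning data is typically stored as records that pair a task instruction with an optional input and a target output, then rendered into a prompt template for training. A minimal sketch, assuming an Alpaca-style three-field schema and prompt layout; ROND's actual record format is not given here, and the example content in the usage is hypothetical.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class InstructionPair:
    """One instruction-tuning example; the three-field layout is an assumed Alpaca-style schema."""
    instruction: str
    input: str
    output: str

    def to_prompt(self) -> str:
        """Render the training prompt; the model learns to generate `output` after it."""
        parts = [f"### Instruction:\n{self.instruction}"]
        if self.input:  # the input section is omitted for instruction-only tasks
            parts.append(f"### Input:\n{self.input}")
        parts.append("### Response:\n")
        return "\n\n".join(parts)


def to_jsonl(pairs):
    """Serialize pairs as JSON Lines, one example per line."""
    return "\n".join(json.dumps(asdict(p)) for p in pairs)
```

For an NER-style item, `InstructionPair("Extract the prescribed dose.", "The patient received 60 Gy in 30 fractions.", "60 Gy")` renders a prompt ending in `### Response:`, and the loss is usually computed only on the response tokens.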
Cancer is a complex genetic disease involving uncontrolled cell growth and proliferation, and its treatment requires effective targeting of the dysregulated cellular pathways underlying cancer progression. Multiple genetic and epigenetic alterations characterize tumor progression and define the hallmarks of cancer. Importantly, patients with the same cancer type respond differently to available cancer treatments, likely owing to tumor-specific differences in DNA, RNA, and proteins, indicating the need for patient-specific treatment options. Precision oncology has evolved as a form of cancer therapy focused on the genetic and molecular profiling of tumors to identify specific molecular alterations involved in carcinogenesis, enabling individually tailored cancer treatment. Advances in high-throughput sequencing technologies have enabled gene expression profiling, providing multiomics data for detailed molecular characterization of various tumors. Integration and analysis of these diverse multiomic sequencing data are crucial in this regard, as they can reveal critical molecular changes, such as cancer-driving mutations, post-translational modifications, gene fusions, amplifications, and alterations in signaling networks wi
The emergence of artificial general intelligence (AGI) is transforming radiation oncology. As prominent vanguards of AGI, large language models (LLMs) such as GPT-4 and PaLM 2 can process extensive text, and large vision models (LVMs) such as the Segment Anything Model (SAM) can process extensive imaging data; both can enhance the efficiency and precision of radiation therapy. This paper explores full-spectrum applications of AGI across radiation oncology, including initial consultation, simulation, treatment planning, treatment delivery, treatment verification, and patient follow-up. The fusion of vision data with LLMs also creates powerful multimodal models that elucidate nuanced clinical patterns. Together, these models promise to catalyze a shift towards data-driven, personalized radiation therapy. However, they should complement human expertise and care. This paper provides an overview of how AGI can transform radiation oncology and elevate the standard of patient care, with the key insight being AGI's ability to exploit multimodal clinical data at scale.
In the era of targeted therapy, there has been increasing concern about the development of oncology drugs based on the "more is better" paradigm, developed decades ago for chemotherapy. Recently, the US Food and Drug Administration (FDA) initiated Project Optimus to reform the dose optimization and dose selection paradigm in oncology drug development. To accommodate this paradigm shift, we propose a dose-ranging approach to optimizing dose (DROID) for oncology trials with targeted drugs. DROID leverages the well-established dose-ranging study framework, which has been routinely used to develop non-oncology drugs for decades, and bridges it with established oncology dose-finding designs to optimize the dose of oncology drugs. DROID consists of two seamlessly connected stages. In the first stage, patients are sequentially enrolled and adaptively assigned to investigational doses to establish the therapeutic dose range (TDR), defined as the range of doses with acceptable toxicity and efficacy profiles, and the recommended phase 2 dose set (RP2S). In the second stage, patients are randomized to the doses in RP2S to assess the dose-response relationship and identify the optimal dose.
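The TDR is defined by two acceptability criteria, one on toxicity and one on efficacy. A minimal sketch, assuming the criteria are applied to simple empirical toxicity and response rates against hypothetical thresholds; DROID's actual estimation and decision rules are not given in this summary, and the dose levels and counts in the usage are invented toy numbers. Stage 2 is sketched as balanced randomization over the recommended set.

```python
import random


def therapeutic_dose_range(doses, n, n_tox, n_eff, tox_max, eff_min):
    """Doses whose observed toxicity rate <= tox_max and response rate >= eff_min.
    Empirical rates only; a real design would use model-based estimates and uncertainty."""
    tdr = []
    for dose, ni, ti, ei in zip(doses, n, n_tox, n_eff):
        if ni == 0:
            continue  # untried dose: no evidence either way
        if ti / ni <= tox_max and ei / ni >= eff_min:
            tdr.append(dose)
    return tdr


def randomize(patients, rp2s, seed=0):
    """Stage-2 sketch: balanced random allocation of patients across the recommended doses."""
    rng = random.Random(seed)
    arms = [rp2s[i % len(rp2s)] for i in range(len(patients))]
    rng.shuffle(arms)
    return dict(zip(patients, arms))
```

With six patients at each of four hypothetical dose levels (100 to 400 mg), toxicity counts `[0, 1, 2, 4]`, responses `[1, 3, 4, 4]`, `tox_max = 0.35`, and `eff_min = 0.3`, the sketch returns `[200, 300]`: the lowest dose is not effective enough and the highest is too toxic. Taking the whole TDR as the RP2S is a simplification for illustration only.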
Mechanistic learning, the synergistic combination of knowledge-driven and data-driven modeling, is an emerging field. Its use is growing in particular in mathematical oncology, the application of mathematical modeling to cancer biology and oncology. This review aims to capture the current state of the field and provide a perspective on how mechanistic learning may further progress in mathematical oncology. We highlight the synergistic potential of knowledge-driven mechanistic mathematical modeling and data-driven modeling, such as machine and deep learning. We point out similarities and differences regarding model complexity, data requirements, outputs generated, and interpretability of the algorithms and their results. Then, organizing combinations of knowledge- and data-driven modeling into four categories (sequential, parallel, intrinsic, and extrinsic mechanistic learning), we summarize a variety of approaches at the interface between purely data- and knowledge-driven models. Using examples predominantly from oncology, we discuss a range of techniques including physics-informed neural networks, surrogate model learning, and digital twins. We see that m
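Physics-informed neural networks combine the two modeling styles in their loss: a data term penalizes misfit to observations, and a physics term penalizes violation of the mechanistic equation. A minimal sketch of that composite loss, assuming logistic tumor growth dV/dt = rV(1 - V/K) as the mechanistic model and a finite-difference derivative; in an actual PINN, the trajectory would be a network's output and the loss would be minimized over its weights.

```python
import math


def logistic_solution(t, v0, r, K):
    """Closed-form solution of the logistic ODE, handy for generating a consistent trajectory."""
    return K / (1 + (K / v0 - 1) * math.exp(-r * t))


def logistic_residual(ts, vs, r, K):
    """Residual of dV/dt - r V (1 - V/K) at interior points, via central differences."""
    res = []
    for i in range(1, len(ts) - 1):
        dvdt = (vs[i + 1] - vs[i - 1]) / (ts[i + 1] - ts[i - 1])
        res.append(dvdt - r * vs[i] * (1 - vs[i] / K))
    return res


def physics_informed_loss(ts, vs, data, r, K, lam=1.0):
    """Mean squared data misfit plus lam times the mean squared ODE residual.
    data: dict mapping an index into ts to an observed tumour volume."""
    data_term = sum((vs[i] - y) ** 2 for i, y in data.items()) / len(data)
    res = logistic_residual(ts, vs, r, K)
    physics_term = sum(e * e for e in res) / len(res)
    return data_term + lam * physics_term
```

A trajectory that fits three sparse observations but ignores the growth law, such as a straight line through the first and last points, still incurs a large physics term; the weight `lam` (a hypothetical default of 1.0 here) sets how strongly the mechanism constrains the fit between data points.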
The function of the organism hinges on the performance of its information-processing networks, which convey information via molecular recognition. Many paths within these networks utilize molecular codebooks, such as the genetic code, to translate information written in one class of molecules into another molecular "language". The present paper examines the emergence and evolution of molecular codes in terms of rate-distortion theory and reviews recent results of this approach. We discuss how the biological problem of maximizing the fitness of an organism by optimizing its molecular coding machinery is equivalent to the communication engineering problem of designing an optimal information channel. The fitness of a molecular code takes into account the interplay between the quality of the channel and the cost of the resources that the organism must invest in its construction and maintenance. We analyze the dynamics of a population of organisms that compete according to the fitness of their codes. The model suggests a generic mechanism for the emergence of molecular codes as a phase transition in an information channel. This mechanism is put into biological context and demonstrated
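Rate-distortion theory formalizes a trade-off between channel quality (low distortion) and resource cost (information rate), with a Lagrange parameter beta setting the exchange rate between them. A minimal sketch of the standard Blahut-Arimoto algorithm for one point on the rate-distortion curve; it illustrates the theory the text invokes, and mapping the paper's fitness terms onto rate and distortion is an assumption, not the paper's population model.

```python
import math


def blahut_arimoto_rd(px, dist, beta, iters=500):
    """One point on the rate-distortion curve via Blahut-Arimoto.
    px: source distribution; dist[x][xh]: distortion of reproducing x as xh;
    beta: trade-off parameter (larger beta favours lower distortion).
    Returns (rate in bits, expected distortion)."""
    n, m = len(px), len(dist[0])
    q = [1.0 / m] * m  # reproduction marginal, initialized uniform
    for _ in range(iters):
        # Encoder update: q(xh|x) proportional to q(xh) * exp(-beta * d(x, xh)).
        cond = []
        for x in range(n):
            w = [q[j] * math.exp(-beta * dist[x][j]) for j in range(m)]
            z = sum(w)
            cond.append([wj / z for wj in w])
        # Marginal update: q(xh) = sum_x p(x) q(xh|x).
        q = [sum(px[x] * cond[x][j] for x in range(n)) for j in range(m)]
    rate = sum(px[x] * cond[x][j] * math.log2(cond[x][j] / q[j])
               for x in range(n) for j in range(m) if cond[x][j] > 0)
    distortion = sum(px[x] * cond[x][j] * dist[x][j] for x in range(n) for j in range(m))
    return rate, distortion
```

For a uniform binary source with Hamming distortion, the result matches the known curve R(D) = 1 - H2(D) bits, with D = 1/(1 + e^beta). Sweeping beta traces the whole curve, along which a more accurate code costs more information.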
Existing molecular communication systems, both theoretical and experimental, are characterized by low information rates. In this paper, inspired by time-of-flight mass spectrometry (TOFMS), we consider the design of a molecular communication system in which the channel is a vacuum and demonstrate that this method has the potential to increase achievable information rates by many orders of magnitude. We use modelling results from TOFMS to obtain arrival time distributions for accelerated ions and use them to analyze several species of ions, including hydrogen, nitrogen, argon, and benzene. We show that the achievable information rates can be increased using a velocity (Wien) filter, which reduces uncertainty in the velocity of the ions. Using a simplified communication model, we show that data rates well above 1 Gbit/s/molecule are achievable.
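The arrival-time behavior exploited here follows from basic TOFMS kinematics: an ion of charge q accelerated from rest through a voltage V gains kinetic energy qV = mv²/2 and then crosses a field-free drift length L in time t = L/v, so heavier ions arrive later, with t proportional to sqrt(m). A minimal sketch, assuming singly charged ions, approximate masses, and hypothetical values L = 1 m and V = 1 kV, and ignoring time spent in the acceleration region; a Wien filter with crossed fields E and B passes undeflected only ions with v = E/B.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge, C
AMU = 1.66053906660e-27     # atomic mass unit, kg (approximate)


def time_of_flight(mass_u, length_m, voltage_v, charge=1):
    """Drift time of an ion accelerated from rest through voltage_v, then flying length_m
    in field-free vacuum: q V = m v^2 / 2 and t = L / v."""
    m = mass_u * AMU
    v = math.sqrt(2 * charge * E_CHARGE * voltage_v / m)
    return length_m / v


def wien_pass_velocity(e_field, b_field):
    """Velocity (m/s) passed undeflected by a Wien filter with E in V/m and B in T: v = E / B."""
    return e_field / b_field
```

Under these assumptions, `time_of_flight(39.948, 1.0, 1000.0)` for Ar+ gives about 14.4 µs, while H+ (1.008 u) arrives roughly sqrt(39.948/1.008), about 6.3 times, sooner. The spread in initial velocities, which the Wien filter narrows, is what blurs these arrival times in practice.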
Nowadays, increasing patient demand and declining resources are lengthening patient waiting times in many chemotherapy oncology departments. Enhancing healthcare services is therefore necessary to reduce patient complaints. Reducing patient waiting times in oncology departments is one of the main goals of healthcare managers. Simulation models are considered an effective tool for identifying potential ways to improve patient flow in oncology departments. This paper presents a new agent-based simulation model designed to be configurable and adaptable to the needs of oncology departments that must interact with an external pharmacy. When an external pharmacy is used, a courier service is needed to deliver the individual therapies from the pharmacy to the oncology department. An oncology department located in southern Italy was studied with the simulation model, and different scenarios were compared with the aim of selecting the department configuration that best reduces patient waiting times.
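Courier batching is one reason an external pharmacy lengthens waits: a prepared therapy sits until the next courier run departs. A minimal deterministic toy calculation of that effect, not the paper's agent-based model; the ready times, preparation time, courier period, and travel time are all hypothetical parameters.

```python
import math


def courier_waits(ready_times, period, travel, prep=0.0):
    """Toy model: each patient's therapy is ordered when the patient is ready (minute t),
    takes `prep` minutes at the external pharmacy, leaves on the next courier run
    (runs depart every `period` minutes starting at t = 0), and arrives `travel` minutes later.
    Returns each patient's wait from being ready to the drug's arrival."""
    waits = []
    for t in ready_times:
        done = t + prep
        departure = math.ceil(done / period) * period  # next courier run at or after `done`
        waits.append(departure + travel - t)
    return waits
```

Comparing scenarios mirrors, in miniature, the scenario analysis described above: with patients ready at minutes 0, 10, and 35, a 15-minute preparation time, and 20 minutes of travel, a courier every 30 minutes yields waits of 50, 40, and 45 minutes, while a courier every 15 minutes cuts the first wait to 35. An agent-based model adds stochastic arrivals, chair and nurse capacity, and interactions among these agents.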
Molecular recognition, which is essential in processing information in biological systems, takes place in a crowded, noisy biochemical environment and requires the recognition of a specific target within a background of various similar competing molecules. We consider molecular recognition as a transmission of information via a noisy channel and use this analogy to gain insights into the optimal, or fittest, molecular recognizer. We focus on the optimal structural properties of the molecules, such as flexibility and conformation. We show that conformational changes upon binding, which often occur during molecular recognition, may optimize the detection performance of the recognizer. We thus suggest a generic design principle termed 'conformational proofreading' in which deformation enhances detection. We evaluate the optimal flexibility of the molecular recognizer, which is analogous to the stochasticity in a decision unit. In some scenarios, a flexible recognizer, i.e., a stochastic decision unit, performs better than a rigid, deterministic one. As a biological example, we discuss conformational changes during homologous recombination, the process of genetic exchange between two DNA s
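The idea that a deformation cost can improve detection can be seen in a simplified two-state binding model: the recognizer pays an energy penalty on binding, which lowers binding to both the right and the wrong target, but discrimination improves if binding to the wrong target falls faster. A minimal sketch, assuming binding energies in kT units with concentration effects folded in, and a fitness equal to right-target binding minus a penalty for wrong-target binding; the energies used in the usage are hypothetical, and this is not the paper's full model.

```python
import math


def bind_prob(energy, deformation):
    """Two-state binding probability for a binding energy gain `energy` (kT units),
    reduced by the deformation cost `deformation` paid on binding."""
    return 1.0 / (1.0 + math.exp(-(energy - deformation)))


def detection_fitness(e_right, e_wrong, deformation, penalty=1.0):
    """Benefit of binding the right target minus `penalty` times binding a wrong one."""
    return bind_prob(e_right, deformation) - penalty * bind_prob(e_wrong, deformation)
```

With e_right = 6 and e_wrong = 3, a rigid recognizer (deformation 0) binds both targets almost always and has fitness near 0.045, whereas a deformation cost of 4.5, midway between the two energies, raises the fitness to about 0.64. In this toy model, paying an energetic price upon binding is what makes the recognizer selective.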
This contribution exploits the duality between a viral infection process and macroscopic air-based molecular communication. Airborne aerosol and droplet transmission through human respiratory processes is modeled as an instance of a multiuser molecular communication scenario employing respiratory-event-driven molecular variable-concentration shift keying. Modeling is aided by experiments that are motivated by a macroscopic air-based molecular communication testbed. In artificially induced coughs, a saturated aqueous solution containing a fluorescent dye mixed with saliva is released by an adult test person. The emitted particles are made visible by means of optical detection exploiting the fluorescent dye. The number of particles recorded is significantly higher in test series without mouth and nose protection than in those with a well-fitting medical mask. A simulation tool for macroscopic molecular communication processes is extended and used for estimating the transmission of infectious aerosols in different environments. To this end, parameters obtained through self-experiments are used. The work is inspired by the recent outbreak of the coronavirus pandemic.
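The simplest channel model behind such simulations is diffusion from an instantaneous point release. A minimal sketch of the classical 3-D Gaussian puff, c(r, t) = N / (4πDt)^(3/2) · exp(-r²/(4Dt)), assuming unbounded still air and an effective diffusion coefficient D; real cough plumes are dominated by advection and turbulence, so this is not the paper's simulation tool, and a mask could be represented only crudely, by lowering the number of emitted particles N.

```python
import math


def puff_concentration(n_particles, r, t, D):
    """Particles per unit volume at distance r and time t after an instantaneous point release
    of n_particles into unbounded still air with effective diffusion coefficient D."""
    return n_particles / (4 * math.pi * D * t) ** 1.5 * math.exp(-r * r / (4 * D * t))
```

The profile integrates to N over all space, so the concentration seen by a receiver scales linearly with the emitted particle count. That linearity is what lets the emitted amount act as the transmitted symbol in a concentration-shift-keying view.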
The estimation of molecular abundances in interstellar clouds from spectroscopic observations requires radiative transfer calculations, which depend on basic molecular input data. This paper reviews recent developments in the fields of molecular data and radiative transfer. The first part is an overview of radiative transfer techniques, along with a "road map" showing which technique should be used in which situation. The second part is a review of measurements and calculations of molecular spectroscopic and collisional data, with a summary of recent collisional calculations and suggested modeling strategies if collision data are unavailable. The paper concludes with an overview of future developments and needs in the areas of radiative transfer and molecular data.
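Whether a line can be treated in LTE depends on how the collider density compares with the critical density, which is built directly from the molecular input data reviewed here: the Einstein A coefficient and the collisional rate coefficient. A minimal two-level sketch, assuming the large-velocity-gradient (Sobolev) escape probability β = (1 - e^(-τ))/τ to account for line trapping; the coefficient values in the usage are hypothetical round numbers, not data for a specific molecule.

```python
import math


def lvg_escape_probability(tau):
    """Sobolev/LVG escape probability beta = (1 - exp(-tau)) / tau, tending to 1 as tau -> 0."""
    if tau < 1e-8:
        return 1.0 - tau / 2.0  # series expansion avoids 0/0 at tiny optical depth
    return (1.0 - math.exp(-tau)) / tau


def critical_density(a_ul, gamma_ul, tau=0.0):
    """Two-level critical density (cm^-3): collisional de-excitation n * gamma_ul balances the
    trapping-reduced radiative rate beta * A_ul. a_ul in s^-1, gamma_ul in cm^3 s^-1."""
    return lvg_escape_probability(tau) * a_ul / gamma_ul
```

With A = 1e-7 s^-1 and γ = 1e-11 cm³ s^-1, the optically thin critical density is 1e4 cm^-3, and at τ = 10 trapping lowers it roughly tenfold. Full non-LTE codes generalize this to many levels and solve statistical equilibrium and radiative transfer together.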
The CDMS was founded in 1998 to provide, in its catalog section, line lists of molecular species that may be observed in various astronomical sources using radio astronomy. The line lists contain transition frequencies with qualified accuracies, intensities, and quantum numbers, as well as further auxiliary information. They have been generated from critically evaluated experimental line lists, mostly from laboratory experiments, employing established Hamiltonian models. Separate entries exist for different isotopic species and usually also for different vibrational states. As of December 2015, the number of entries was 792. They are available online as ASCII tables with additional files documenting information on the entries. The Virtual Atomic and Molecular Data Centre was founded more than 5 years ago as a common platform for atomic and molecular data. This platform facilitates exchange not only between spectroscopic databases related to astrophysics or astrochemistry, but also with collisional and kinetic databases. A dedicated infrastructure was developed to provide a common data format in the various databases enabling queries to a large variety of databases on atomic and molecular dat
Molecular Communication (MC) is a communication strategy that uses molecules as carriers of information and is widely used by biological cells. As an interdisciplinary topic, it has been studied by biologists, communication theorists, and a growing number of information theorists. This paper aims specifically to bring MC to the attention of information theorists. To do this, we first highlight the unique mathematical challenges of studying the capacity of molecular channels. Addressing these problems requires either known mathematical tools or the development of new ones. Toward this goal, we review a subjective selection of the existing literature on information-theoretic aspects of molecular communication. The emphasis here is on the mathematical techniques used, rather than on the setup or modeling of a specific paper. Finally, as an example, we propose a concrete information-theoretic problem that was motivated by our study of molecular communication.
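One standard numerical tool for channel capacity is the Blahut-Arimoto algorithm, which alternates between the output distribution and a reweighted input distribution. A minimal sketch for a discrete memoryless channel, illustrated on a Z-channel, a simple stand-in (assumed here, not taken from the text) for a molecular link in which a released molecule may fail to arrive: a transmitted 1 is read as 0 with probability p, while a 0 is never misread.

```python
import math


def kl_bits(row, q):
    """Relative entropy D(W(.|x) || q) in bits."""
    return sum(w * math.log2(w / qy) for w, qy in zip(row, q) if w > 0)


def blahut_arimoto_capacity(W, tol=1e-9, max_iter=100000):
    """Capacity (bits per use) of a discrete memoryless channel W[x][y] = P(y|x).
    Stops when the standard lower bound I(p; W) and upper bound max_x D(W(.|x) || q)
    differ by less than tol. Returns (capacity estimate, input distribution)."""
    n = len(W)
    p = [1.0 / n] * n
    for _ in range(max_iter):
        q = [sum(p[x] * W[x][y] for x in range(n)) for y in range(len(W[0]))]
        d = [kl_bits(W[x], q) for x in range(n)]
        lower = sum(p[x] * d[x] for x in range(n))  # mutual information at the current p
        upper = max(d)
        if upper - lower < tol:
            break
        weights = [p[x] * 2 ** d[x] for x in range(n)]  # 2**d_bits equals exp(d_nats)
        z = sum(weights)
        p = [w / z for w in weights]
    return lower, p
```

For the Z-channel with p = 0.5, the result agrees with the closed form log2(1 + (1 - p) p^(p/(1 - p))) = log2(1.25), about 0.322 bits per use, and the optimal input uses the fragile symbol 1 less than half the time. Molecular channels with memory, timing, or infinite alphabets generally need more specialized tools than this finite-alphabet iteration.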