Summary: VTX is molecular visualization software capable of handling most molecular structure and dynamics trajectory file formats. It features a real-time high-performance molecular graphics engine, based on modern OpenGL, optimized for the visualization of massive molecular systems and molecular dynamics trajectories. VTX includes multiple interactive camera and user interaction features, notably free-fly navigation and a fully modular graphical user interface designed for increased usability. It allows the production of high-resolution images for presentations and posters with custom background. VTX's design is focused on performance and usability for research, teaching and educational purposes. Availability and implementation: VTX is open source and free for non-commercial use. Builds for Windows and Ubuntu Linux are available at http://vtx.drugdesign.fr. The source code is available at https://github.com/VTX-Molecular-Visualization. Supplementary Information: A video displaying free-fly navigation in a whole-cell model is available
FIR and submm observations have established the fundamental role of dust-obscured star formation in the assembly of stellar mass over the past 12 billion years. At z between 2 and 4, the bulk of star formation is enshrouded in dust, and dusty star-forming galaxies (DSFGs) contain about half of the total stellar mass density. Star formation develops in dense molecular clouds, and is regulated by a complex interplay between all the ISM components that contribute to the energy budget of a galaxy: gas, dust, cosmic rays, interstellar electromagnetic fields, gravitational field, dark matter. Molecular gas is the actual link between star-forming gas and its complex environment, providing by far the richest amount of information about the star formation process. However, the interpretation of molecular lines requires complex modeling of astrochemical networks, which regulate molecular formation and establish molecular abundances in a cloud, and modeling of the physical conditions of the gas in which molecular energy levels become populated. This paper critically reviews the main astrochemical parameters needed to get predictions about molecular signals in DSFGs. We review the current kno
Molecular communication (MC) provides a foundational framework for information transmission in the Internet of Bio-Nano Things (IoBNT), where efficiency and reliability are crucial. However, the inherent limitations of molecular channels, such as low transmission rates, noise, and intersymbol interference (ISI), limit their ability to support complex data transmission. This paper proposes an end-to-end semantic learning framework designed to optimize task-oriented molecular communication, with a focus on biomedical diagnostic tasks under resource-constrained conditions. The proposed framework employs a deep encoder-decoder architecture to efficiently extract, quantize, and decode semantic features, prioritizing task-relevant semantic information to enhance diagnostic classification performance. Additionally, a probabilistic channel network is introduced to approximate molecular propagation dynamics, enabling gradient-based optimization for end-to-end learning. Experimental results demonstrate that the proposed semantic framework improves diagnostic accuracy by at least 25% compared to conventional JPEG compression with LDPC coding methods under resource-constrained communication sce
AI-assisted molecular property prediction has become a promising technique in early-stage drug discovery and materials design in recent years. However, because wet-lab experiments are costly and complex, real-world molecules usually suffer from scarce annotations, leading to limited labeled data for effective supervised AI model learning. In light of this, few-shot molecular property prediction (FSMPP) has emerged as an expressive paradigm that enables learning from only a few labeled examples. Despite rapidly growing attention, existing FSMPP studies remain fragmented, without a coherent framework to capture methodological advances and domain-specific challenges. In this work, we present the first comprehensive and systematic survey of few-shot molecular property prediction. We begin by analyzing the few-shot phenomenon in molecular datasets and highlighting two core challenges: (1) cross-property generalization under distribution shifts, where each task, corresponding to one property, may follow a different data distribution or even be inherently weakly related to others from a biochemical perspective, requiring the model to transfer knowledge across heterogeneous predi
Information molecules play a crucial role in molecular communication (MC), acting as carriers for information transfer. A common approach to obtaining information molecules in MC involves harvesting them from the environment; however, the harvested molecules are often a mixture of various environmental molecules, and the initial concentration ratios in the reservoirs are identical, which hampers high-fidelity transmission techniques such as molecular shift keying (MoSK). This paper presents a transmitter design that harvests molecules from the surrounding environment and stores them in two reservoirs. To separate the mixed molecules, energy is consumed to transfer them between reservoirs. Given limited energy resources, this work explores energy-efficient strategies to optimize transmitter performance. Through theoretical analysis and simulations, we investigate different methods for moving molecules between reservoirs. The results demonstrate that transferring the molecules with the higher initial concentration enhances transmitter performance, while using fewer molecules per transfer further improves efficiency. These findings provide valuable insights for optimizing MC systems through energy-effic
The function of the organism hinges on the performance of its information-processing networks, which convey information via molecular recognition. Many paths within these networks utilize molecular codebooks, such as the genetic code, to translate information written in one class of molecules into another molecular "language". The present paper examines the emergence and evolution of molecular codes in terms of rate-distortion theory and reviews recent results of this approach. We discuss how the biological problem of maximizing the fitness of an organism by optimizing its molecular coding machinery is equivalent to the communication engineering problem of designing an optimal information channel. The fitness of a molecular code takes into account the interplay between the quality of the channel and the cost of resources which the organism needs to invest in its construction and maintenance. We analyze the dynamics of a population of organisms that compete according to the fitness of their codes. The model suggests a generic mechanism for the emergence of molecular codes as a phase transition in an information channel. This mechanism is put into biological context and demonstrated
Existing molecular communication systems, both theoretical and experimental, are characterized by low information rates. In this paper, inspired by time-of-flight mass spectrometry (TOFMS), we consider the design of a molecular communication system in which the channel is a vacuum and demonstrate that this method has the potential to increase achievable information rates by many orders of magnitude. We use modelling results from TOFMS to obtain arrival time distributions for accelerated ions and use them to analyze several species of ions, including hydrogen, nitrogen, argon, and benzene. We show that the achievable information rates can be increased using a velocity (Wien) filter, which reduces uncertainty in the velocity of the ions. Using a simplified communication model, we show that data rates well above 1 Gbit/s/molecule are achievable.
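As a rough illustration of why the ion species matters in a TOFMS-inspired vacuum channel, the sketch below computes the ideal drift time of an ion accelerated through a potential difference, using only energy conservation (qV = mv²/2). It is not the arrival-time distribution model used in the paper. The acceleration voltage, the drift length, the assumption of singly charged ions, and the approximate masses (including the choice of N2 for nitrogen) are all illustrative assumptions.

```python
import math

E_CHARGE = 1.602176634e-19  # elementary charge in coulombs
AMU = 1.66053906660e-27     # atomic mass unit in kilograms


def drift_time(mass_amu, voltage, length, charge_state=1):
    """Ideal field-free drift time (s) of an ion after electrostatic acceleration.

    Assumes the ion starts at rest and gains kinetic energy q*V, so that
    v = sqrt(2*q*V/m) and t = length / v. Initial velocity spread is ignored.
    """
    if mass_amu <= 0 or voltage <= 0 or length <= 0 or charge_state <= 0:
        raise ValueError("mass, voltage, length and charge_state must be positive")
    velocity = math.sqrt(2 * charge_state * E_CHARGE * voltage / (mass_amu * AMU))
    return length / velocity


# Approximate masses in atomic mass units (assumed values for illustration).
SPECIES = {"hydrogen": 1.008, "nitrogen (N2)": 28.014, "argon": 39.948, "benzene": 78.11}

if __name__ == "__main__":
    for name, mass in SPECIES.items():
        # Hypothetical setup: 1 kV acceleration, 1 m drift tube.
        print(f"{name}: {drift_time(mass, 1000.0, 1.0) * 1e6:.2f} microseconds")
```

Because the drift time scales with the square root of the mass-to-charge ratio, lighter ions arrive sooner for the same voltage and path length.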
Graph Neural Networks (GNNs) have revolutionized molecular discovery, helping to understand patterns and identify unknown features that can aid in predicting biophysical properties and protein-ligand interactions. However, current models typically rely on 2-dimensional molecular representations as input, and while the use of 2- and 3-dimensional structural data has gained deserved traction in recent years, many of these models are still limited to static graph representations. We propose a novel approach based on the transformer model utilizing GNNs for characterizing dynamic features of protein-ligand interactions. Our message passing transformer pre-trains on a set of molecular dynamics data from physics-based simulations to learn coordinate construction and make binding probability and affinity predictions as a downstream task. Through extensive testing, we compare our results with existing models; our MDA-PLI model outperformed the molecular interaction prediction models with an RMSE of 1.2958. The geometric encodings enabled by our transformer architecture and the addition of time series data add a new dimensionality to this form of research.
Molecular recognition, which is essential in processing information in biological systems, takes place in a crowded noisy biochemical environment and requires the recognition of a specific target within a background of various similar competing molecules. We consider molecular recognition as a transmission of information via a noisy channel and use this analogy to gain insights on the optimal, or fittest, molecular recognizer. We focus on the optimal structural properties of the molecules such as flexibility and conformation. We show that conformational changes upon binding, which often occur during molecular recognition, may optimize the detection performance of the recognizer. We thus suggest a generic design principle termed 'conformational proofreading' in which deformation enhances detection. We evaluate the optimal flexibility of the molecular recognizer, which is analogous to the stochasticity in a decision unit. In some scenarios, a flexible recognizer, i.e., a stochastic decision unit, performs better than a rigid, deterministic one. As a biological example, we discuss conformational changes during homologous recombination, the process of genetic exchange between two DNA s
The estimation of molecular abundances in interstellar clouds from spectroscopic observations requires radiative transfer calculations, which depend on basic molecular input data. This paper reviews recent developments in the fields of molecular data and radiative transfer. The first part is an overview of radiative transfer techniques, along with a "road map" showing which technique should be used in which situation. The second part is a review of measurements and calculations of molecular spectroscopic and collisional data, with a summary of recent collisional calculations and suggested modeling strategies if collision data are unavailable. The paper concludes with an overview of future developments and needs in the areas of radiative transfer and molecular data.
This contribution exploits the duality between a viral infection process and macroscopic air-based molecular communication. Airborne aerosol and droplet transmission through human respiratory processes is modeled as an instance of a multiuser molecular communication scenario employing respiratory-event-driven molecular variable-concentration shift keying. Modeling is aided by experiments that are motivated by a macroscopic air-based molecular communication testbed. In artificially induced coughs, a saturated aqueous solution containing a fluorescent dye mixed with saliva is released by an adult test person. The emitted particles are made visible by means of optical detection exploiting the fluorescent dye. The number of particles recorded is significantly higher in test series without mouth and nose protection than in those with a well-fitting medical mask. A simulation tool for macroscopic molecular communication processes is extended and used for estimating the transmission of infectious aerosols in different environments. To this end, parameters obtained through self-experiments are used. The work is inspired by the recent outbreak of the coronavirus pandemic.
The CDMS was founded in 1998 to provide, in its catalog section, line lists of molecular species which may be observed in various astronomical sources using radio astronomy. The line lists contain transition frequencies with qualified accuracies, intensities, quantum numbers, as well as further auxiliary information. They have been generated from critically evaluated experimental line lists, mostly from laboratory experiments, employing established Hamiltonian models. Separate entries exist for different isotopic species and usually also for different vibrational states. As of December 2015, the number of entries is 792. They are available online as ASCII tables with additional files documenting information on the entries. The Virtual Atomic and Molecular Data Centre was founded more than 5 years ago as a common platform for atomic and molecular data. This platform facilitates exchange not only between spectroscopic databases related to astrophysics or astrochemistry, but also with collisional and kinetic databases. A dedicated infrastructure was developed to provide a common data format in the various databases enabling queries to a large variety of databases on atomic and molecular dat
Molecular Communication (MC) is a communication strategy that uses molecules as carriers of information, and is widely used by biological cells. As an interdisciplinary topic, it has been studied by biologists, communication theorists and a growing number of information theorists. This paper aims to specifically bring MC to the attention of information theorists. To do this, we first highlight the unique mathematical challenges of studying the capacity of molecular channels. Addressing these problems requires the use of known mathematical tools or the development of new ones. Toward this goal, we review a subjective selection of the existing literature on information-theoretic aspects of molecular communication. The emphasis here is on the mathematical techniques used, rather than on the setup or modeling of a specific paper. Finally, as an example, we propose a concrete information-theoretic problem that was motivated by our study of molecular communication.
Centaurus A, the nearest AGN, shows molecular absorption in the millimeter and radio regime. By observing the absorption with VLBI, we try to constrain the distribution of the gas, in particular whether it resides in the circumnuclear region. Analysis of VLBA observations in four OH and two H2CO transitions is presented here, as well as molecular excitation models parameterized with distance from the AGN. We conclude that the gas is most likely associated with the tilted molecular ring structure observed before in molecular emission and IR continuum. The formaldehyde absorption shows small-scale absorption, which requires a different distribution than the hydroxyl.
G-Protein Coupled Receptors (GPCRs) are a large family of eukaryotic cell transmembrane proteins, responsible for numerous biological processes. From a practical viewpoint, around 34% of the drugs approved by the US Food and Drug Administration target these receptors. They can be analyzed from their simulated molecular dynamics, including the prediction of their behavior in the presence of drugs. In this paper, the capability of Long Short-Term Memory Networks (LSTMs) to learn and predict the molecular dynamics trajectories of a receptor is evaluated. Several models were trained with the 3D position of the amino acids of the receptor, considering different transformations on the position of the amino acid, such as their centers of mass, the geometric centers and the position of the α-carbon for each amino acid. The error of the prediction of the position was evaluated by the mean absolute error (MAE) and root-mean-square deviation (RMSD). The LSTM models show a robust performance, with results comparable to the state-of-the-art in non-dynamic 3D predictions. The best MAE and RMSD values were found for the mass center of the amino acids with 0.078 Å and 0.156 Å respectively. This wor
Molecular codes translate information written in one type of molecules into another molecular language. We introduce a simple model that treats molecular codes as noisy information channels. An optimal code is a channel that conveys information accurately and efficiently while keeping down the impact of errors. The equipoise of the three conflicting needs, for minimal error-load, minimal cost of resources and maximal diversity of vocabulary, defines the fitness of the code. The model suggests a mechanism for the emergence of a code when evolution varies the parameters that control this equipoise and the mapping between the two molecular languages becomes non-random. This mechanism is demonstrated by a simple toy model that is formally equivalent to a mean-field Ising magnet.
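To make the reference to a mean-field Ising magnet concrete, the sketch below solves the generic mean-field self-consistency equation m = tanh(K m) by fixed-point iteration. This is only the textbook magnet, not the paper's code-fitness model: the symbol K (coupling divided by temperature), the starting value, and the function name are assumptions introduced here. The order parameter m stays at zero below the critical value K = 1 and becomes non-zero above it, which is the kind of transition the abstract associates with a mapping becoming non-random.

```python
import math


def mean_field_magnetization(coupling, m0=0.5, tol=1e-12, max_iter=100000):
    """Iterate m <- tanh(coupling * m) until successive values differ by < tol.

    `coupling` plays the role of K = J/T in the textbook mean-field Ising
    model (hypothetical naming, not taken from the paper).
    """
    m = m0
    for _ in range(max_iter):
        m_next = math.tanh(coupling * m)
        if abs(m_next - m) < tol:
            return m_next
        m = m_next
    return m


if __name__ == "__main__":
    for k in (0.5, 0.9, 1.1, 1.5, 2.0):
        print(f"K = {k}: m = {mean_field_magnetization(k):.4f}")
```

Close to K = 1 the iteration converges slowly, so the iteration cap is deliberately generous.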
Computing equilibrium concentrations of molecular complexes is generally analytically intractable and requires numerical approaches. In this work we focus on the polymer-monomer level, where indivisible molecules (monomers) combine to form complexes (polymers). Rather than employing free-energy parameters for each polymer, we focus on the athermic setting where all interactions preserve enthalpy. This setting aligns with the strongly bonded (domain-based) regime in DNA nanotechnology when strands can bind in different ways, but always with maximum overall bonding -- and is consistent with the saturated configurations in the Thermodynamic Binding Networks (TBNs) model. Within this context, we develop an iterative algorithm for assigning polymer concentrations to satisfy detailed-balance, where on-target (desired) polymers are in high concentrations and off-target (undesired) polymers are in low. Even if not directly executed, our algorithm provides effective insights into upper bounds on concentration of off-target polymers, connecting combinatorial arguments about discrete configurations such as those in the TBN model to real-valued concentrations. We conclude with an application o
In order to function reliably, synthetic molecular circuits require mechanisms that allow them to adapt to environmental disturbances. Least mean squares (LMS) schemes, such as commonly encountered in signal processing and control, provide a powerful means to accomplish that goal. In this paper we show how the traditional LMS algorithm can be implemented at the molecular level using only a few elementary biomolecular reactions. We demonstrate our approach using several simulation studies and discuss its relevance to synthetic biology.
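For readers unfamiliar with LMS, the sketch below shows the conventional digital form of the algorithm: the weights are nudged along the product of the current error and the current input. It is not the biomolecular reaction implementation described in the abstract; the function names, the two-tap example system, the noise level, and the step size are assumptions chosen for illustration.

```python
import random


def lms_identify(inputs, desired, n_taps, step_size):
    """Run the standard LMS update w <- w + step_size * error * x.

    `inputs` is a sequence of length-n_taps input vectors and `desired`
    the matching sequence of target outputs. Returns (weights, errors).
    """
    weights = [0.0] * n_taps
    errors = []
    for x, d in zip(inputs, desired):
        y = sum(w * xi for w, xi in zip(weights, x))  # current filter output
        e = d - y                                     # instantaneous error
        weights = [w + step_size * e * xi for w, xi in zip(weights, x)]
        errors.append(e)
    return weights, errors


def demo(seed=0, n_samples=2000):
    """Identify a hypothetical two-weight linear system from noisy samples."""
    rng = random.Random(seed)
    true_weights = [0.5, -0.3]
    inputs = [[rng.gauss(0.0, 1.0) for _ in true_weights] for _ in range(n_samples)]
    desired = [
        sum(w * xi for w, xi in zip(true_weights, x)) + rng.gauss(0.0, 0.01)
        for x in inputs
    ]
    weights, errors = lms_identify(inputs, desired, len(true_weights), 0.05)
    return weights, true_weights, errors


if __name__ == "__main__":
    estimated, truth, _ = demo()
    print("estimated:", estimated, "true:", truth)
```

The step size trades convergence speed against steady-state error; too large a value makes the update unstable.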
Numerous biological functions, such as enzymatic catalysis, the immune response system, and the DNA-protein regulatory network, rely on the ability of molecules to specifically recognize target molecules within a large pool of similar competitors in a noisy biochemical environment. Using the basic framework of signal detection theory, we treat the molecular recognition process as a signal detection problem and examine its overall performance. Thus, we evaluate the optimal properties of a molecular recognizer in the presence of competition and noise. Our analysis reveals that the optimal design undergoes a "phase transition" as the structural properties of the molecules and interaction energies between them vary. In one phase, the recognizer should be complementary in structure to its target (like a lock and a key), while in the other, conformational changes upon binding, which often accompany molecular recognition, enhance recognition quality. Using this framework, the abundance of conformational changes may be explained as a result of increasing the fitness of the recognizer. Furthermore, this analysis may be used in future design of artificial signal processing devices based on bio
This paper studies the capacity of molecular communications in fluid media, where the information is encoded in the number of transmitted molecules in a time-slot (amplitude shift keying). The propagation of molecules is governed by random Brownian motion and the communication is in general subject to inter-symbol interference (ISI). We first consider the case where ISI is negligible and analyze the capacity and the capacity per unit cost of the resulting discrete memoryless molecular channel and the effect of possible practical constraints, such as limitations on peak and/or average number of transmitted molecules per transmission. In the case with a constrained peak molecular emission, we show that as the time-slot duration increases, the input distribution achieving the capacity per channel use transitions from binary inputs to a discrete uniform distribution. In this paper, we also analyze the impact of ISI. Crucially, we account for the correlation that ISI induces between channel output symbols. We derive an upper bound and two lower bounds on the capacity in this setting. Using the input distribution obtained by an extended Blahut-Arimoto algorithm, we maximize the lower bou
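The abstract above refers to an extended Blahut-Arimoto algorithm. As background, the sketch below implements only the standard Blahut-Arimoto iteration for the capacity of a discrete memoryless channel, without ISI and without input cost constraints; it is not the extended version used in the paper. The function name, the row-stochastic matrix layout `channel[x][y] = P(y | x)`, and the binary symmetric channel in the usage example are assumptions for illustration.

```python
import math


def blahut_arimoto(channel, tol=1e-9, max_iter=10000):
    """Capacity (bits per channel use) of a discrete memoryless channel.

    `channel[x][y]` is P(y | x); every row must sum to 1.
    Returns (capacity_estimate, input_distribution).
    """
    n_in, n_out = len(channel), len(channel[0])
    p = [1.0 / n_in] * n_in  # start from the uniform input distribution
    lower = 0.0
    for _ in range(max_iter):
        # Output distribution induced by the current input distribution.
        q = [sum(p[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
        # c[x] = exp( KL divergence between P(. | x) and q ), in nats.
        c = []
        for x in range(n_in):
            d = sum(
                channel[x][y] * math.log(channel[x][y] / q[y])
                for y in range(n_out)
                if channel[x][y] > 0
            )
            c.append(math.exp(d))
        z = sum(p[x] * c[x] for x in range(n_in))
        lower = math.log(z)       # lower bound on capacity (nats)
        upper = math.log(max(c))  # upper bound on capacity (nats)
        p = [p[x] * c[x] / z for x in range(n_in)]
        if upper - lower < tol:
            break
    return lower / math.log(2), p


if __name__ == "__main__":
    # Binary symmetric channel with crossover probability 0.1.
    bsc = [[0.9, 0.1], [0.1, 0.9]]
    capacity, dist = blahut_arimoto(bsc)
    print(f"capacity = {capacity:.4f} bits, input distribution = {dist}")
```

The gap between the two bounds gives a stopping criterion that does not require knowing the capacity in advance.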