Achieving complete reproducibility in science, particularly in research fields such as biodiversity, is challenging because of analytical choices, bias, and interpretation. Here, we examine examples of reproducibility in biological systematics, ecology, and molecular biology. Artificial intelligence (AI) offers potential tools to mitigate the impact of interpretation and analytical choices. While emphasizing the need for methodological rigor and transparency, we acknowledge the role of interpretation in activities such as coding biological characters and detecting morphological patterns in nature. We explore the opportunities and limitations of the synergy between big data and AI in molecular biology, emphasizing the need for a more comprehensive and integrated approach based on dataset quality and usefulness. We conclude by advocating AI-based tools to assist biologists, reinforcing consilience as a criterion for scientific validity without hindering scientific progress.
Biology is perhaps the most complex of the sciences, given the incredible variety of chemical species that are interconnected in spatial and temporal pathways that are daunting to understand. Their interconnections lead to emergent properties such as memory, consciousness, and recognition of self and non-self. To understand how these interconnected reactions lead to cellular life characterized by activation, inhibition, regulation, homeostasis, and adaptation, computational analyses and simulations are essential, a fact recognized by the biological community. At the same time, students struggle to understand and apply binding and kinetic analyses even for the simplest reactions, such as the irreversible first-order conversion of a single reactant to a product. This likely results from the cognitive difficulty of combining structural, chemical, mathematical, and textual descriptions of binding and catalytic reactions. To help students better understand dynamic reactions and their analyses, we have introduced two kinds of interactive graphs and simulations into the online educational resource Fundamentals of Biochemistry, a multivolume biochemistry textbook that is part of the LibreText c
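Here is a minimal sketch of the simplest case mentioned above. The irreversible first-order conversion A → P obeys d[A]/dt = -k[A], so [A](t) = [A]0 e^(-kt) and [P](t) = [A]0 - [A](t), with half-life ln 2 / k. The Python below compares this analytic solution with a plain explicit Euler integration. The rate constant and initial concentration are arbitrary illustrative values, and the code is not taken from the Fundamentals of Biochemistry resource.

```python
import math

def first_order_analytic(a0: float, k: float, t: float) -> tuple[float, float]:
    """Return ([A], [P]) at time t for the irreversible reaction A -> P with rate constant k."""
    a = a0 * math.exp(-k * t)
    return a, a0 - a

def first_order_euler(a0: float, k: float, t_end: float, dt: float = 1e-4) -> tuple[float, float]:
    """Integrate d[A]/dt = -k[A] numerically with fixed-step explicit Euler."""
    a = a0
    for _ in range(int(round(t_end / dt))):
        a -= k * a * dt
    return a, a0 - a

k = 0.5   # s^-1, illustrative value
a0 = 1.0  # mM, illustrative value
half_life = math.log(2) / k

a_half, p_half = first_order_analytic(a0, k, half_life)
a_num, _ = first_order_euler(a0, k, half_life)
print(f"t1/2 = {half_life:.3f} s; analytic [A] = {a_half:.3f} mM, [P] = {p_half:.3f} mM")
print(f"Euler estimate of [A] at t1/2: {a_num:.4f} mM")
```

At the half-life, half of A has been converted. The Euler result approaches the analytic value as `dt` shrinks, which is one way to see that the exponential is the solution of the rate law.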
Some astrobiological models suggest that molecular clouds may serve as habitats for extraterrestrial life. This study reviews recent theoretical work addressing the physical and biochemical prerequisites for life in such environments, with particular focus on three subjects: (1) bioenergetic pathways under extreme low-temperature conditions; (2) the emergence and preservation of biomolecular chirality; and (3) detection methodologies for potential biosignatures. In this paper, we formally introduce the molecular cloud biology concept, which integrates all physicochemical and metabolic processes hypothesized to sustain life within molecular clouds. As a potential branch of astrobiology, molecular cloud biology warrants interdisciplinary collaborative research to validate its foundational assumptions and explore its scientific implications.
This article frames the relation between biology and physics by characterizing the former as a subdiscipline rather than a special case of the latter. To do this, we posit biological physics as the science of living matter in contrast to classic biophysics, the study of organismal properties by physical techniques. At the scale of the individual cell, living matter is nonunitary, i.e., not composed of aggregated subunits, and has features (e.g., intracellular organizational arrangements and biomolecular condensates) that are unlike any materials of the nonliving world. In transiently or constitutively multicellular forms (social microorganisms, animals, plants), living matter sustains physical processes that are generic (shared with nonliving matter, e.g., subunit communication by molecular diffusion in cellular slime molds), biogeneric (analogous to nonliving matter but realized through cellular activities, e.g., subunit demixing in animal embryos) or nongeneric (pertaining to sui generis materials, e.g., budding of active solids in plants). This "forms of matter" perspective is philosophically situated in the dialectical materialism of Engels and Hessen and the multilevel physica
DNA and RNA are generally regarded as central molecules in molecular biology. Recent advances in DNA/RNA nanotechnology have used DNA/RNA as programmable molecules to construct molecular machines and nanostructures with predefined shapes and functions. The key mechanism for dynamically controlling the conformations of these DNA/RNA nanodevices is a reaction called strand displacement, in which one strand of a preformed duplex is replaced by a third, invading strand. While DNA/RNA strand displacement has mainly been used to design molecular devices de novo, we argue in this review that this reaction is also likely to play a key role in multiple cellular events such as gene recombination, CRISPR-based genome editing, and RNA cotranscriptional folding. We introduce the general mechanism of the strand displacement reaction, give examples of its use in the construction of molecular machines, and finally review natural processes whose characteristics suggest that strand displacement is occurring.
We survey and introduce concepts and tools located at the intersection of information theory and network biology. We show that Shannon's information entropy, compressibility, and algorithmic complexity quantify different local and global aspects of synthetic and biological data. Our examples include the emergence of giant components in Erdős–Rényi random graphs and the recovery of topological properties from numerical kinetic simulations of gene expression data. We provide exact theoretical calculations, numerical approximations, and error estimates of entropy, algorithmic probability, and Kolmogorov complexity for different types of graphs, characterizing their variant and invariant properties. We introduce formal definitions of complexity for both labeled and unlabeled graphs and prove that the Kolmogorov complexity of a labeled graph is a good approximation of its unlabeled Kolmogorov complexity and thus a robust definition of graph complexity.
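The following is a minimal sketch of three of the quantities above, assuming a plain G(n, p) model with mean degree c = p(n - 1). It computes:

- the fraction of nodes in the largest connected component below and above the threshold c = 1, where a giant component emerges;
- the Shannon entropy of the degree distribution;
- the zlib-compressed length of the adjacency bit string.

Compressed length serves here only as a crude, computable upper-bound proxy for algorithmic complexity. It is not the estimator developed in this work, and all function names are hypothetical.

```python
import math
import random
import zlib
from collections import Counter

def erdos_renyi(n: int, p: float, seed: int = 0) -> list[tuple[int, int]]:
    """Sample G(n, p): each of the n(n-1)/2 possible edges is kept with probability p."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

def largest_component_fraction(n: int, edges: list[tuple[int, int]]) -> float:
    """Fraction of nodes in the largest connected component, via union-find."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    return max(Counter(find(x) for x in range(n)).values()) / n

def degree_entropy(n: int, edges: list[tuple[int, int]]) -> float:
    """Shannon entropy (bits) of the empirical degree distribution."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return -sum((c / n) * math.log2(c / n) for c in Counter(deg).values())

def compressed_length(n: int, edges: list[tuple[int, int]]) -> tuple[int, int]:
    """(compressed, raw) byte lengths of the upper-triangular adjacency bit string."""
    present = set(edges)
    bits = "".join("1" if (i, j) in present else "0"
                   for i in range(n) for j in range(i + 1, n))
    return len(zlib.compress(bits.encode("ascii"), 9)), len(bits)

n = 1000
results = {}
for c in (0.5, 2.0):  # mean degree below and above the threshold c = 1
    edges = erdos_renyi(n, c / (n - 1), seed=42)
    packed, raw = compressed_length(n, edges)
    results[c] = {
        "giant": largest_component_fraction(n, edges),
        "entropy": degree_entropy(n, edges),
        "compressed": packed,
        "raw": raw,
    }
    print(f"c={c}: largest component {results[c]['giant']:.3f}, "
          f"degree entropy {results[c]['entropy']:.2f} bits, "
          f"{packed} of {raw} bytes after compression")
```

For c = 0.5 the largest component stays tiny. For c = 2 it holds most of the nodes; mean-field theory predicts roughly 80%.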
Molecular biology and biochemistry interpret microscopic processes in the living world in terms of molecular structures and their interactions, which are quantum mechanical by their very nature. Whereas the theoretical foundations of these interactions are very well established, the computational solution of the relevant quantum mechanical equations is very hard. However, much of molecular function in biology can be understood in terms of classical mechanics, where the interactions of electrons and nuclei have been mapped onto effective classical surrogate potentials that model the interaction of atoms or even larger entities. The simple mathematical structure of these potentials offers huge computational advantages; however, this comes at the cost that all quantum correlations and the rigorous many-particle nature of the interactions are omitted. In this work, we discuss how quantum computation may advance the practical usefulness of the quantum foundations of molecular biology by offering computational advantages for simulations of biomolecules. We not only discuss typical quantum mechanical problems of the electronic structure of biomolecules in this context, but also consider t
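A widely used example of an effective classical surrogate potential is the Lennard-Jones pair potential, V(r) = 4ε[(σ/r)^12 - (σ/r)^6]. The sketch below shows how simple such a potential is: the total energy is just a sum over atom pairs, and the minimum lies at r = 2^(1/6)σ with depth -ε. The parameters are illustrative, roughly in the range commonly quoted for argon; they are an assumption and are not values from this work.

```python
import math
from itertools import combinations

EPSILON = 0.997  # kJ/mol, illustrative (roughly argon-like)
SIGMA = 3.40     # angstrom, illustrative (roughly argon-like)

def lennard_jones(r: float, epsilon: float = EPSILON, sigma: float = SIGMA) -> float:
    """Pair energy V(r) = 4*epsilon*((sigma/r)**12 - (sigma/r)**6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def total_energy(coords: list[tuple[float, float, float]]) -> float:
    """Total energy as a plain sum over all atom pairs (no cutoff, no periodic images)."""
    return sum(lennard_jones(math.dist(a, b)) for a, b in combinations(coords, 2))

r_min = 2 ** (1 / 6) * SIGMA
# Brute-force scan from 3.0 to 5.0 angstrom as a check on the analytic minimum.
grid = [3.0 + i * 0.001 for i in range(2001)]
r_grid = min(grid, key=lennard_jones)
dimer = [(0.0, 0.0, 0.0), (r_min, 0.0, 0.0)]
print(f"analytic r_min = {r_min:.4f} A, grid r_min = {r_grid:.3f} A")
print(f"dimer energy at r_min = {total_energy(dimer):.4f} kJ/mol")
```

This pairwise sum is what makes classical force fields cheap. As the paragraph notes, it is also exactly where quantum correlations and genuine many-body effects are dropped.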
Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving single-cell foundation models. First, we scaled the pre-training data to a diverse collection of 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the \model family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on several downstream evaluation tasks, including identifying the underlying disease state of held-out donors not seen during training, distinguishing between diseased and healthy cells for disease conditions and
I believe an atomic biology is needed to supplement present-day molecular biology if we are to design and understand proteins, as well as define, make, and use them. Topics covered in the paper: molecular biology and atomic biology; electrodiffusion in the open channel; electrodiffusion in mixed electrolytes; models of permeation (state models of permeation are inconsistent with the electric field); making models in atomic biology; molecular dynamics (temporal limitations, spatial limitations, periodic boundary conditions); the hierarchy of models of the open channel; stochastic motion of the channel; Langevin dynamics; simulations of the reaction path (the permion); chemical reactions; "What was wrong?"; back to the hierarchy ("Occam's razor can slit your throat"); Poisson-Nernst-Planck (PNP) models (flux ratios, pumping by field coupling); gating in channels of one conformation (gating by field switching, gating current, gating in branched channels, blocking); back to the hierarchy (linking levels); "Is there a theory?"; "At what level will the adaptation be found?"; and simplicity, evolution, and natural function.
We developed a theory showing that, under appropriate normalizations and rescalings, temperature response curves exhibit remarkably regular behavior and follow a general, universal law. This universality remained hidden because of the variety of curve-fitting models that are not well grounded in first principles. In addition, this framework has the potential to explain the origin of different scaling relationships in thermal performance in biology, from molecules to ecosystems. Here, we summarize the background, principles and assumptions, predictions, implications, and possible extensions of this theory.
This paper presents a comprehensive list of the scientific articles of Giulio Fermi (1936-1997), son of the Italian-American physicist Enrico Fermi, published between 1962 and 1997. Giulio's early research concerned virology and biological cybernetics, while from 1975 onward his work was devoted entirely to protein crystallography. The crystallographic research was carried out in collaboration with Nobel laureate Max Perutz at the Medical Research Council (MRC) Laboratory of Molecular Biology in Cambridge (United Kingdom). A short biography of Giulio (Judd) Fermi appears in John Finch's book A Nobel Fellow on Every Floor: A History of the Medical Research Council Laboratory of Molecular Biology, published by the MRC in 2008.
We aim to characterize the U-band variability of young brown dwarfs in the Taurus Molecular Cloud and discuss its origin. We used the XMM-Newton Extended Survey of the Taurus Molecular Cloud, in which a sample of 11 young bona fide brown dwarfs (spectral type later than M6) was observed simultaneously in X-rays with XMM-Newton and in the U-band with the XMM-Newton Optical/UV Monitor (OM). We obtained upper limits to the U-band emission of 10 brown dwarfs (U>19.6-20.6 mag), whereas 2MASSJ04141188+2811535 was detected in the U-band. Remarkably, this brown dwarf brightened steadily from U~19.5 mag at the beginning of the observation, peaked 6 h later at U~18.4 mag, and then faded to U~18.65 mag over the next 2 h. The first OM U-band measurement is consistent with the quiescent level measured in ground-based follow-up observations about one year later. This brown dwarf was not detected in X-rays by XMM-Newton during the OM observation. We discuss the possible sources of U-band variability for this young brown dwarf, namely a magnetic flare, non-steady accretion onto the substellar surface, and rotational modulation of a hot spot. We conclude that this event is relate
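Because magnitudes are logarithmic and smaller values mean brighter, the size of the U-band event is easier to grasp as a flux ratio. Pogson's relation, m1 - m2 = -2.5 log10(F1/F2), turns the approximate magnitudes quoted above into flux factors. The function name is hypothetical, and the inputs are only the rounded magnitudes stated in the text.

```python
def flux_ratio(m_faint: float, m_bright: float) -> float:
    """F_bright / F_faint from Pogson's relation m1 - m2 = -2.5 * log10(F1 / F2)."""
    return 10 ** (0.4 * (m_faint - m_bright))

rise = flux_ratio(19.5, 18.4)   # start of observation -> peak about 6 h later
fade = flux_ratio(18.65, 18.4)  # peak vs. the level about 2 h after the peak
print(f"U-band flux rose by a factor of about {rise:.2f}")
print(f"peak flux was about {fade:.2f} times the level 2 h later")
```

A 1.1 mag brightening corresponds to roughly a 2.75-fold increase in U-band flux. The subsequent 0.25 mag fade corresponds to a drop of about 20% from the peak.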
Recent tumor genome sequencing has confirmed that a single tumor often consists of multiple cell subpopulations (clones) that bear different but related genetic profiles, such as mutation and copy number variation profiles. Thus far, a tumor has been treated as a single entity in cancer functional studies. With advances in genome sequencing and computational analysis, we are able to quantify and computationally dissect clones from tumors and then conduct clone-based analyses. Emerging technologies such as single-cell genome sequencing and RNA-Seq could profile tumor clones. Thus, we should reconsider how to conduct cancer systems biology studies in the genome sequencing era. We outline new directions for cancer systems biology based on the premise that genome sequencing technology can be used to dissect, quantify, and genetically characterize clones from tumors. Topics discussed in Part 1 of this review include the computational quantification of tumor subpopulations; clone-based network modeling; cancer hallmark-based networks and their high-order rewiring principles; and the principles of cell survival networks of fast-growing clones.
The sampling of scale-free networks in molecular biology is usually achieved by growing networks from a seed using recursive algorithms whose elementary moves include the addition and deletion of nodes and bonds. These algorithms include the Barabasi-Albert algorithm. Later algorithms, such as the Duplication-Divergence algorithm, the Solé algorithm and the iSite algorithm, were inspired by biological processes underlying the evolution of protein networks, and the networks they produce differ substantially from those grown by the Barabasi-Albert algorithm. In this paper, the mean-field analysis of these algorithms is reconsidered and extended to variant and modified implementations of the algorithms. The degree distributions of scale-free networks decay as a power law, $P(k) \sim k^{-\gamma}$, where $\gamma$ is a scaling exponent. We derive mean-field expressions for $\gamma$ and test them by numerical simulations; in general, good agreement is obtained. We also find that some algorithms (for example, some variant Barabasi-Albert and Solé networks) do not produce scale-free networks.
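The following is a minimal sketch of the Barabasi-Albert growth rule together with a numerical check of $\gamma$. It rests on three assumptions:

- the seed is a complete graph on m + 1 nodes;
- each new node attaches to m distinct, degree-proportional targets;
- $\gamma$ is estimated with the standard discrete maximum-likelihood approximation $\hat\gamma = 1 + n_{\text{tail}} / \sum \ln(k_i/(k_{\min} - 1/2))$ above a hand-picked k_min.

The classic mean-field result for the original algorithm is $\gamma = 3$. A finite simulation with this simple estimator can land somewhat off 3 because of finite-size effects and the choice of k_min, so treat the output as a rough check rather than a reproduction of the paper's analysis. The variant algorithms are not implemented here.

```python
import math
import random
from collections import Counter

def barabasi_albert_degrees(n: int, m: int, seed: int = 0) -> Counter:
    """Grow a network by preferential attachment and return {node: degree}."""
    rng = random.Random(seed)
    # Each node appears in `endpoints` once per incident edge, so a uniform draw
    # from this list picks a node with probability proportional to its degree.
    endpoints: list[int] = []
    for i in range(m + 1):  # seed network: complete graph on m + 1 nodes
        for j in range(i + 1, m + 1):
            endpoints += [i, j]
    for new in range(m + 1, n):
        targets: set[int] = set()
        while len(targets) < m:  # m distinct, degree-proportional targets
            targets.add(rng.choice(endpoints))
        for t in targets:
            endpoints += [new, t]
    return Counter(endpoints)

def mle_gamma(degrees: list[int], k_min: int) -> float:
    """Approximate discrete maximum-likelihood power-law exponent for k >= k_min."""
    tail = [k for k in degrees if k >= k_min]
    return 1.0 + len(tail) / sum(math.log(k / (k_min - 0.5)) for k in tail)

degrees = list(barabasi_albert_degrees(20_000, 3, seed=1).values())
gamma_hat = mle_gamma(degrees, k_min=10)
print(f"estimated gamma = {gamma_hat:.2f} (mean-field prediction for the original algorithm: 3)")
```

The endpoint list avoids recomputing attachment probabilities at every step. This is a common way to implement preferential attachment in linear time.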
This article introduces the physics of information in the context of molecular biology and genomics. Entropy and information, the two central concepts of Shannon's theory of information and communication, are often confused with each other but play transparent roles when applied to statistical ensembles (i.e., identically prepared sets) of symbolic sequences. Such an approach can distinguish between entropy and information in genes, predict the secondary structure of ribozymes, and detect the covariation between residues in folded proteins. We also review applications to molecular sequence and structure analysis, and introduce new tools for characterizing resistance mutations and for drug design.
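To make the distinction between entropy and information concrete, here is a minimal sketch on a hypothetical four-sequence DNA alignment. It uses three quantities:

- the entropy of column i, H_i = -Σ p log2 p;
- the information at column i, taken as log2(4) - H_i (maximum entropy minus observed entropy);
- covariation between two columns, measured by their mutual information H_i + H_j - H_ij.

The toy alignment is invented for illustration. With so few sequences these estimates are strongly biased, and real analyses need many more sequences and finite-sample corrections.

```python
import math
from collections import Counter

def entropy(symbols) -> float:
    """Shannon entropy in bits of the empirical distribution of `symbols`."""
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in Counter(symbols).values())

def column(alignment: list[str], i: int) -> list[str]:
    return [seq[i] for seq in alignment]

def information(alignment: list[str], i: int, alphabet_size: int = 4) -> float:
    """Information at position i: maximum entropy log2(alphabet_size) minus observed entropy."""
    return math.log2(alphabet_size) - entropy(column(alignment, i))

def mutual_information(alignment: list[str], i: int, j: int) -> float:
    """Covariation between columns i and j: H_i + H_j - H_ij."""
    joint = [(seq[i], seq[j]) for seq in alignment]
    return entropy(column(alignment, i)) + entropy(column(alignment, j)) - entropy(joint)

# Hypothetical toy alignment: column 0 is conserved, columns 1 and 2 covary
# (G pairs with C, C pairs with G), and column 3 varies independently.
alignment = ["AGCA", "AGCT", "ACGA", "ACGT"]
for i in range(4):
    print(f"position {i}: H = {entropy(column(alignment, i)):.2f} bits, "
          f"I = {information(alignment, i):.2f} bits")
print(f"MI(1, 2) = {mutual_information(alignment, 1, 2):.2f} bits")
print(f"MI(1, 3) = {mutual_information(alignment, 1, 3):.2f} bits")
```

The conserved column carries the full 2 bits of information, while the variable columns have 1 bit of entropy each. Only the covarying pair (1, 2) shares mutual information.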
The central dogma of molecular biology, formulated more than five decades ago, compartmentalized information exchange in the cell into the DNA, RNA and protein domains. This formalization has served as an implicit thematic distinguisher for cell biological research ever since. However, a clear account of how research has been distributed across this formalization over time does not exist. Abstracts of more than 3.5 million cell-focused publications from 1975 to 2011 were analyzed for the frequency of 100 single-word DNA-, RNA- and protein-centric search terms, and the term frequencies were amalgamated to produce domain- and subdomain-specific trends. As a percentage of the total word count, protein-centric terms outnumber DNA-centric terms, which in turn outnumber RNA-centric terms, until the early 1990s, at which point the trends for protein and DNA begin to coalesce while RNA percentages remain relatively unchanged. This term-based census provides a yearly snapshot of the distribution of research interests across the three domains of the central dogma of molecular biology. A frequency chart of the most-studied elements of the periodic table is provided as an addendum.
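Here is a minimal sketch of the kind of term-frequency census described above. For each year, it reports each domain's term count as a percentage of the total word count. The tokenizer (lowercase alphanumeric runs), the three short term lists, and the toy abstracts are hypothetical placeholders, because the study's 100 search terms are not listed here.

```python
import re
from collections import Counter

# Hypothetical term lists; the study's 100 search terms are not reproduced here.
DOMAIN_TERMS = {
    "DNA": {"dna", "gene", "chromosome"},
    "RNA": {"rna", "mrna", "transcript"},
    "protein": {"protein", "enzyme", "kinase"},
}

def domain_percentages(abstracts_by_year: dict[int, list[str]]) -> dict[int, dict[str, float]]:
    """For each year, each domain's term count as a percentage of the total word count."""
    result = {}
    for year, abstracts in sorted(abstracts_by_year.items()):
        words = [w for text in abstracts for w in re.findall(r"[a-z0-9]+", text.lower())]
        counts = Counter(words)
        total = len(words)
        result[year] = {
            domain: (100.0 * sum(counts[t] for t in terms) / total) if total else 0.0
            for domain, terms in DOMAIN_TERMS.items()
        }
    return result

# Toy corpus, invented for illustration.
corpus = {
    1975: ["The enzyme binds DNA.", "Protein kinase activity in cells."],
    2011: ["RNA transcript levels and protein abundance."],
}
shares = domain_percentages(corpus)
for year, pct in shares.items():
    print(year, {d: round(p, 1) for d, p in pct.items()})
```

Normalizing by the total word count, rather than by the number of abstracts, keeps years with longer or more numerous abstracts comparable.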
In this paper, we propose and study several inverse problems of determining unknown parameters in nonlocal nonlinear coupled PDE systems, including the potentials, nonlinear interaction functions, and time-fractional orders. In these coupled systems, we enforce non-negativity of the solutions, in line with realistic scenarios in biology and ecology. Our inverse problem study has several salient features: the drastic reduction in measurement/observation data due to averaging effects, the nonlinear coupling between multiple equations, and the nonlocality arising from fractional-type derivatives. These factors pose significant challenges, and such inverse problems have not been explored in the previous literature. To address these challenges, we develop new and effective schemes. Our approach involves properly controlling the injection of different source terms to obtain multiple sets of mean flux data. This allows us to obtain unique identifiability results and accurately determine the unknown parameters. Finally, we establish a connection between our study and practical applications in biology, further highlighting the relevance of our work in real-
The molecular machinery of life is largely created via self-organisation of individual molecules into functional assemblies. Minimal coarse-grained models, where a whole macromolecule is represented by a small number of particles, can be of great value in identifying the main driving forces behind self-organisation in cell biology. Such models can incorporate data from both molecular and continuum scales, and their results can be directly compared to experiments. Here we review the state of the art of models for studying the formation and biological function of macromolecular assemblies in cells. We outline the key ingredients of each model and their main findings. We illustrate the contribution of this class of simulations to identifying the physical mechanisms behind life and diseases, and discuss their future developments.
Quantum computers can in principle solve certain problems exponentially faster than their classical counterparts. Useful quantum computation has not yet arrived, but when it does, it will affect nearly all scientific disciplines. In this review, we examine how current quantum algorithms could revolutionize computational biology and bioinformatics. There are potential benefits across the entire field, from the ability to process vast amounts of information and run machine learning algorithms far more efficiently, to algorithms for quantum simulation that are poised to improve computational calculations in drug discovery, to quantum algorithms for optimization that may advance fields from protein structure prediction to network analysis. However, these exciting prospects are susceptible to "hype", and it is also important to recognize the caveats and challenges of this new technology. Our aim is to introduce the promise and limitations of emerging quantum computing technologies in computational molecular biology and bioinformatics.
The last decade has witnessed rapid growth in understanding of the pivotal roles of mechanical stresses and physical forces in cell biology. As a result, an integrated view of cell biology is evolving, in which genetic and molecular features are scrutinized hand in hand with the physical and mechanical characteristics of cells. The physics of liquid crystals has emerged as a burgeoning frontier in cell biology over the past few years, fueled by the increasing identification of orientational order and topological defects in cell biology, spanning scales from subcellular filaments to individual cells and multicellular tissues. Here, we provide an account of the most recent findings and developments, together with future promises and challenges, in this rapidly evolving interdisciplinary research direction.