Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving single-cell foundation models. First, we scaled the pre-training data to a diverse collection of 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the \model family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on several downstream evaluation tasks, including identifying the underlying disease state of held-out donors not seen during training, distinguishing between diseased and healthy cells for disease conditions and
Nearly all cell models explicitly or implicitly deal with the biophysical constraints that must be respected for life to persist. Despite this, there is almost no systematicity in how these constraints are implemented, and we lack a principled understanding of how cellular dynamics interact with them and how they originate in actual biology. Computational cell biology will only overcome these concerns once it treats the life-death boundary as a central concept, creating a theory of cellular viability. We lay the foundation for such a development by demonstrating how specific geometric structures can separate regions of qualitatively similar survival outcomes in our models, offering new global organizing principles for cell fate. We also argue that idealized models of emergent individuals offer a tractable way to begin understanding life's intrinsically generated limits.
Regulation of cell proliferation is a crucial aspect of tissue development and homeostasis and plays a major role in morphogenesis, wound healing, and tumor invasion. One example of such regulation is contact inhibition, the dramatic slowing of proliferation, cell migration and individual cell growth when multiple cells are in contact with each other. While many physiological, molecular and genetic factors are known, the mechanism of contact inhibition is still not fully understood. In particular, the relevance of cellular signaling due to interfacial contact for contact inhibition is still debated. Cellular automata (CA) have been employed in the past as numerically efficient mathematical models to study the dynamics of cell ensembles, but they are not suited to exploring the origins of contact inhibition because such agent-based models assume fixed cell sizes. We develop a minimal, data-driven model to simulate the dynamics of planar cell cultures by extending a probabilistic CA to incorporate size changes of individual cells during growth and cell division. We successfully apply this model to previous in vitro experiments on contact inhibition in epithelial tissue: A
Cell-cell communication is essential for tissue development, regeneration and function, and its disruption can lead to diseases and developmental abnormalities. The revolution of single-cell genomics technologies offers unprecedented insights into cellular identities, opening new avenues to resolve the intricate cellular interactions present in tissue niches. CellPhoneDB is a bioinformatics toolkit designed to infer cell-cell communication by combining a curated repository of bona fide ligand-receptor interactions with a set of computational and statistical methods to integrate them with single-cell genomics data. Importantly, CellPhoneDB captures the multimeric nature of molecular complexes, thus representing cell-cell communication biology faithfully. Here we present CellPhoneDB v5, an updated version of the tool, which offers several new features. Firstly, the repository has been expanded by one-third with the addition of new interactions. These encompass interactions mediated by non-protein ligands such as endocrine hormones and GPCR ligands. Secondly, it includes a differential expression-based methodology for more tailored interaction queries. Thirdly, it incorporates novel
Modern biology and biomedicine are undergoing a big-data explosion needing advanced computational algorithms to extract mechanistic insights on the physiological state of living cells. We present the motivation for the Cell Physiome: a framework and approach for creating, sharing, and using biophysics-based computational models of single cell physiology. Using examples in calcium signaling, bioenergetics, and endosomal trafficking, we highlight the need for spatially detailed, biophysics-based computational models to uncover new mechanisms underlying cell biology. We review progress and challenges to date towards creating cell physiome models. We then introduce bond graphs as an efficient way to create cell physiome models that integrate chemical, mechanical, electromagnetic, and thermal processes while maintaining mass and energy balance. Bond graphs enhance modularization and re-usability of computational models of cells at scale. We conclude with a look forward into steps that will help fully realize this exciting new field of mechanistic biomedical data science.
In a recent paper, Wilmes et al. demonstrated a qualitative integration of omics data streams to gain a mechanistic understanding of cyclosporine A toxicity. One of their major conclusions was that cyclosporine A strongly activates the nuclear factor (erythroid-derived 2)-like 2 pathway (Nrf2) in renal proximal tubular epithelial cells exposed in vitro. We pursue here the analysis of those data with a quantitative integration of omics data with a differential equation model of the Nrf2 pathway. That was done in two steps: (i) Modeling the in vitro pharmacokinetics of cyclosporine A (exchange between cells, culture medium and vial walls) with a minimal distribution model. (ii) Modeling the time course of omics markers in response to cyclosporine A exposure at the cell level with a coupled PK-systems biology model. Posterior statistical distributions of the parameter values were obtained by Markov chain Monte Carlo sampling. Data were well simulated, and the known in vitro toxic effect EC50 was well matched by model predictions. The integration of in vitro pharmacokinetics and systems biology modeling gives us a quantitative insight into mechanisms of cyclosporine A oxidative-stress
The crawling motility of many eukaryotic cells is driven by filamentous actin (F-actin), and regulated by a network of signaling proteins and lipids (including small GTPases). The tangle of positive and negative feedback loops gives rise to various experimentally observed dynamic patterns (``actin waves''). Here we consider a recent prototypical model for actin waves in which F-actin exerts negative feedback onto a GTPase. Guided by recent numerical PDE bifurcation analysis in Hughes (2025) and Hughes et al. (2026), we explore cell shapes and motility associated with polar, oscillatory, and traveling-wave solutions of a mass-conserved partial differential equation (PDE) model. We use Morpheus (cellular Potts) simulations to investigate the implications of such regimes of behavior on the shapes and motion of cells, and on transitions between modes of behavior. The model demonstrates various cell states, including resting (spatially uniform GTPase), polar cells (static ``zones'' of GTPase), and traveling waves along the cell edge. In some parameter regimes, such states can coexist, so that cells can transition from one behavior to another in response to noisy stimuli.
The last decade has witnessed a rapid growth in understanding of the pivotal roles of mechanical stresses and physical forces in cell biology. As a result an integrated view of cell biology is evolving, where genetic and molecular features are scrutinized hand in hand with physical and mechanical characteristics of cells. Physics of liquid crystals has emerged as a burgeoning new frontier in cell biology over the past few years, fueled by an increasing identification of orientational order and topological defects in cell biology, spanning scales from subcellular filaments to individual cells and multicellular tissues. Here, we provide an account of most recent findings and developments together with future promises and challenges in this rapidly evolving interdisciplinary research direction.
Recent studies have demonstrated the feasibility of modeling single-cell data as natural languages and the potential of leveraging powerful large language models (LLMs) for understanding cell biology. However, a comprehensive evaluation of LLMs' performance on language-driven single-cell analysis tasks still remains unexplored. Motivated by this challenge, we introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data and encompasses three hierarchical levels of single-cell analysis tasks: cell type annotation (cell-level), drug response prediction (drug-level), and perturbation analysis (gene-level). Going beyond this, we systematically evaluate the performance across 14 open-source and closed-source LLMs ranging from 160M to 671B on CellVerse. Remarkably, the experimental results reveal: (1) Existing specialist models (C2S-Pythia) fail to make reasonable decisions across all sub-tasks within CellVerse, while generalist models such as Qwen, Llama, GPT, and DeepSeek family models exhibit preliminary understanding capabilities within the realm of cell biology. (2) The performance of current LLMs falls short
With the completion of human genome mapping, the focus of scientists seeking to explain the biological complexity of living systems is shifting from analyzing the individual components (such as a particular gene or biochemical reaction) to understanding the set of interactions amongst the large number of components that results in the different functions of the organism. To this end, the area of systems biology attempts to achieve a "systems-level" description of biology by focusing on the network of interactions instead of the characteristics of its isolated parts. In this article, we briefly describe some of the emerging themes of research in "network" biology, looking at dynamical processes occurring at two different length scales, within the cell and between cells, viz., the intra-cellular signaling network and the nervous system. We show that focusing on the systems-level aspects of these problems allows one to observe surprising and illuminating common themes amongst them.
Human physiology and pathology arise from the coordinated interactions of diverse single cells. However, analyzing single cells has been limited by the low sensitivity and throughput of analytical methods. DNA sequencing has recently made such analysis feasible for nucleic acids, but single-cell protein analysis remains limited. Mass spectrometry is the most powerful method for protein analysis, but its application to single cells faces three major challenges: efficiently delivering proteins/peptides to MS detectors, identifying their sequences, and scaling the analysis to many thousands of single cells. These challenges have motivated corresponding solutions, including SCoPE-design multiplexing and clean, automated, and miniaturized sample preparation. Synergistically applied, these solutions enable quantifying thousands of proteins across many single cells and establish a solid foundation for further advances. Building upon this foundation, the SCoPE concept will enable analyzing subcellular organelles and post-translational modifications while increases in multiplexing capabilities will increase the throughput and decrease cost.
Cellular biology exists embedded in a world dominated by random dynamics and chance. Many vital molecules and pieces of cellular machinery diffuse within cells, moving along random trajectories as they collide with the other biomolecular inhabitants of the cell. Cellular components may block each other's progress, be produced or degraded at random times, and become unevenly separated as cells grow and divide. Cellular behaviour, including important features of stem cells, tumours and infectious bacteria, is profoundly influenced by the chaos which is the environment within the cell walls. Here we will look at some important causes and effects of randomness in cellular biology, and some ways in which researchers, helped by the vast amounts of data that are now flowing in, have made progress in describing the randomness of nature.
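The idea that cellular components are "produced or degraded at random times" can be made concrete with a small simulation. The sketch below uses the standard Gillespie stochastic simulation algorithm, which the text above does not name; the function name, the rate parameters `k_prod` and `k_deg`, and the example values are all hypothetical and chosen only for illustration.

```python
import random


def gillespie_birth_death(k_prod, k_deg, n0, t_end, seed=0):
    """Simulate one trajectory of a birth-death process.

    Assumptions (illustrative only): molecules are produced at a constant
    rate k_prod, and each molecule is degraded independently at rate k_deg.
    Returns the event times and the molecule count after each event.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    times, counts = [t], [n]
    while True:
        rate_prod = k_prod
        rate_deg = k_deg * n
        total = rate_prod + rate_deg
        if total == 0:
            break  # nothing can happen any more
        t += rng.expovariate(total)  # exponentially distributed waiting time
        if t > t_end:
            break
        # Pick which event fires, in proportion to its rate.
        if rng.random() * total < rate_prod:
            n += 1
        else:
            n -= 1
        times.append(t)
        counts.append(n)
    return times, counts


times, counts = gillespie_birth_death(k_prod=10.0, k_deg=1.0, n0=0, t_end=20.0)
print(len(times), counts[-1])
```

Running the function with different seeds gives different trajectories that fluctuate around the ratio `k_prod / k_deg`, which is one simple way to see how random production and degradation generate cell-to-cell variability.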
Systems biology relies on mathematical models that often involve complex and intractable likelihood functions, posing challenges for efficient inference and model selection. Generative models, such as normalizing flows, have shown remarkable ability in approximating complex distributions in various domains. However, their application in systems biology for approximating intractable likelihood functions remains unexplored. Here, we elucidate a framework for leveraging normalizing flows to approximate complex likelihood functions inherent to systems biology models. By using normalizing flows in the simulation-based inference setting, we demonstrate a method that not only approximates a likelihood function but also allows for model inference in the model selection setting. We showcase the effectiveness of this approach on real-world systems biology problems, providing practical guidance for implementation and highlighting its advantages over traditional computational methods.
The central dogma of molecular biology, formulated more than five decades ago, compartmentalized information exchange in the cell into the DNA, RNA and protein domains. This formalization has served as an implicit thematic distinguisher for cell biological research ever since. However, a clear account of the distribution of research across this formalization over time does not exist. Abstracts of >3.5 million publications focusing on the cell from 1975 to 2011 were analyzed for the frequency of 100 single-word DNA-, RNA- and protein-centric search terms and amalgamated to produce domain- and subdomain-specific trends. A preponderance of protein- over DNA- and in turn over RNA-centric terms as a percentage of the total word count is evident until the early 1990s, at which point the trends for protein and DNA begin to coalesce while RNA percentages remain relatively unchanged. This term-based census provides a yearly snapshot of the distribution of research interests across the three domains of the central dogma of molecular biology. A frequency chart of the most dominantly-studied elements of the periodic table is provided as an addendum.
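The term-based census described above amounts to counting domain-specific words as a percentage of the total word count. A minimal sketch of that calculation follows; the term lists are hypothetical stand-ins (the study's 100 search terms are not given here), and the function name and tokenization rule are assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical term lists; the actual 100 search terms are not listed in the text.
DOMAIN_TERMS = {
    "DNA": {"dna", "chromatin", "genome"},
    "RNA": {"rna", "mrna", "transcript"},
    "protein": {"protein", "kinase", "enzyme"},
}


def domain_percentages(abstracts):
    """Return each domain's term count as a percentage of the total word count."""
    words = [
        word
        for text in abstracts
        for word in re.findall(r"[a-z0-9]+", text.lower())
    ]
    counts = Counter(words)
    total = len(words)
    if total == 0:
        return {domain: 0.0 for domain in DOMAIN_TERMS}
    return {
        domain: 100.0 * sum(counts[term] for term in terms) / total
        for domain, terms in DOMAIN_TERMS.items()
    }


example = ["The kinase binds DNA", "mRNA and protein levels"]
print(domain_percentages(example))
```

Applying the same function to the abstracts of each publication year separately would give the kind of yearly trend the abstract describes.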
Epigenetic Tracking is a mathematical model of biological cells, originally conceived to study embryonic development. Computer simulations proved the capacity of the model to generate complex 3-dimensional cellular structures, and the potential to reproduce the complexity typical of living beings. The most distinctive feature of this model is the presence in the body of a homogeneous distribution of stem cells, which are dynamically and continuously created during development from non-stem cells and reside in niches. Embryonic stem cells orchestrate early developmental events, adult stem cells direct late developmental and regeneration events, ageing stem cells cause ageing and cancer stem cells are responsible for cancer growth. The conceptual backbone provided by Epigenetic Tracking brings together a wide range of biological phenomena: for this reason, we think it can be proposed as a general model for multicellular biology. Despite, or perhaps due to its theoretical origin, the model allowed us to make predictions relevant to very diverse fields of biology, such as transposable elements and cancer-related patterns of gene mutations. This paper contains a summary of the model an
Bacteria are able to maintain a narrow distribution of cell sizes by regulating the timing of cell divisions. In rich nutrient conditions, cells divide much faster than their chromosomes replicate. This implies that cells maintain multiple rounds of chromosome replication per cell division by regulating the timing of chromosome replications. Here, we show that both cell size and chromosome replication may be simultaneously regulated by the long-standing initiator accumulation strategy. The strategy proposes that initiators are produced in proportion to the volume increase and are accumulated at each origin of replication, and chromosome replication is initiated when a critical amount per origin has accumulated. We show that this model maps to the incremental model of size control, which was previously shown to reproduce experimentally observed correlations between various events in the cell cycle and explains the exponential dependence of cell size on the growth rate of the cell. Furthermore, we show that this model also leads to the efficient regulation of the timing of initiation and the number of origins consistent with existing experimental results.
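The incremental model of size control mentioned above can be illustrated with a minimal sketch, assuming an idealized, noise-free rule in which every cell adds a fixed size `delta` between birth and division and then divides symmetrically. The function name and parameters are hypothetical, and the sketch does not model initiator accumulation or chromosome replication itself.

```python
def adder_birth_sizes(initial_birth_size, delta, generations):
    """Birth sizes along one lineage under an idealized incremental rule.

    Assumption (illustrative only): each cell grows by a constant
    increment `delta` and then divides into two equal halves.
    """
    sizes = [initial_birth_size]
    for _ in range(generations):
        division_size = sizes[-1] + delta  # grow by a fixed increment
        sizes.append(division_size / 2.0)  # symmetric division
    return sizes


sizes = adder_birth_sizes(initial_birth_size=3.0, delta=1.0, generations=20)
print(sizes[:4], sizes[-1])
```

Under this rule the deviation of the birth size from `delta` halves every generation, so a lineage that starts too large or too small relaxes to a birth size equal to `delta`. This is one way to see how adding a constant increment yields a narrow size distribution.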
Myxococcus xanthus cells self-organize into aligned groups, clusters, at various stages of their lifecycle. Formation of these clusters is crucial for the complex dynamic multi-cellular behavior of these bacteria. However, the mechanism underlying the cell alignment and clustering is not fully understood. Motivated by studies of clustering in self-propelled rods, we hypothesized that M. xanthus cells can align and form clusters through pure mechanical interactions among cells and between cells and substrate. We test this hypothesis using an agent-based simulation framework in which each agent is based on the biophysical model of an individual M. xanthus cell. We show that model agents, under realistic cell flexibility values, can align and form cell clusters but only when periodic reversals of cell directions are suppressed. However, by extending our model to introduce the observed ability of cells to deposit and follow slime trails, we show that effective trail-following leads to clusters in reversing cells. Furthermore, we conclude that mechanical cell alignment combined with slime-trail-following is sufficient to explain the distinct clustering behaviors observed for wild-type a
Recent experiments revealing possible nanoscale electrostatic interactions in force generation at kinetochores for chromosome motions have prompted speculation regarding possible models for interactions between positively charged molecules in kinetochores and negative charge on C-termini near the plus ends of microtubules. A clear picture of how kinetochores establish and maintain a dynamic coupling to microtubules for force generation during the complex motions of mitosis remains elusive. The current paradigm of molecular cell biology requires that specific molecules, or molecular geometries, for force generation be identified. However, it is possible to account for mitotic motions within a classical electrostatics approach in terms of experimentally known cellular electric charge interacting over nanometer distances. These charges are modeled as bound surface and volume continuum charge distributions. Electrostatic consequences of intracellular pH changes during mitosis may provide a master clock for the events of mitosis.
Cell-based, mathematical modeling of collective cell behavior has become a prominent tool in developmental biology. Cell-based models represent individual cells as single particles or as sets of interconnected particles, and predict the collective cell behavior that follows from a set of interaction rules. In particular, vertex-based models (VMs) are a popular tool for studying the mechanics of confluent, epithelial cell layers. They represent the junctions between three (or sometimes more) cells in confluent tissues as point particles, connected using structural elements that represent the cell boundaries. A disadvantage of these models is that cell-cell interfaces are represented as straight lines. This is a suitable simplification for epithelial tissues, where the interfaces are typically under tension, but this simplification may not be appropriate for mesenchymal tissues or tissues that are under compression, such that the cell-cell boundaries can buckle. In this paper we introduce a variant of VMs in which this and two other limitations of VMs have been resolved. The new model can also be seen as an off-the-lattice generalization of the Cellular Potts Model. It is an extension of t
This article frames the relation between biology and physics by characterizing the former as a subdiscipline rather than a special case of the latter. To do this, we posit biological physics as the science of living matter in contrast to classic biophysics, the study of organismal properties by physical techniques. At the scale of the individual cell, living matter is nonunitary, i.e., not composed of aggregated subunits, and has features (e.g., intracellular organizational arrangements and biomolecular condensates) that are unlike any materials of the nonliving world. In transiently or constitutively multicellular forms (social microorganisms, animals, plants), living matter sustains physical processes that are generic (shared with nonliving matter, e.g., subunit communication by molecular diffusion in cellular slime molds), biogeneric (analogous to nonliving matter but realized through cellular activities, e.g., subunit demixing in animal embryos) or nongeneric (pertaining to sui generis materials, e.g., budding of active solids in plants). This "forms of matter" perspective is philosophically situated in the dialectical materialism of Engels and Hessen and the multilevel physica