Recent studies have demonstrated the feasibility of modeling single-cell data as natural languages and the potential of leveraging powerful large language models (LLMs) for understanding cell biology. However, a comprehensive evaluation of LLMs' performance on language-driven single-cell analysis tasks remains unexplored. Motivated by this challenge, we introduce CellVerse, a unified language-centric question-answering benchmark that integrates four types of single-cell multi-omics data and encompasses three hierarchical levels of single-cell analysis tasks: cell type annotation (cell-level), drug response prediction (drug-level), and perturbation analysis (gene-level). Going beyond this, we systematically evaluate the performance of 14 open-source and closed-source LLMs, ranging from 160M to 671B parameters, on CellVerse. Remarkably, the experimental results reveal: (1) Existing specialist models (C2S-Pythia) fail to make reasonable decisions across all sub-tasks within CellVerse, while generalist models such as Qwen, Llama, GPT, and DeepSeek family models exhibit preliminary understanding capabilities within the realm of cell biology. (2) The performance of current LLMs falls short
Nearly all cell models explicitly or implicitly deal with the biophysical constraints that must be respected for life to persist. Despite this, there is almost no systematicity in how these constraints are implemented, and we lack a principled understanding of how cellular dynamics interact with them and how they originate in actual biology. Computational cell biology will only overcome these concerns once it treats the life-death boundary as a central concept, creating a theory of cellular viability. We lay the foundation for such a development by demonstrating how specific geometric structures can separate regions of qualitatively similar survival outcomes in our models, offering new global organizing principles for cell fate. We also argue that idealized models of emergent individuals offer a tractable way to begin understanding life's intrinsically generated limits.
Understanding the biological mechanisms of disease is crucial for medicine, and in particular, for drug discovery. AI-powered analysis of genome-scale biological data holds great potential in this regard. The increasing availability of single-cell RNA sequencing data has enabled the development of large foundation models for disease biology. However, existing foundation models only modestly improve over task-specific models in downstream applications. Here, we explored two avenues for improving single-cell foundation models. First, we scaled the pre-training data to a diverse collection of 116 million cells, which is larger than those used by previous models. Second, we leveraged the availability of large-scale biological annotations as a form of supervision during pre-training. We trained the \model family of models comprising six transformer-based state-of-the-art single-cell foundation models with 70 million, 160 million, and 400 million parameters. We vetted our models on several downstream evaluation tasks, including identifying the underlying disease state of held-out donors not seen during training, distinguishing between diseased and healthy cells for disease conditions and
The last decade has witnessed a rapid growth in understanding of the pivotal roles of mechanical stresses and physical forces in cell biology. As a result, an integrated view of cell biology is evolving, where genetic and molecular features are scrutinized hand in hand with physical and mechanical characteristics of cells. Physics of liquid crystals has emerged as a burgeoning new frontier in cell biology over the past few years, fueled by an increasing identification of orientational order and topological defects in cell biology, spanning scales from subcellular filaments to individual cells and multicellular tissues. Here, we provide an account of the most recent findings and developments together with future promises and challenges in this rapidly evolving interdisciplinary research direction.
Modern biology and biomedicine are undergoing a big-data explosion needing advanced computational algorithms to extract mechanistic insights on the physiological state of living cells. We present the motivation for the Cell Physiome: a framework and approach for creating, sharing, and using biophysics-based computational models of single cell physiology. Using examples in calcium signaling, bioenergetics, and endosomal trafficking, we highlight the need for spatially detailed, biophysics-based computational models to uncover new mechanisms underlying cell biology. We review progress and challenges to date towards creating cell physiome models. We then introduce bond graphs as an efficient way to create cell physiome models that integrate chemical, mechanical, electromagnetic, and thermal processes while maintaining mass and energy balance. Bond graphs enhance modularization and re-usability of computational models of cells at scale. We conclude with a look forward into steps that will help fully realize this exciting new field of mechanistic biomedical data science.
The cell biology literature is littered with erroneously tiny P values, often the result of evaluating individual cells as independent samples. Because readers use P values and error bars to infer whether a reported difference would likely recur if the experiment were repeated, the sample size N used for statistical tests should actually be the number of times an experiment is performed, not the number of cells (or subcellular structures) analyzed across all experiments. P values calculated using the number of cells do not reflect the reproducibility of the result and are thus highly misleading. To help authors avoid this mistake, we provide examples and practical tutorials for creating figures that communicate both the cell-level variability and the experimental reproducibility.
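The sample-size argument above can be illustrated with a minimal sketch. The data below are hypothetical (three experiments per condition, 100 cells each, with experiment-to-experiment shifts larger than the treatment effect), and `welch_t` is an illustrative helper, not a function taken from the article's tutorials.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for the difference mean(b) - mean(a)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(b) - statistics.mean(a)) / se


# Hypothetical measurements: each inner list is one experiment (100 cells).
# Cells within an experiment scatter around that experiment's own mean.
offsets = [-1.0, -0.5, 0.0, 0.5, 1.0] * 20
control = [[m + o for o in offsets] for m in (10.0, 12.0, 14.0)]
treated = [[m + o for o in offsets] for m in (11.0, 13.0, 15.0)]

# Misleading analysis: pool all cells, so N = 300 per condition.
pooled_control = [x for experiment in control for x in experiment]
pooled_treated = [x for experiment in treated for x in experiment]
t_cells = welch_t(pooled_control, pooled_treated)

# Recommended analysis: one mean per experiment, so N = 3 per condition.
control_means = [statistics.mean(experiment) for experiment in control]
treated_means = [statistics.mean(experiment) for experiment in treated]
t_experiments = welch_t(control_means, treated_means)

print(f"t with N = cells:       {t_cells:.2f}")
print(f"t with N = experiments: {t_experiments:.2f}")
```

With these made-up numbers the pooled-cell statistic is roughly ten times larger than the per-experiment one (about 6.9 versus about 0.6), even though the underlying data are identical. Treating cells as independent samples shrinks the standard error without adding any evidence about reproducibility.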
Since the middle of the last century, it has been known that neural stem cells (NSCs) play a key role in regenerative medicine for treating neurodegenerative diseases. This review article introduces neural stem cell biology and covers the methods and techniques for isolating, differentiating, and transplanting neural stem cells. In the future, neural stem cells could be transplanted into the human brain to replace damaged and dead neurons. The highly limited access to embryonic stem cells, together with ethical issues, has intensified the search for other NSC sources. Developing technologies indicate that this can be achieved before the end of this century. In addition, the differentiation and maturation of NSCs can be artificially accelerated by modern methods.
One hundred years after Smoluchowski introduced his approach to stochastic processes, they now form the basis of mathematical and physical modeling in cellular biology: they are used, for example, to analyse and extract features from large numbers (tens of thousands) of single-molecule trajectories, or to study the diffusive motion of molecules, proteins or receptors. Stochastic modeling is a new step in large-scale data analysis that serves to extract cell biology concepts. We review here Smoluchowski's approach to stochastic processes and provide several applications: coarse-graining diffusion, studying polymer models for understanding nuclear organization, and finally the stochastic jump dynamics of telomeres across cell division and stochastic gene regulation.
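As a minimal sketch of the kind of trajectory analysis mentioned above, the snippet below simulates one-dimensional free diffusion with an Euler-Maruyama step and recovers the diffusion coefficient from the squared increments. The function names, the parameter values, and the restriction to pure diffusion without drift are assumptions made for illustration; they are not taken from the review.

```python
import math
import random


def simulate_brownian(D, dt, n_steps, rng):
    """Euler-Maruyama path of dx = sqrt(2 D) dW, starting at x = 0."""
    sigma = math.sqrt(2.0 * D * dt)  # standard deviation of one increment
    x = 0.0
    path = [x]
    for _ in range(n_steps):
        x += sigma * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def estimate_diffusion(path, dt):
    """Estimate D from the mean squared increment: <dx^2> = 2 D dt."""
    increments = [b - a for a, b in zip(path, path[1:])]
    mean_sq = sum(d * d for d in increments) / len(increments)
    return mean_sq / (2.0 * dt)


rng = random.Random(0)  # fixed seed so the run is reproducible
D_true = 0.5            # assumed diffusion coefficient (arbitrary units)
dt = 0.01               # assumed sampling interval
path = simulate_brownian(D_true, dt, 20000, rng)
D_hat = estimate_diffusion(path, dt)
print(f"true D = {D_true}, estimated D = {D_hat:.3f}")
```

With 20,000 increments the relative statistical error of this estimator is about one percent. Real single-particle data would also require handling localization noise, confinement, and drift, which this sketch ignores.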
Cellular biology exists embedded in a world dominated by random dynamics and chance. Many vital molecules and pieces of cellular machinery diffuse within cells, moving along random trajectories as they collide with the other biomolecular inhabitants of the cell. Cellular components may block each other's progress, be produced or degraded at random times, and become unevenly separated as cells grow and divide. Cellular behaviour, including important features of stem cells, tumours and infectious bacteria, is profoundly influenced by the chaos which is the environment within the cell walls. Here we will look at some important causes and effects of randomness in cellular biology, and some ways in which researchers, helped by the vast amounts of data that are now flowing in, have made progress in describing the randomness of nature.
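One of the effects listed above, molecules being produced and degraded at random times, can be sketched with a Gillespie-style stochastic simulation of a birth-death process. The rate constants and the function name are hypothetical and chosen only for illustration; this is a textbook scheme, not a model taken from the article.

```python
import random


def gillespie_birth_death(k_prod, k_deg, t_end, rng, n0=0):
    """Simulate production (rate k_prod) and degradation (rate k_deg * n).

    Returns the time-averaged copy number and the final copy number.
    """
    t, n = 0.0, n0
    area = 0.0  # running time-integral of n, used for the time average
    while True:
        rate_total = k_prod + k_deg * n
        wait = rng.expovariate(rate_total)  # exponential waiting time
        if t + wait >= t_end:
            area += n * (t_end - t)
            break
        area += n * wait
        t += wait
        # Choose which reaction fires, in proportion to its rate.
        if rng.random() < k_prod / rate_total:
            n += 1
        else:
            n -= 1
    return area / t_end, n


rng = random.Random(1)  # fixed seed so the run is reproducible
mean_n, final_n = gillespie_birth_death(k_prod=10.0, k_deg=1.0, t_end=2000.0, rng=rng)
print(f"time-averaged copy number: {mean_n:.2f} (steady-state mean k_prod/k_deg = 10)")
```

For this process the steady-state distribution is Poisson with mean `k_prod / k_deg`, so the copy number fluctuates by roughly the square root of its mean even though both rates are constant.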
Cell-cell communication is essential for tissue development, regeneration and function, and its disruption can lead to diseases and developmental abnormalities. The revolution of single-cell genomics technologies offers unprecedented insights into cellular identities, opening new avenues to resolve the intricate cellular interactions present in tissue niches. CellPhoneDB is a bioinformatics toolkit designed to infer cell-cell communication by combining a curated repository of bona fide ligand-receptor interactions with a set of computational and statistical methods to integrate them with single-cell genomics data. Importantly, CellPhoneDB captures the multimeric nature of molecular complexes, thus representing cell-cell communication biology faithfully. Here we present CellPhoneDB v5, an updated version of the tool, which offers several new features. Firstly, the repository has been expanded by one-third with the addition of new interactions. These encompass interactions mediated by non-protein ligands such as endocrine hormones and GPCR ligands. Secondly, it includes a differential expression-based methodology for more tailored interaction queries. Thirdly, it incorporates novel
The crawling motility of many eukaryotic cells is driven by filamentous actin (F-actin), and regulated by a network of signaling proteins and lipids (including small GTPases). The tangle of positive and negative feedback loops gives rise to various experimentally observed dynamic patterns ("actin waves"). Here we consider a recent prototypical model for actin waves in which F-actin exerts negative feedback onto a GTPase. Guided by recent numerical PDE bifurcation analysis in Hughes (2025) and Hughes et al. (2026), we explore cell shapes and motility associated with polar, oscillatory, and traveling wave solutions of a mass-conserved partial differential equation (PDE) model. We use Morpheus (cellular Potts) simulations to investigate the implications of such regimes of behavior on the shapes and motion of cells, and on transitions between modes of behavior. The model demonstrates various cell states, including resting (spatially uniform GTPase), polar cells (static "zones" of GTPase), and traveling waves along the cell edge. In some parameter regimes, such states can coexist, so that cells can transition from one behavior to another in response to noisy stimuli.
Epigenetic Tracking is a mathematical model of biological cells, originally conceived to study embryonic development. Computer simulations proved the capacity of the model to generate complex 3-dimensional cellular structures, and the potential to reproduce the complexity typical of living beings. The most distinctive feature of this model is the presence in the body of a homogeneous distribution of stem cells, which are dynamically and continuously created during development from non-stem cells and reside in niches. Embryonic stem cells orchestrate early developmental events, adult stem cells direct late developmental and regeneration events, ageing stem cells cause ageing and cancer stem cells are responsible for cancer growth. The conceptual backbone provided by Epigenetic Tracking brings together a wide range of biological phenomena: for this reason, we think it can be proposed as a general model for multicellular biology. Despite, or perhaps due to, its theoretical origin, the model allowed us to make predictions relevant to very diverse fields of biology, such as transposable elements, and cancer-related patterns of gene mutations. This paper contains a summary of the model an
Regulation of cell proliferation is a crucial aspect of tissue development and homeostasis and plays a major role in morphogenesis, wound healing, and tumor invasion. One example of such regulation is contact inhibition, which describes the dramatic slowing of proliferation, cell migration and individual cell growth when multiple cells are in contact with each other. While many physiological, molecular and genetic factors are known, the mechanism of contact inhibition is still not fully understood. In particular, the relevance of cellular signaling due to interfacial contact for contact inhibition is still debated. Cellular automata (CA) have been employed in the past as numerically efficient mathematical models to study the dynamics of cell ensembles, but they are not suitable to explore the origins of contact inhibition as such agent-based models assume fixed cell sizes. We develop a minimal, data-driven model to simulate the dynamics of planar cell cultures by extending a probabilistic CA to incorporate size changes of individual cells during growth and cell division. We successfully apply this model to previous in vitro experiments on contact inhibition in epithelial tissue: A
The discovery of general principles underlying the complexity and diversity of cellular and developmental systems is a central and long-standing aim of biology. Whilst new technologies collect data at an ever-accelerating rate, there is growing concern that conceptual progress is not keeping pace. We contend that this is due to a paucity of appropriate conceptual frameworks to serve as a basis for general theories of mesoscale biological phenomena. In exploring this issue, we have developed a foundation for one such framework, termed the Core and Periphery (C&P) hypothesis, which reveals hidden generality across the diverse and complex behaviors exhibited by cells and tissues. Here, we present the C&P concept, provide examples of its applicability across multiple scales, argue its consistency with evolution, and discuss key implications and open questions. We propose that the C&P hypothesis could unlock new avenues of conceptual progress in cell and developmental biology.
This article frames the relation between biology and physics by characterizing the former as a subdiscipline rather than a special case of the latter. To do this, we posit biological physics as the science of living matter in contrast to classic biophysics, the study of organismal properties by physical techniques. At the scale of the individual cell, living matter is nonunitary, i.e., not composed of aggregated subunits, and has features (e.g., intracellular organizational arrangements and biomolecular condensates) that are unlike any materials of the nonliving world. In transiently or constitutively multicellular forms (social microorganisms, animals, plants), living matter sustains physical processes that are generic (shared with nonliving matter, e.g., subunit communication by molecular diffusion in cellular slime molds), biogeneric (analogous to nonliving matter but realized through cellular activities, e.g., subunit demixing in animal embryos) or nongeneric (pertaining to sui generis materials, e.g., budding of active solids in plants). This "forms of matter" perspective is philosophically situated in the dialectical materialism of Engels and Hessen and the multilevel physica
Cell-based, mathematical modeling of collective cell behavior has become a prominent tool in developmental biology. Cell-based models represent individual cells as single particles or as sets of interconnected particles, and predict the collective cell behavior that follows from a set of interaction rules. In particular, vertex-based models (VMs) are a popular tool for studying the mechanics of confluent, epithelial cell layers. They represent the junctions between three (or sometimes more) cells in confluent tissues as point particles, connected using structural elements that represent the cell boundaries. A disadvantage of these models is that cell-cell interfaces are represented as straight lines. This is a suitable simplification for epithelial tissues, where the interfaces are typically under tension, but this simplification may not be appropriate for mesenchymal tissues or tissues that are under compression, such that the cell-cell boundaries can buckle. In this paper we introduce a variant of VMs in which this and two other limitations of VMs have been resolved. The new model can also be seen as an off-the-lattice generalization of the Cellular Potts Model. It is an extension of t
Cellular heterogeneity is an immanent property of biological systems that covers very different aspects of life, ranging from genetic diversity to cell-to-cell variability driven by stochastic molecular interactions and noise-induced cell differentiation. Here, we review recent developments in characterizing cellular heterogeneity by distributions and argue that understanding multicellular life requires the analysis of heterogeneity dynamics at single-cell resolution by integrative approaches that combine methods from non-equilibrium statistical physics, information theory and omics biology.
The transition from single-cell to multicellular behavior is important in early development but rarely studied. The starvation-induced aggregation of the social amoeba Dictyostelium discoideum into a multicellular slug is known to result from single-cell chemotaxis towards emitted pulses of cyclic adenosine monophosphate (cAMP). However, how exactly do transient short-range chemical gradients lead to coherent collective movement at a macroscopic scale? Here, we use a multiscale model verified by quantitative microscopy to describe wide-ranging behaviors from chemotaxis and excitability of individual cells to aggregation of thousands of cells. To better understand the mechanism of long-range cell-cell communication and hence aggregation, we analyze cell-cell correlations, showing evidence for self-organization at the onset of aggregation (as opposed to following a leader cell). Surprisingly, cell collectives, despite their finite size, show features of criticality known from phase transitions in physical systems. Application of external cAMP perturbations in our simulations near the sensitive critical point allows steering cells into early aggregation and towards certain locations b
Myxococcus xanthus cells self-organize into aligned groups, clusters, at various stages of their lifecycle. Formation of these clusters is crucial for the complex dynamic multi-cellular behavior of these bacteria. However, the mechanism underlying the cell alignment and clustering is not fully understood. Motivated by studies of clustering in self-propelled rods, we hypothesized that M. xanthus cells can align and form clusters through pure mechanical interactions among cells and between cells and substrate. We test this hypothesis using an agent-based simulation framework in which each agent is based on the biophysical model of an individual M. xanthus cell. We show that model agents, under realistic cell flexibility values, can align and form cell clusters but only when periodic reversals of cell directions are suppressed. However, by extending our model to introduce the observed ability of cells to deposit and follow slime trails, we show that effective trail-following leads to clusters in reversing cells. Furthermore, we conclude that mechanical cell alignment combined with slime-trail-following is sufficient to explain the distinct clustering behaviors observed for wild-type a
In a recent paper, Wilmes et al. demonstrated a qualitative integration of omics data streams to gain a mechanistic understanding of cyclosporine A toxicity. One of their major conclusions was that cyclosporine A strongly activates the nuclear factor (erythroid-derived 2)-like 2 pathway (Nrf2) in renal proximal tubular epithelial cells exposed in vitro. Here we pursue the analysis of those data with a quantitative integration of omics data with a differential equation model of the Nrf2 pathway. This was done in two steps: (i) modeling the in vitro pharmacokinetics of cyclosporine A (exchange between cells, culture medium and vial walls) with a minimal distribution model; (ii) modeling the time course of omics markers in response to cyclosporine A exposure at the cell level with a coupled PK-systems biology model. Posterior statistical distributions of the parameter values were obtained by Markov chain Monte Carlo sampling. Data were well simulated, and the known in vitro toxic effect EC50 was well matched by model predictions. The integration of in vitro pharmacokinetics and systems biology modeling gives us a quantitative insight into mechanisms of cyclosporine A oxidative-stress