Advances in single-cell omics allow for unprecedented insights into the transcription profiles of individual cells. When combined with large-scale perturbation screens, through which specific biological mechanisms can be targeted, these technologies allow for measuring the effect of targeted perturbations on the whole transcriptome. These advances provide an opportunity to better understand the causative role of genes in complex biological processes such as gene regulation, disease progression or cellular development. However, the high-dimensional nature of the data, coupled with the intricate complexity of biological systems renders this task nontrivial. Within the machine learning community, there has been a recent increase of interest in causality, with a focus on adapting established causal techniques and algorithms to handle high-dimensional data. In this perspective, we delineate the application of these methodologies within the realm of single-cell genomics and their challenges. We first present the model that underlies most of current causal approaches to single-cell biology and discuss and challenge the assumptions it entails from the biological point of view. We then iden
Recent advancements in single-cell genomics necessitate precision in gene panel selection to interpret complex biological data effectively. Those methods aim to streamline the analysis of scRNA-seq data by focusing on the most informative genes that contribute significantly to the specific analysis task. Traditional selection methods, which often rely on expert domain knowledge, embedded machine learning models, or heuristic-based iterative optimization, are prone to biases and inefficiencies that may obscure critical genomic signals. Recognizing the limitations of traditional methods, we aim to transcend these constraints with a refined strategy. In this study, we introduce an iterative gene panel selection strategy that is applicable to clustering tasks in single-cell genomics. Our method uniquely integrates results from other gene selection algorithms, providing valuable preliminary boundaries or prior knowledge as initial guides in the search space to enhance the efficiency of our framework. Furthermore, we incorporate the stochastic nature of the exploration process in reinforcement learning (RL) and its capability for continuous optimization through reward-based feedback. Thi
Cell-cell communication is essential for tissue development, regeneration and function, and its disruption can lead to diseases and developmental abnormalities. The revolution of single-cell genomics technologies offers unprecedented insights into cellular identities, opening new avenues to resolve the intricate cellular interactions present in tissue niches. CellPhoneDB is a bioinformatics toolkit designed to infer cell-cell communication by combining a curated repository of bona fide ligand-receptor interactions with a set of computational and statistical methods to integrate them with single-cell genomics data. Importantly, CellPhoneDB captures the multimeric nature of molecular complexes, thus representing cell-cell communication biology faithfully. Here we present CellPhoneDB v5, an updated version of the tool, which offers several new features. Firstly, the repository has been expanded by one-third with the addition of new interactions. These encompass interactions mediated by non-protein ligands such as endocrine hormones and GPCR ligands. Secondly, it includes a differentially expression-based methodology for more tailored interaction queries. Thirdly, it incorporates novel
The crawling motility of many eukaryotic cells is driven by filamentous actin (F-actin), and regulated by a network of signaling proteins and lipids (including small GTPases). The tangle of positive and negative feedback loops gives rise to various experimentally observed dynamic patterns (``actin waves''). Here we consider a recent prototypical model for actin waves in which F-actin exerts negative feedback onto a GTPase. Guided by recent numerical PDE bifurcation analysis in Hughes (2025) and Hughes et al (2026), we explore cell shapes and motility associated with polar, oscillatory, and traveling waves solutions of a mass-conserved partial differential equation (PDE) model. We use Morpheus (cellular Potts) simulations to investigate the implications of such regimes of behavior on the shapes and motion of cells, and on transitions between modes of behavior. The model demonstrates various cell states, including resting (spatially uniform GTPase), polar cells (static ``zones'' of GTPase), and traveling waves along the cell edge. In some parameter regimes, such states can coexist, so that cells can transition from one behavior to another in response to noisy stimuli.
Regulation of cell proliferation is a crucial aspect of tissue development and homeostasis and plays a major role in morphogenesis, wound healing, and tumor invasion. A phenomenon of such regulation is contact inhibition, which describes the dramatic slowing of proliferation, cell migration and individual cell growth when multiple cells are in contact with each other. While many physiological, molecular and genetic factors are known, the mechanism of contact inhibition is still not fully understood. In particular, the relevance of cellular signaling due to interfacial contact for contact inhibition is still debated. Cellular automata (CA) have been employed in the past as numerically efficient mathematical models to study the dynamics of cell ensembles, but they are not suitable to explore the origins of contact inhibition as such agent-based models assume fixed cell sizes. We develop a minimal, data-driven model to simulate the dynamics of planar cell cultures by extending a probabilistic CA to incorporate size changes of individual cells during growth and cell division. We successfully apply this model to previous in-vitro experiments on contact inhibition in epithelial tissue: A
Network models provide a powerful framework for analysing single-cell count data, facilitating the characterisation of cellular identities, disease mechanisms, and developmental trajectories. However, uncertainty modeling in unsupervised learning with genomic data remains insufficiently explored. Conventional clustering methods assign a singular identity to each cell, potentially obscuring transitional states during differentiation or mutation. This study introduces a variational Bayesian framework for clustering and analysing single-cell genomic data, employing a Bayesian Gaussian mixture model to estimate the probabilistic association of cells with distinct clusters. This approach captures cellular transitions, yielding biologically coherent insights into neurogenesis and breast cancer progression. The inferred clustering probabilities enable further analyses, including Differential Expression Analysis and pseudotime analysis. Furthermore, we propose utilising the misclustering rate and Area Under the Curve in clustering scRNA-seq data as an innovative metric to quantitatively evaluate overall clustering performance. This methodological advancement enhances the resolution of sing
Cancer stem cells are controlled by developmental networks that are often topologically indistinguishable from normal, healthy stem cells. The question is why cancer stem cells can be both phenotypically distinct and have morphological effects so different from normal stem cells. The difference between cancer stem cells and normal stem cells lies not in differences their network architecture, but rather in the spatial-temporal locality of their activation in the genome and the resulting expression in the body. The metastatic potential cancer stem cells is not based primarily on their network divergence from normal stem cells, but on non-network based genetic changes that enable the evolution of gene-based phenotypic properties of the cell that permit its escape and travel to other parts of the body. Stem cell network theory allows the precise prediction of stem cell behavioral dynamics and a mathematical description of stem cell proliferation for both normal and cancer stem cells. It indicates that the best therapeutic approach is to tackle the highest order stem cells first, otherwise spontaneous remission of so called cured cancers will always be a danger. Stem cell networks poin
Single-cell technologies have revolutionized biomedical research by enabling scalable measurement of the genome, transcriptome, and proteome of multiple systems at single-cell resolution. Now widely applied to cancer models, these assays offer new insights into tumour heterogeneity, which underlies cancer initiation, progression, and relapse. However, the large quantities of high-dimensional, noisy data produced by single-cell assays can complicate data analysis, obscuring biological signals with technical artefacts. In this review article, we outline the major challenges in analyzing single-cell cancer genomics data and survey the current computational tools available to tackle these. We further outline unsolved problems that we consider major opportunities for future methods development to help interpret the vast quantities of data being generated.
Myxococcus xanthus cells self-organize into aligned groups, clusters, at various stages of their lifecycle. Formation of these clusters is crucial for the complex dynamic multi-cellular behavior of these bacteria. However, the mechanism underlying the cell alignment and clustering is not fully understood. Motivated by studies of clustering in self-propelled rods, we hypothesized that M. xanthus cells can align and form clusters through pure mechanical interactions among cells and between cells and substrate. We test this hypothesis using an agent-based simulation framework in which each agent is based on the biophysical model of an individual M. xanthus cell. We show that model agents, under realistic cell flexibility values, can align and form cell clusters but only when periodic reversals of cell directions are suppressed. However, by extending our model to introduce the observed ability of cells to deposit and follow slime trails, we show that effective trail-following leads to clusters in reversing cells. Furthermore, we conclude that mechanical cell alignment combined with slime-trail-following is sufficient to explain the distinct clustering behaviors observed for wild-type a
The transition from single-cell to multicellular behavior is important in early development but rarely studied. The starvation-induced aggregation of the social amoeba Dictyostelium discoideum into a multicellular slug is known to result from single-cell chemotaxis towards emitted pulses of cyclic adenosine monophosphate (cAMP). However, how exactly do transient short-range chemical gradients lead to coherent collective movement at a macroscopic scale? Here, we use a multiscale model verified by quantitative microscopy to describe wide-ranging behaviors from chemotaxis and excitability of individual cells to aggregation of thousands of cells. To better understand the mechanism of long-range cell-cell communication and hence aggregation, we analyze cell-cell correlations, showing evidence for self-organization at the onset of aggregation (as opposed to following a leader cell). Surprisingly, cell collectives, despite their finite size, show features of criticality known from phase transitions in physical systems. Application of external cAMP perturbations in our simulations near the sensitive critical point allows steering cells into early aggregation and towards certain locations b
We study the prediction of T-cell response for specific given peptides, which could, among other applications, be a crucial step towards the development of personalized cancer vaccines. It is a challenging task due to limited, heterogeneous training data featuring a multi-domain structure; such data entail the danger of shortcut learning, where models learn general characteristics of peptide sources, such as the source organism, rather than specific peptide characteristics associated with T-cell response. Using a transformer model for T-cell response prediction, we show that the danger of inflated predictive performance is not merely theoretical but occurs in practice. Consequently, we propose a domain-aware evaluation scheme. We then study different transfer learning techniques to deal with the multi-domain structure and shortcut learning. We demonstrate a per-source fine tuning approach to be effective across a wide range of peptide sources and further show that our final model is competitive with existing state-of-the-art approaches for predicting T-cell responses for human peptides.
We generated a computational approach to analyze the biomechanics of epithelial cell aggregates, either island or stripes or entire monolayers, that combines both vertex and contact-inhibition-of-locomotion models to include both cell-cell and cell-substrate adhesion. Examination of the distribution of cell protrusions (adhesion to the substrate) in the model predicted high order profiles of cell organization that agree with those previously seen experimentally. Cells acquired an asymmetric distribution of basal protrusions, traction forces and apical aspect ratios that decreased when moving from the edge to the island center. Our in silico analysis also showed that tension on cell-cell junctions and apical stress is not homogeneous across the island. Instead, these parameters are higher at the island center and scales up with island size, which we confirmed experimentally using laser ablation assays and immunofluorescence. Without formally being a 3-dimensional model, our approach has the minimal elements necessary to reproduce the distribution of cellular forces and mechanical crosstalk as well as distribution of principal stress in cells within epithelial cell aggregates. By mak
Long-range dependencies are critical for understanding genomic structure and function, yet most conventional methods struggle with them. Widely adopted transformer-based models, while excelling at short-context tasks, are limited by the attention module's quadratic computational complexity and inability to extrapolate to sequences longer than those seen in training. In this work, we explore State Space Models (SSMs) as a promising alternative by benchmarking two SSM-inspired architectures, Caduceus and Hawk, on long-range genomics modeling tasks under conditions parallel to a 50M parameter transformer baseline. We discover that SSMs match transformer performance and exhibit impressive zero-shot extrapolation across multiple tasks, handling contexts 10 to 100 times longer than those seen during training, indicating more generalizable representations better suited for modeling the long and complex human genome. Moreover, we demonstrate that these models can efficiently process sequences of 1M tokens on a single GPU, allowing for modeling entire genomic regions at once, even in labs with limited compute. Our findings establish SSMs as efficient and scalable for long-context genomic an
Organisms across all domains of life regulate the size of their cells. However, the means by which this is done is poorly understood. We study two abstracted "molecular" models for size regulation: inhibitor dilution and initiator accumulation. We apply the models to two settings: bacteria like Escherichia coli, that grow fully before they set a division plane and divide into two equally sized cells, and cells that form a bud early in the cell division cycle, confine new growth to that bud, and divide at the connection between that bud and the mother cell, like the budding yeast Saccharomyces cerevisiae. In budding cells, delaying cell division until buds reach the same size as their mother leads to very weak size control, with average cell size and standard deviation of cell size increasing over time and saturating up to 100-fold higher than those values for cells that divide when the bud is still substantially smaller than its mother. In budding yeast, both inhibitor dilution or initiator accumulation models are consistent with the observation that the daughters of diploid cells add a constant volume before they divide. This adder behavior has also been observed in bacteria. We f
Bacteria are able to maintain a narrow distribution of cell sizes by regulating the timing of cell divisions. In rich nutrient conditions, cells divide much faster than their chromosomes replicate. This implies that cells maintain multiple rounds of chromosome replication per cell division by regulating the timing of chromosome replications. Here, we show that both cell size and chromosome replication may be simultaneously regulated by the long-standing initiator accumulation strategy. The strategy proposes that initiators are produced in proportion to the volume increase and is accumulated at each origin of replication, and chromosome replication is initiated when a critical amount per origin has accumulated. We show that this model maps to the incremental model of size control, which was previously shown to reproduce experimentally observed correlations between various events in the cell cycle and explains the exponential dependence of cell size on the growth rate of the cell. Furthermore, we show that this model also leads to the efficient regulation of the timing of initiation and the number of origins consistent with existing experimental results.
Genome replication, a key process for a cell, relies on stochastic initiation by replication origins, causing a variability of replication timing from cell to cell. While stochastic models of eukaryotic replication are widely available, the link between the key parameters and overall replication timing has not been addressed systematically.We use a combined analytical and computational approach to calculate how positions and strength of many origins lead to a given cell-to-cell variability of total duration of the replication of a large region, a chromosome or the entire genome.Specifically, the total replication timing can be framed as an extreme-value problem, since it is due to the last region that replicates in each cell. Our calculations identify two regimes based on the spread between characteristic completion times of all inter-origin regions of a genome. For widely different completion times, timing is set by the single specific region that is typically the last to replicate in all cells. Conversely, when the completion time of all regions are comparable,an extreme-value estimate shows that the cell-to-cell variability of genome replication timing has universal properties.
The different cell types in a living organism acquire their identity through the process of cell differentiation in which the multipotent progenitor cells differentiate into distinct cell types. Experimental evidence and analysis of large-scale microarray data establish the key role played by a two-gene motif in cell differentiation in a number of cell systems. The two genes express transcription factors which repress each other's expression and autoactivate their own production. A number of theoretical models have recently been proposed based on the two-gene motif to provide a physical understanding of how cell differentiation occurs. In this paper, we study a simple model of cell differentiation which assumes no cooperativity in the regulation of gene expression by the transcription factors. The latter repress each other's activity directly through DNA binding and indirectly through the formation of heterodimers. We specifically investigate how deterministic processes combined with stochasticity contribute in bringing about cell differentiation. The deterministic dynamics of our model give rise to a supercritical pitchfork bifurcation from an undifferentiated stable steady state
Rare diseases are collectively common, affecting approximately one in twenty individuals worldwide. In recent years, rapid progress has been made in rare disease diagnostics due to advances in DNA sequencing, development of new computational and experimental approaches to prioritize genes and genetic variants, and increased global exchange of clinical and genetic data. However, more than half of individuals suspected to have a rare disease lack a genetic diagnosis. The Genomics Research to Elucidate the Genetics of Rare Diseases (GREGoR) Consortium was initiated to study thousands of challenging rare disease cases and families and apply, standardize, and evaluate emerging genomics technologies and analytics to accelerate their adoption in clinical practice. Further, all data generated, currently representing ~7500 individuals from ~3000 families, is rapidly made available to researchers worldwide via the Genomic Data Science Analysis, Visualization, and Informatics Lab-space (AnVIL) to catalyze global efforts to develop approaches for genetic diagnoses in rare diseases (https://gregorconsortium.org/data). The majority of these families have undergone prior clinical genetic testing
The last decade has witnessed a rapid growth in understanding of the pivotal roles of mechanical stresses and physical forces in cell biology. As a result an integrated view of cell biology is evolving, where genetic and molecular features are scrutinized hand in hand with physical and mechanical characteristics of cells. Physics of liquid crystals has emerged as a burgeoning new frontier in cell biology over the past few years, fueled by an increasing identification of orientational order and topological defects in cell biology, spanning scales from subcellular filaments to individual cells and multicellular tissues. Here, we provide an account of most recent findings and developments together with future promises and challenges in this rapidly evolving interdisciplinary research direction.
Maintenance and regeneration of adult tissues rely on the self-renewal of stem cells. Regeneration without over-proliferation requires precise regulation of the stem cell proliferation and differentiation rates. The nature of such regulatory mechanisms in different tissues, and how to incorporate them in models of stem cell population dynamics, is incompletely understood. The critical birth-death (CBD) process is widely used to model stem cell populations, capturing key phenomena, such as scaling laws in clone size distributions. However, the CBD process neglects regulatory mechanisms. Here, we propose the birth-death process with volume exclusion (vBD), a variation of the birth-death process that considers crowding effects, such as may arise due to limited space in a stem cell niche. While the deterministic rate equations predict a single non-trivial attracting steady state, the master equation predicts extinction and transient distributions of stem cell numbers with three possible behaviours: long-lived quasi-steady state, and short-lived bimodal or unimodal distributions. In all cases, we approximate solutions to the vBD master equation using a renormalised system-size expansion