Cell type annotation is a key task in analyzing the heterogeneity of single-cell RNA sequencing data. Although recent foundation models automate this process, they typically annotate cells independently, without considering batch-level cellular context or providing explanatory reasoning. In contrast, human experts often annotate distinct cell types for different cell clusters based on their domain knowledge. To mimic this workflow, we introduce the CellPuzzles task, where the objective is to assign unique cell types to a batch of cells. This benchmark spans diverse tissues, diseases, and donor conditions, and requires reasoning across the batch-level cellular context to ensure label uniqueness. We find that off-the-shelf large language models (LLMs) struggle on CellPuzzles, with the best baseline (OpenAI's o1) achieving only 19.0% batch-level accuracy. To fill this gap, we propose Cell-o1, a 7B LLM trained via supervised fine-tuning on distilled reasoning traces, followed by reinforcement learning with batch-level rewards. Cell-o1 achieves state-of-the-art performance, outperforming o1 by over 73% and generalizing well across contexts. Further analysis of training dynamics and reas
Quantifying the size of cell populations is crucial for understanding biological processes such as growth, injury repair, and disease progression. Often, experimental data offer information in the form of relative frequencies of distinct cell types, rather than absolute cell counts. This emphasizes the need to devise effective strategies for estimating absolute cell quantities from fraction data. In response to this challenge, we present two computational approaches grounded in stochastic cell population models: the first-order moment method (FOM) and the second-order moment method (SOM). These methods explicitly establish mathematical mappings from cell fraction to cell population size using moment equations of the stochastic models. Notably, our investigation demonstrates that the SOM method obviates the requirement for a priori knowledge of the initial population size, highlighting the utility of incorporating variance details from cell proportions. The robustness of both the FOM and SOM methods was analyzed from different perspectives. Additionally, we extended the application of the FOM and SOM methods to various biological mechanisms within the context of cell plasticity mode
Cell-cell communication is essential for tissue development, regeneration and function, and its disruption can lead to diseases and developmental abnormalities. The revolution of single-cell genomics technologies offers unprecedented insights into cellular identities, opening new avenues to resolve the intricate cellular interactions present in tissue niches. CellPhoneDB is a bioinformatics toolkit designed to infer cell-cell communication by combining a curated repository of bona fide ligand-receptor interactions with a set of computational and statistical methods to integrate them with single-cell genomics data. Importantly, CellPhoneDB captures the multimeric nature of molecular complexes, thus representing cell-cell communication biology faithfully. Here we present CellPhoneDB v5, an updated version of the tool, which offers several new features. Firstly, the repository has been expanded by one-third with the addition of new interactions. These encompass interactions mediated by non-protein ligands such as endocrine hormones and GPCR ligands. Secondly, it includes a differential expression-based methodology for more tailored interaction queries. Thirdly, it incorporates novel
Regulation of cell proliferation is a crucial aspect of tissue development and homeostasis and plays a major role in morphogenesis, wound healing, and tumor invasion. One example of such regulation is contact inhibition, which describes the dramatic slowing of proliferation, cell migration and individual cell growth when multiple cells are in contact with each other. While many physiological, molecular and genetic factors are known, the mechanism of contact inhibition is still not fully understood. In particular, the relevance of cellular signaling due to interfacial contact for contact inhibition is still debated. Cellular automata (CA) have been employed in the past as numerically efficient mathematical models to study the dynamics of cell ensembles, but they are not suitable to explore the origins of contact inhibition as such agent-based models assume fixed cell sizes. We develop a minimal, data-driven model to simulate the dynamics of planar cell cultures by extending a probabilistic CA to incorporate size changes of individual cells during growth and cell division. We successfully apply this model to previous in vitro experiments on contact inhibition in epithelial tissue: A
Cell migration is essential for regulating many biological processes in physiological or pathological conditions, including embryonic development and cancer invasion. In vitro and in silico studies suggest that collective cell migration is associated with some biomechanical particularities, such as restructuring of extracellular matrix, stress and force distribution profiles, and reorganization of cytoskeleton. Therefore, the phenomenon could be understood by an in-depth study of cells' behavior determinants, including but not limited to mechanical cues from the environment and from fellow travelers. This review article aims to cover the recent development of experimental and computational methods for studying the biomechanics of collective cell migration during cancer progression and invasion. We also summarize the tested hypotheses regarding the mechanism underlying collective cell migration enabled by these methods. Together, the paper provides a broad overview of the methods and tools currently available to unravel the biophysical mechanisms pertinent to collective cell migration, as well as providing perspectives on future development towards eventually deciphering the key mec
The crawling motility of many eukaryotic cells is driven by filamentous actin (F-actin), and regulated by a network of signaling proteins and lipids (including small GTPases). The tangle of positive and negative feedback loops gives rise to various experimentally observed dynamic patterns ("actin waves"). Here we consider a recent prototypical model for actin waves in which F-actin exerts negative feedback onto a GTPase. Guided by recent numerical PDE bifurcation analysis in Hughes (2025) and Hughes et al. (2026), we explore cell shapes and motility associated with polar, oscillatory, and traveling wave solutions of a mass-conserved partial differential equation (PDE) model. We use Morpheus (cellular Potts) simulations to investigate the implications of such regimes of behavior on the shapes and motion of cells, and on transitions between modes of behavior. The model demonstrates various cell states, including resting (spatially uniform GTPase), polar cells (static "zones" of GTPase), and traveling waves along the cell edge. In some parameter regimes, such states can coexist, so that cells can transition from one behavior to another in response to noisy stimuli.
Cancer stem cells are controlled by developmental networks that are often topologically indistinguishable from those of normal, healthy stem cells. The question is why cancer stem cells can be both phenotypically distinct and have morphological effects so different from normal stem cells. The difference between cancer stem cells and normal stem cells lies not in differences in their network architecture, but rather in the spatial-temporal locality of their activation in the genome and the resulting expression in the body. The metastatic potential of cancer stem cells is not based primarily on their network divergence from normal stem cells, but on non-network-based genetic changes that enable the evolution of gene-based phenotypic properties of the cell that permit its escape and travel to other parts of the body. Stem cell network theory allows the precise prediction of stem cell behavioral dynamics and a mathematical description of stem cell proliferation for both normal and cancer stem cells. It indicates that the best therapeutic approach is to tackle the highest order stem cells first, otherwise spontaneous remission of so-called cured cancers will always be a danger. Stem cell networks poin
Background and objective: Cell migration is essential for many biological phenomena with direct impact on human health and disease. One conventional approach to study cell migration involves the quantitative analysis of individual cell trajectories recorded by time-lapse video microscopy. Dedicated software tools exist to assist the automated or semi-automated tracking of cells and translate the tracks into coordinate positions over time. However, cell biologists often struggle to turn these data sets into biologically meaningful figures and metrics. Methods: This report describes MigraR, an intuitive graphical user interface run from RStudio™ (via the R package Shiny), which greatly simplifies the task of translating coordinate positions of moving cells into measurable parameters of cell migration (velocity, straightness, and direction of movement), as well as of plotting cell trajectories and migration metrics. One innovative function of this interface is that it allows users to refine their data sets by setting limits based on time, velocity and straightness. Results: MigraR was tested on different data to assess its applicability. Inten
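As an illustration of the migration parameters named in the MigraR abstract above (velocity, straightness, direction of movement), the following Python sketch computes them from a list of coordinate positions. The abstract does not give MigraR's exact definitions, so the formulas here are commonly used ones and should be read as assumptions: mean speed as path length over elapsed time, straightness as net displacement over path length, and direction as the angle of the net displacement. The function name `migration_metrics` and the toy track are hypothetical and are not part of MigraR.

```python
import math


def migration_metrics(track, dt):
    """Summarize one cell track.

    track: list of (x, y) positions sampled at a fixed interval dt.
    The metric definitions are illustrative assumptions, not MigraR's code.
    """
    if len(track) < 2:
        raise ValueError("a track needs at least two positions")

    # Displacement vectors between consecutive frames.
    steps = [(x1 - x0, y1 - y0) for (x0, y0), (x1, y1) in zip(track, track[1:])]
    path_length = sum(math.hypot(dx, dy) for dx, dy in steps)

    # Net displacement from the first to the last position.
    net_dx = track[-1][0] - track[0][0]
    net_dy = track[-1][1] - track[0][1]
    net_displacement = math.hypot(net_dx, net_dy)

    duration = dt * len(steps)
    return {
        "mean_speed": path_length / duration,
        # Ratio in [0, 1]; 1 means a perfectly straight path.
        "straightness": net_displacement / path_length if path_length else 0.0,
        # Angle of the net displacement, counter-clockwise from the +x axis.
        "direction_deg": math.degrees(math.atan2(net_dy, net_dx)),
    }


# Hypothetical toy track: positions in micrometres, one frame every 5 minutes.
track = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (6.0, 13.0)]
metrics = migration_metrics(track, dt=5.0)
print(metrics)
```

Filtering tracks by time, velocity, or straightness limits, as the abstract describes, would then amount to keeping only the tracks whose metrics fall inside user-chosen bounds.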
Myxococcus xanthus cells self-organize into aligned groups, clusters, at various stages of their lifecycle. Formation of these clusters is crucial for the complex dynamic multi-cellular behavior of these bacteria. However, the mechanism underlying the cell alignment and clustering is not fully understood. Motivated by studies of clustering in self-propelled rods, we hypothesized that M. xanthus cells can align and form clusters through pure mechanical interactions among cells and between cells and substrate. We test this hypothesis using an agent-based simulation framework in which each agent is based on the biophysical model of an individual M. xanthus cell. We show that model agents, under realistic cell flexibility values, can align and form cell clusters but only when periodic reversals of cell directions are suppressed. However, by extending our model to introduce the observed ability of cells to deposit and follow slime trails, we show that effective trail-following leads to clusters in reversing cells. Furthermore, we conclude that mechanical cell alignment combined with slime-trail-following is sufficient to explain the distinct clustering behaviors observed for wild-type a
The transition from single-cell to multicellular behavior is important in early development but rarely studied. The starvation-induced aggregation of the social amoeba Dictyostelium discoideum into a multicellular slug is known to result from single-cell chemotaxis towards emitted pulses of cyclic adenosine monophosphate (cAMP). However, how exactly do transient short-range chemical gradients lead to coherent collective movement at a macroscopic scale? Here, we use a multiscale model verified by quantitative microscopy to describe wide-ranging behaviors from chemotaxis and excitability of individual cells to aggregation of thousands of cells. To better understand the mechanism of long-range cell-cell communication and hence aggregation, we analyze cell-cell correlations, showing evidence for self-organization at the onset of aggregation (as opposed to following a leader cell). Surprisingly, cell collectives, despite their finite size, show features of criticality known from phase transitions in physical systems. Application of external cAMP perturbations in our simulations near the sensitive critical point allows steering cells into early aggregation and towards certain locations b
We define a model for random (abstract) cell complexes (CCs), similar to the well-known Erdős-Rényi model for graphs and its extensions for simplicial complexes. To build a random cell complex, we first draw from an Erdős-Rényi graph, and consecutively augment the graph with cells for each dimension with a specified probability. As the number of possible cells increases combinatorially -- e.g., 2-cells can be represented as cycles, or permutations -- we derive an approximate sampling algorithm for this model limited to two-dimensional abstract cell complexes. As a basis for this algorithm, we first introduce a spanning-tree-based method that samples simple cycles and allows the efficient approximation of various properties, most notably the probability of occurrence of a given cycle. This approximation is of independent interest as it enables the approximation of a wide variety of cycle-related graph statistics using importance sampling. We use this to approximate the number of cycles of a given length on a graph, allowing us to calculate the sampling probability to arrive at a desired expected number of sampled 2-cells. The probability approximation also trivially leads to a sampl
Rapid Single Flux Quantum (RSFQ) circuits are the most evolved superconductor logic family. However, the need to clock each cell and the deep pipeline cause a complex clock network with a large skew. This results in lower throughput and high latency in RSFQ. This work introduces an asynchronous RSFQ cell library that incorporates the α-cell, enabling bidirectional signal paths in RSFQ circuits. The α-cell mitigates the need for a large clock network by allowing reverse signal flow, minimizing routing, and enabling compact circuit designs. We demonstrate the library's reliability and efficiency through analog simulations and in-house optimization tools. The asynchronous reset RSFQ (AR-SFQ) will enable efficient implementation of scalable, high-performance computing frameworks, such as state machines, neuromorphic computing, and higher fan-in circuits.
Purpose: We investigated the utilization of privacy-preserving, locally-deployed, open-source Large Language Models (LLMs) to extract diagnostic information from free-text cardiovascular magnetic resonance (CMR) reports. Materials and Methods: We evaluated nine open-source LLMs on their ability to identify diagnoses and classify patients into various cardiac diagnostic categories based on descriptive findings in 109 clinical CMR reports. Performance was quantified using standard classification metrics including accuracy, precision, recall, and F1 score. We also employed confusion matrices to examine patterns of misclassification across models. Results: Most open-source LLMs demonstrated exceptional performance in classifying reports into different diagnostic categories. Google's Gemma2 model achieved the highest average F1 score of 0.98, followed by Qwen2.5:32B and DeepseekR1-32B with F1 scores of 0.96 and 0.95, respectively. All other evaluated models attained average scores above 0.93, with Mistral and DeepseekR1-7B being the only exceptions. The top four LLMs outperformed our board-certified cardiologist (F1 score of 0.94) across all evaluation metrics in analyzing CMR reports.
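The classification metrics named in the abstract above (precision, recall, F1 score, and confusion matrices) can be illustrated with a short Python sketch. It is a minimal example, assuming per-class metrics averaged with equal weight per category (macro averaging); the abstract does not say which averaging scheme the study used. The function names and the toy diagnostic labels are hypothetical and are not data from the study.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired label lists."""
    labels = sorted(set(y_true) | set(y_pred))
    # Confusion matrix stored sparsely: (true label, predicted label) -> count.
    confusion = Counter(zip(y_true, y_pred))
    scores = {}
    for label in labels:
        tp = confusion[(label, label)]
        fp = sum(confusion[(other, label)] for other in labels if other != label)
        fn = sum(confusion[(label, other)] for other in labels if other != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (an assumed averaging choice)."""
    scores = per_class_scores(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical toy labels for six reports.
y_true = ["normal", "normal", "myocarditis", "myocarditis", "infarction", "infarction"]
y_pred = ["normal", "myocarditis", "myocarditis", "myocarditis", "infarction", "normal"]

scores = per_class_scores(y_true, y_pred)
average_f1 = macro_f1(y_true, y_pred)
print(scores)
print(round(average_f1, 3))
```

Reading the sparse confusion counts directly, for example how often one category is predicted as another, gives the misclassification patterns that a confusion matrix is meant to expose.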
We developed a computational approach to analyze the biomechanics of epithelial cell aggregates, whether islands, stripes, or entire monolayers, that combines both vertex and contact-inhibition-of-locomotion models to include both cell-cell and cell-substrate adhesion. Examination of the distribution of cell protrusions (adhesion to the substrate) in the model predicted high-order profiles of cell organization that agree with those previously seen experimentally. Cells acquired an asymmetric distribution of basal protrusions, traction forces and apical aspect ratios that decreased when moving from the edge to the island center. Our in silico analysis also showed that tension on cell-cell junctions and apical stress are not homogeneous across the island. Instead, these parameters are higher at the island center and scale up with island size, which we confirmed experimentally using laser ablation assays and immunofluorescence. Without formally being a 3-dimensional model, our approach has the minimal elements necessary to reproduce the distribution of cellular forces and mechanical crosstalk as well as distribution of principal stress in cells within epithelial cell aggregates. By mak
Single-cell technologies are revolutionizing the entire field of biology. The large volumes of data generated by single-cell technologies are high-dimensional, sparse, heterogeneous, and have complicated dependency structures, making analyses using conventional machine learning approaches challenging and impractical. In tackling these challenges, deep learning often demonstrates superior performance compared to traditional machine learning methods. In this work, we give a comprehensive survey on deep learning in single-cell analysis. We first introduce background on single-cell technologies and their development, as well as fundamental concepts of deep learning including the most popular deep architectures. We present an overview of the single-cell analytic pipeline pursued in research applications while noting divergences due to data sources or specific applications. We then review seven popular tasks spanning different stages of the single-cell analysis pipeline, including multimodal integration, imputation, clustering, spatial domain identification, cell-type deconvolution, cell segmentation, and cell-type annotation. Under each task, we describe the most recent developm
Statistical and mathematical modeling are crucial to describe, interpret, compare and predict the behavior of complex biological systems including the organization of hematopoietic stem and progenitor cells in the bone marrow environment. The current prominence of high-resolution and live-cell imaging data provides an unprecedented opportunity to study the spatiotemporal dynamics of these cells within their stem cell niche and learn more about aberrant, but also unperturbed, normal hematopoiesis. However, this requires careful quantitative statistical analysis of the spatial and temporal behavior of cells and the interaction with their microenvironment. Moreover, such quantification is a prerequisite for the construction of hypothesis-driven mathematical models that can provide mechanistic explanations by generating spatiotemporal dynamics that can be directly compared to experimental observations. Here, we provide a brief overview of statistical methods in analyzing spatial distribution of cells, cell motility, cell shapes and cellular genealogies. We also describe cell-based modeling formalisms that allow researchers to simulate emergent behavior in a multicellular system based
Quian Quiroga et al. [Nature 435, 1102 (2005)] have recently discovered neurons that appear to have the characteristics of grandmother (GM) cells. Here we quantitatively assess the compatibility of their data with the GM-cell hypothesis. We show that, contrary to the general impression, a GM-cell representation can be information-theoretically efficient, but that it must be accompanied by cells giving a distributed coding of the input. We present a general method to deduce the sparsity distribution of the whole neuronal population from a sample, and use it to show there are two populations of cells: a distributed-code population of less than about 5% of the cells, and a much more sparsely responding population of putative GM cells. With an allowance for the number of undetected silent cells, we find that the putative GM cells can code for 10^5 or more categories, sufficient for them to be classic GM cells, or to be GM-like cells coding for memories. We quantify the strong biases against detection of GM cells, and show consistency of our results with previous measurements that find only distributed coding. We discuss the consequences for the architecture of neural systems and synapt
Multiplexed immuno-fluorescence tissue imaging, allowing simultaneous detection of molecular properties of cells, is an essential tool for characterizing the complex cellular mechanisms in translational research and clinical practice. New image analysis approaches are needed because tissue sections stained with a mixture of protein, DNA and RNA biomarkers introduce various complexities, including spurious edges due to fluorescent staining artifacts between touching or overlapping cells. We have developed the RRScell method, which harnesses the stochastic random-reaction-seed (RRS) algorithm and the deep-learning U-net to accurately and automatically extract a single-cell-resolution profiling map of gene expression across a million-cell tissue section. Furthermore, using the manifold learning technique UMAP for cell phenotype cluster analysis, the AI-driven RRScell is equipped with a marker-based image cytometry analysis tool (markerUMAP) for quantifying the spatial distribution of cell phenotypes from tissue images with a mixture of biomarkers. The results achieved in this study suggest that RRScell provides a sufficiently robust way to extract cytometric single cell morphology as w
Organisms across all domains of life regulate the size of their cells. However, the means by which this is done are poorly understood. We study two abstracted "molecular" models for size regulation: inhibitor dilution and initiator accumulation. We apply the models to two settings: bacteria like Escherichia coli, that grow fully before they set a division plane and divide into two equally sized cells, and cells that form a bud early in the cell division cycle, confine new growth to that bud, and divide at the connection between that bud and the mother cell, like the budding yeast Saccharomyces cerevisiae. In budding cells, delaying cell division until buds reach the same size as their mother leads to very weak size control, with average cell size and standard deviation of cell size increasing over time and saturating up to 100-fold higher than those values for cells that divide when the bud is still substantially smaller than its mother. In budding yeast, both the inhibitor dilution and initiator accumulation models are consistent with the observation that the daughters of diploid cells add a constant volume before they divide. This adder behavior has also been observed in bacteria. We f
Cell segmentation is a critical step for quantitative single-cell analysis in microscopy images. Existing cell segmentation methods are often tailored to specific modalities or require manual interventions to specify hyper-parameters in different experimental settings. Here, we present a multi-modality cell segmentation benchmark, comprising over 1500 labeled images derived from more than 50 diverse biological experiments. The top participants developed a Transformer-based deep-learning algorithm that not only exceeds existing methods but can also be applied to diverse microscopy images across imaging platforms and tissue types without manual parameter adjustments. This benchmark and the improved algorithm offer promising avenues for more accurate and versatile cell analysis in microscopy imaging.