20 results found
Ocean microbes are critical to both ocean ecosystems and the global climate. Flow cytometry, which measures cell optical properties in fluid samples, is routinely used in oceanographic research. Despite decades of accumulated data, identifying key microbial populations (a process known as ``gating'') remains a significant analytical challenge. To address this, we focus on gating multidimensional, high-frequency flow cytometry data collected {\it continuously} on board oceanographic research vessels, capturing time- and space-wise variations in the dynamic ocean. Our paper proposes a novel mixture-of-experts model in which both the gating function and the experts are given by trend filtering. The model leverages two key assumptions: (1) Each snapshot of flow cytometry data is a mixture of multivariate Gaussians and (2) the parameters of these Gaussians vary smoothly over time. Our method uses regularization and a constraint to ensure smoothness and that cluster means match biologically distinct microbe types. We demonstrate, using flow cytometry data from the North Pacific Ocean, that our proposed model accurately matches human-annotated gating and corrects significant errors.
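As a small illustration of assumption (1) in the abstract above, the sketch below computes the posterior membership probabilities of a single measurement under a univariate Gaussian mixture, then takes the most probable component as the gate. It is a one-dimensional simplification, not the paper's method: the multivariate model, the trend-filtering regularization, and the constraint on cluster means are omitted, and all parameter values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, sds):
    """Posterior probability that observation x belongs to each component."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(joint)
    return [j / total for j in joint]


# Hypothetical two-population snapshot (illustrative values only).
weights = [0.7, 0.3]
means = [1.0, 4.0]
sds = [0.5, 0.5]

resp = responsibilities(1.2, weights, means, sds)
# Hard assignment ("gate"): index of the most probable component.
gate = max(range(len(resp)), key=resp.__getitem__)
```

In a time-varying model of the kind the abstract describes, `weights`, `means`, and `sds` would be indexed by time, and the regularization would keep neighboring time points close to each other.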
The ocean is filled with phytoplankton that contribute as much photosynthesis as all land plants combined, making them vital to the carbon cycle and climate system. Recent advances in flow cytometry allow oceanographers to measure the optical traits of individual cells along research cruise tracks, generating single-cell resolution microbial data. In marine microbial ecology, detecting locations of abrupt changes in the environmental response of cytometric plankton distributions is an important task. This manuscript proposes a latent space Gaussian mixture-of-experts model, facilitating change point detection in replicated and clustered phytoplankton observations. Change points are identified through shifts in prior means of low-dimensional representations, with piecewise-constant structure enforced by a group-fused LASSO penalty. The optimization problem is then solved via Alternating Direction Method of Multipliers. Applied to flow cytometry data, the proposed method identifies a scientifically important change point that aligns with a transition zone between two marine provinces.
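To make the group-fused LASSO penalty mentioned above concrete, the following sketch evaluates it on a short sequence of mean vectors and reads off the change points. The helper names and the toy sequence are assumptions for illustration. The abstract's latent-space model and its ADMM solver are not reproduced here; only the penalty and the way a piecewise-constant sequence reveals change points are shown.

```python
import math


def group_fused_penalty(means):
    """Sum of Euclidean norms of successive differences of a vector sequence."""
    return sum(math.dist(a, b) for a, b in zip(means, means[1:]))


def change_points(means, tol=1e-8):
    """Indices t at which means[t] differs from means[t - 1] by more than tol."""
    return [t for t in range(1, len(means))
            if math.dist(means[t - 1], means[t]) > tol]


# Toy piecewise-constant sequence with one jump between index 2 and index 3.
seq = [(0.0, 0.0)] * 3 + [(3.0, 4.0)] * 2

penalty = group_fused_penalty(seq)  # only the single jump contributes
jumps = change_points(seq)
```

Because the penalty sums unsquared norms of whole difference vectors, it tends to set entire differences to zero at once, which is what produces the piecewise-constant structure.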
Imaging flow cytometry (IFC) enables high-throughput single-cell analysis but largely relies on fluorescence labeling to obtain molecular specificity. Label-free vibrational imaging can provide intrinsic chemical contrast, yet coherent Raman-based methods interrogate only a limited axial volume, which restricts quantitative whole-cell analysis under flow. Mid-infrared photothermal (MIP) microscopy offers a promising route to overcome this limitation by combining linear mid-infrared (MIR) absorption-based chemical contrast with visible-light detection, allowing chemical imaging of a broader axial volume of each cell in a wide-field configuration. However, applying MIP microscopy to rapidly flowing cells has been difficult because conventional frame-sequential acquisition of MIR-ON and MIR-OFF images is highly susceptible to motion-induced subtraction artifacts. Here we demonstrate MIP-IFC, a label-free imaging flow cytometry platform based on single-shot nanosecond-dual-pulse MIP (SNAP-MIP) microscopy. SNAP-MIP encodes the MIR-ON and MIR-OFF states into separate holographic channels within a single camera exposure, reducing their temporal separation to 20 ns. This single-shot acquis
We present ImmuVis, a family of efficient foundation models for imaging mass cytometry (IMC), a high-throughput multiplex imaging technology that handles molecular marker measurements as image channels and enables large-scale spatial tissue profiling. Unlike natural images, multiplex imaging lacks a fixed channel space, as real-world marker sets vary across studies, violating a core assumption of standard vision backbones. To address this, ImmuVis introduces marker-adaptive hyperconvolutions that generate convolutional kernels from learned marker embeddings, enabling a single model to operate on arbitrary measured marker subsets without retraining. We pretrain ImmuVis on the largest dataset to date, IMC17M (28 cohorts, 24,405 images, 265 markers, over 17M patches), using self-supervised masked reconstruction. ImmuVis outperforms state-of-the-art baselines and ablations in virtual staining and downstream classification tasks at substantially lower compute cost than transformer-based alternatives, and is the sole model that provides calibrated uncertainty via a heteroscedastic likelihood objective. These results position ImmuVis as a practical framework for real-world IMC modeling.
Microplastics are increasingly recognized as a global environmental health threat, yet their detection and characterization remain constrained by the cost, form factor, and throughput of existing analytical tools. Portable micro/nanotechnology-based sensors are emerging to address this need, but most rely on the assumption of spherical particle geometry in their operating principle, limiting their relevance for environmental analysis. Here, we overcome this limitation by advancing microwave cytometry with machine learning-enabled shape recognition. Microwave cytometry is a flow-through electronic platform that integrates microwave resonator responses with low-frequency impedance signals to capture the dielectric signatures of individual particles. Using microscopy-derived shape measurements as ground truth, we trained a random forest model to decode these information-rich waveforms. Once trained, the system operates without optical input, enabling electronic-only determination of particle geometry. We demonstrate extraction of the major and minor axes of ellipsoidal microparticles with <8% relative error on average and use these predictions to derive the dielectric permittivity
Image mass cytometry (IMC) enables high-dimensional spatial profiling by combining mass cytometry's analytical power with spatial distributions of cell phenotypes. Recent studies leverage large language models (LLMs) to extract cell states by translating gene or protein expression into biological context. However, existing single-cell LLMs face two major challenges: (1) Integration of spatial information: they struggle to generalize spatial coordinates and effectively encode spatial context as text, and (2) Treating each cell independently: they overlook cell-cell interactions, limiting their ability to capture biological relationships. To address these limitations, we propose Spatial2Sentence, a novel framework that integrates single-cell expression and spatial information into natural language using a multi-sentence approach. Spatial2Sentence constructs expression similarity and distance matrices, pairing spatially adjacent and expressionally similar cells as positive pairs while using distant and dissimilar cells as negatives. These multi-sentence representations enable LLMs to learn cellular interactions in both expression and spatial contexts. Equipped with multi-task learning
Circulating blood cell clusters (CCCs) containing red blood cells (RBCs), white blood cells (WBCs), and platelets are significant biomarkers linked to conditions like thrombosis, infection, and inflammation. Flow cytometry, paired with fluorescence staining, is commonly used to analyze these cell clusters, revealing cell morphology and protein profiles. While computational approaches based on machine learning have advanced the automatic analysis of single-cell flow cytometry images, there is a lack of effort to build tools to automatically analyze images containing CCCs. Unlike single cells, cell clusters often exhibit irregular shapes and sizes. In addition, these cell clusters often consist of heterogeneous cell types, which require multi-channel staining to identify the specific cell types within the clusters. This study introduces a new computational framework for analyzing CCC images and identifying cell types within clusters. Our framework uses a two-step analysis strategy. First, it categorizes images into cell cluster and non-cluster groups by fine-tuning the You Only Look Once (YOLOv11) model, which outperforms traditional convolutional neural networks (CNNs), Vision Transfo
Flow cytometry is a valuable technique that measures the optical properties of particles at a single-cell resolution. When deployed in the ocean, flow cytometry allows oceanographers to study different types of photosynthetic microbes called phytoplankton. It is of great interest to study how phytoplankton properties change in response to environmental conditions. In our work, we develop a nonlinear mixture of experts model to estimate separate regression functions for each subpopulation utilizing random-weight neural networks. Our model allows one to flexibly estimate how cell properties and relative abundances depend on environmental covariates in each segment of a heterogeneous sample, without the computational burden of backpropagation. We show that the proposed model provides superior predictive performance in simulated examples compared to a mixture of linear experts. Also, applying our model to real data, we show that our model has (1) comparable out-of-sample prediction performance, and (2) more realistic estimates of phytoplankton behavior.
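The idea of a random-weight network trained without backpropagation can be sketched for a single regression function: the hidden layer is drawn once at random and frozen, and only the output weights are fitted by ridge least squares. This is a minimal sketch under stated assumptions, not the authors' mixture-of-experts model. The tanh activation, the ridge value, the layer width, and the toy sine target are all choices made here for illustration.

```python
import math
import random


def random_hidden_layer(n_inputs, n_hidden, rng):
    """Draw hidden-layer weights and biases once; they are never trained."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(n_inputs)] for _ in range(n_hidden)]
    b = [rng.gauss(0.0, 1.0) for _ in range(n_hidden)]
    return W, b


def hidden_features(x, W, b):
    """tanh features of one input vector, plus a constant intercept term."""
    return [1.0] + [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bj)
                    for row, bj in zip(W, b)]


def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [yi] for row, yi in zip(A, y)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def fit_output_weights(X, y, W, b, ridge=1e-3):
    """Ridge least squares for the output layer: (H'H + ridge I) beta = H'y."""
    H = [hidden_features(x, W, b) for x in X]
    p = len(H[0])
    A = [[sum(h[i] * h[j] for h in H) + (ridge if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    rhs = [sum(h[i] * yi for h, yi in zip(H, y)) for i in range(p)]
    return solve(A, rhs)


def predict(x, W, b, beta):
    """Network output: linear combination of the fixed random features."""
    return sum(bi * hi for bi, hi in zip(beta, hidden_features(x, W, b)))


# Toy nonlinear target (illustrative only): y = sin(x) on [-3, 3].
rng = random.Random(0)
X = [[-3.0 + 6.0 * i / 59] for i in range(60)]
y = [math.sin(x[0]) for x in X]
W, b = random_hidden_layer(1, 15, rng)
beta = fit_output_weights(X, y, W, b)
mse = sum((predict(x, W, b, beta) - yi) ** 2 for x, yi in zip(X, y)) / len(y)
```

Since the only fitted quantity solves a linear system, the cost is a single least-squares solve rather than many gradient passes, which is the computational saving the abstract alludes to.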
Phytoplankton are microscopic algae responsible for roughly half of the world's photosynthesis that play a critical role in global carbon cycles and oxygen production, and measuring the abundance of their subtypes across a wide range of spatiotemporal scales is of great relevance to oceanography. High-frequency flow cytometry is a powerful technique in which oceanographers at sea can rapidly record the optical properties of tens of thousands of individual phytoplankton cells every few minutes. Identifying distinct subpopulations within these vast datasets (a process known as "gating") remains a major challenge and has largely been performed manually so far. In this paper, we introduce a fast, automated gating method, which accurately identifies phytoplankton populations by fitting a time-evolving mixture of Gaussians model using an expectation-maximization-like algorithm with kernel smoothing. We use simulated data to demonstrate the validity and robustness of this approach, and use oceanographic cruise data to highlight the method's ability to not only replicate but surpass expert manual gating. Finally, we provide the flowkernel R package, written in literate programming, that im
This paper presents FlowCyt, the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data. The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers. Ground truth labels identify five hematological cell types: T lymphocytes, B lymphocytes, Monocytes, Mast cells, and Hematopoietic Stem/Progenitor Cells (HSPCs). Experiments utilize supervised inductive learning and semi-supervised transductive learning on up to 1 million cells per patient. Baseline methods include Gaussian Mixture Models, XGBoost, Random Forests, Deep Neural Networks, and Graph Neural Networks (GNNs). GNNs demonstrate superior performance by exploiting spatial relationships in graph-encoded data. The benchmark allows standardized evaluation of clinically relevant classification tasks, along with exploratory analyses to gain insights into hematological cell phenotypes. This represents the first public flow cytometry benchmark with a richly annotated, heterogeneous dataset. It will empower the development and rigorous assessment of novel methodologies for single-cell analysis.
Representing and quantifying Minimal Residual Disease (MRD) in Acute Myeloid Leukemia (AML), a type of cancer that affects the blood and bone marrow, is essential in the prognosis and follow-up of AML patients. As traditional cytological analysis cannot detect leukemia cells below 5\%, the analysis of flow cytometry datasets is expected to provide more reliable results. In this paper, we explore statistical learning methods based on optimal transport (OT) to achieve a relevant low-dimensional representation of multi-patient flow cytometry measurements (FCM) datasets considered as high-dimensional probability distributions. Using the framework of OT, we justify the use of the K-means algorithm for dimensionality reduction of multiple large-scale point clouds through mean measure quantization by merging all the data into a single point cloud. After this quantization step, the visualization of the intra- and inter-patient FCM variability is carried out by embedding low-dimensional quantized probability measures into a linear space using either Wasserstein Principal Component Analysis (PCA) through linearized OT or log-ratio PCA of compositional data. Using a publicly available FCM data
Machine-learning (ML) models in flow cytometry have the potential to reduce error rates, increase reproducibility, and boost the efficiency of clinical labs. While numerous ML models for flow cytometry data have been proposed, few studies have described the clinical deployment of such models. Realizing the potential gains of ML models in clinical labs requires not only an accurate model, but infrastructure for automated inference, error detection, analytics and monitoring, and structured data extraction. Here, we describe an ML model for detection of Acute Myeloid Leukemia (AML), along with the infrastructure supporting clinical implementation. Our infrastructure leverages the resilience and scalability of the cloud for model inference, a Kubernetes-based workflow system that provides model reproducibility and resource management, and a system for extracting structured diagnoses from full-text reports. We also describe our model monitoring and visualization platform, an essential element for ensuring continued model accuracy. Finally, we present a post-deployment analysis of impacts on turn-around time and compare production accuracy to the original validation statistics.
Flow cytometry is a cornerstone technique in medical and biological research, providing crucial information about cell size and granularity through forward scatter (FSC) and side scatter (SSC) signals. Despite its widespread use, the precise relationship between these scatter signals and corresponding microscopic images remains underexplored. Here, we investigate this intrinsic relationship by utilizing scattering theory and holotomography, a three-dimensional quantitative phase imaging (QPI) technique. We demonstrate the extraction of FSC and SSC signals from individual, unlabeled cells by analyzing their three-dimensional refractive index distributions obtained through holotomography. Additionally, we introduce a method for digitally windowing SSC signals to facilitate effective segmentation and morphology-based cell type classification. Our approach bridges the gap between flow cytometry and microscopic imaging, offering a new perspective on analyzing cellular characteristics with high accuracy and without the need for labeling.
This paper evaluates various deep learning methods for measurable residual disease (MRD) detection in flow cytometry (FCM) data, addressing questions regarding the benefits of modeling long-range dependencies, methods of obtaining global information, and the importance of learning local features. Based on our findings, we propose two adaptations to the current state-of-the-art (SOTA) model. Our contributions include an enhanced SOTA model, demonstrating superior performance on publicly available datasets and improved generalization across laboratories, as well as valuable insights for the FCM community, guiding future DL architecture designs for FCM data analysis. The code is available at \url{https://github.com/lisaweijler/flowNetworks}.
Flow cytometry is a powerful quantitative assay supporting high-throughput collection of single-cell data with a high dynamic range. For flow cytometry to yield reproducible data with a quantitative relationship to the underlying biology, however, it is necessary that 1) appropriate process controls are collected along with experimental samples, 2) these process controls are used for unit calibration and quality control, and 3) data are analyzed using appropriate statistics. To this end, this article describes methods for quantitative flow cytometry through addition of process controls and analyses, thereby enabling better development, modeling, and debugging of engineered biological organisms. The methods described here have specifically been developed in the context of transient transfections in mammalian cells, but may in many cases be adaptable to other categories of transfection and other types of cells.
Flow cytometry is widely used to identify cell populations in patient-derived fluids such as peripheral blood (PB) or cerebrospinal fluid (CSF). While ubiquitous in research and clinical practice, flow cytometry requires gating, i.e. cell type identification which requires labor-intensive and error-prone manual adjustments. To facilitate this process, we designed GateNet, the first neural network architecture enabling full end-to-end automated gating without the need to correct for batch effects. We train GateNet with over 8,000,000 events based on N=127 PB and CSF samples which were manually labeled independently by four experts. We show that for novel, unseen samples, GateNet achieves human-level performance (F1 score ranging from 0.910 to 0.997). In addition we apply GateNet to a publicly available dataset confirming generalization with an F1 score of 0.936. As our implementation utilizes graphics processing units (GPU), gating only needs 15 microseconds per event. Importantly, we also show that GateNet only requires ~10 samples to reach human-level performance, rendering it widely applicable in all domains of flow cytometry.
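For readers unfamiliar with the metric reported above, the F1 score of one cell class is the harmonic mean of precision and recall. The sketch below computes it from per-event labels. The label values are hypothetical, and the abstract does not state how per-class scores were aggregated, so no averaging scheme is assumed.

```python
def f1_score(true_labels, predicted_labels, positive):
    """F1 score of one class, treating `positive` as the class of interest."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positives: define F1 as zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical expert labels and model predictions for five events.
truth = ["T", "T", "B", "B", "T"]
preds = ["T", "B", "B", "B", "T"]

f1_t = f1_score(truth, preds, "T")  # precision 1, recall 2/3
f1_b = f1_score(truth, preds, "B")  # precision 2/3, recall 1
```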
This work presents an optical neuromorphic imaging and processing cytometry system that integrates an excitable VCSEL-based time-delayed (TD) extreme learning machine with an event-based 2D camera. The proposed system is designed for the classification of Polymethyl Methacrylate (PMMA) particles of varying diameters moving at speeds between 0.01 and 0.1 m/s. The TD photonic scheme achieved a classification accuracy of 95.8% while encoding the original 2D images into a 1-bit spike stream containing a maximum of 96 spikes. Additionally, the binary representation of the synthetic frames enables a significant reduction in memory and hardware requirements, ranging from 98.4% to 99.5% and 50% to 84%, respectively. These findings demonstrate that the integration of neuromorphic computing with sensing can facilitate the development of low-power, low-latency applications optimized for resource-constrained environments
Flow cytometry is mainly used for detecting the characteristics of a number of biochemical substances based on the expression of specific markers in cells. It is particularly useful for detecting membrane surface receptors, antigens, ions, or during DNA/RNA expression. Not only can it be employed as a biomedical research tool for recognising distinctive types of cells in mixed populations, but it can also be used as a diagnostic tool for classifying abnormal cell populations connected with disease. Modern flow cytometers can rapidly analyse tens of thousands of cells at the same time while also measuring multiple parameters from a single cell. However, the rapid development of flow cytometers makes it challenging for conventional analysis methods to interpret flow cytometry data. Researchers need to be able to distinguish interesting-looking cell populations manually in multi-dimensional data collected from millions of cells. Thus, it is essential to find a robust approach for analysing flow cytometry data automatically, specifically for identifying cell populations. This thesis is mainly concerned with discovering the potential shortcoming of current automated-gating algorithms in b
In the complex landscape of hematologic samples such as peripheral blood or bone marrow analyzed by flow cytometry (FC), cell-level prediction presents profound challenges. This work explores injecting hierarchical prior knowledge into graph neural networks (GNNs) for single-cell multi-class classification of tabular cellular data. By representing the data as graphs and encoding hierarchical relationships between classes, we propose a hierarchical plug-in method, namely FCHC-GNN, which can be applied to several GNN models and is designed to capture neighborhood information crucial for the single-cell FC domain. Extensive experiments on our cohort of 19 distinct patients demonstrate that incorporating hierarchical biological constraints boosts performance significantly across multiple metrics compared to baseline GNNs without such priors. The proposed approach highlights the importance of structured inductive biases for gaining improved generalization in complex biological prediction tasks.
In this work, we present experimental results of a high-speed label-free imaging cytometry system that seamlessly merges the high capture rate and data sparsity of an event-based CMOS camera with lightweight photonic neuromorphic processing. This combination offers high classification accuracy and a massive reduction in the number of trainable parameters of the digital machine-learning back-end. The photonic neuromorphic accelerator is based on a hardware-friendly passive optical spectrum slicing technique that is able to extract meaningful features from the generated spike trains. The experimental scenario comprises the discrimination of calibrated artificial polymethyl methacrylate beads of different diameters, flowing at a mean speed of 0.01 m/s. Classification accuracy using only lightweight digital machine-learning schemes peaked at 98.2%. By contrast, by experimentally pre-processing the raw spike data through the proposed photonic neuromorphic spectrum slicer, we achieved an accuracy of 98.6%. This performance was accompanied by a reduction in the number of trainable parameters at the classification back-end by a factor ranging from 8 to 22, depending on t