Found 20 results
Machine-learning (ML) models in flow cytometry have the potential to reduce error rates, increase reproducibility, and boost the efficiency of clinical labs. While numerous ML models for flow cytometry data have been proposed, few studies have described the clinical deployment of such models. Realizing the potential gains of ML models in clinical labs requires not only an accurate model, but infrastructure for automated inference, error detection, analytics and monitoring, and structured data extraction. Here, we describe an ML model for detection of Acute Myeloid Leukemia (AML), along with the infrastructure supporting clinical implementation. Our infrastructure leverages the resilience and scalability of the cloud for model inference, a Kubernetes-based workflow system that provides model reproducibility and resource management, and a system for extracting structured diagnoses from full-text reports. We also describe our model monitoring and visualization platform, an essential element for ensuring continued model accuracy. Finally, we present a post-deployment analysis of impacts on turn-around time and compare production accuracy to the original validation statistics.
Rapid identification of microparticles in liquid is an important problem in environmental and biomedical applications such as microplastic detection in water sources and physiological fluids. Existing spectroscopic techniques are usually slow and not compatible with flow-through systems. Here we analyze single microparticles in the 14-20 micrometer range using a combination of two electronic sensors in the same microfluidic system: a microwave capacitive sensor and a resistive pulse sensor. Together, this integrated sensor system yields the effective electrical permittivity of the analyte particles. To simplify data analysis, 3D electrode arrangements were used instead of planar electrodes, so that the generated signal is unaffected by the height of the particle in the microfluidic channel. With this platform, we were able to distinguish between polystyrene (PS) and polyethylene (PE) microparticles. We showcase the sensitivity and speed of this technique and discuss the implications for the future application of microwave cytometry technology in the environmental and biomedical fields.
Microplastics are increasingly recognized as a global environmental health threat, yet their detection and characterization remain constrained by the cost, form factor, and throughput of existing analytical tools. Portable micro/nanotechnology-based sensors are emerging to address this need, but most rely on the assumption of spherical particle geometry in their operating principle, limiting their relevance for environmental analysis. Here, we overcome this limitation by advancing microwave cytometry with machine learning-enabled shape recognition. Microwave cytometry is a flow-through electronic platform that integrates microwave resonator responses with low-frequency impedance signals to capture the dielectric signatures of individual particles. Using microscopy-derived shape measurements as ground truth, we trained a random forest model to decode these information-rich waveforms. Once trained, the system operates without optical input, enabling electronic-only determination of particle geometry. We demonstrate extraction of the major and minor axes of ellipsoidal microparticles with <8% relative error on average and use these predictions to derive the dielectric permittivity
Cytometry experiments yield high-dimensional point cloud data that is difficult to interpret manually. Boolean gating techniques coupled with comparisons of relative abundances of cellular subsets are the current standard for cytometry data analysis. However, this approach is unable to capture more subtle topological features hidden in data, especially if those features are further masked by data transforms or significant batch effects or donor-to-donor variations in clinical data. Analysis of publicly available cytometry data describing non-naïve CD8+ T cells in COVID-19 patients and healthy controls shows that systematic structural differences exist between single cell protein expressions in COVID-19 patients and healthy controls. We identify proteins of interest by a decision-tree based classifier, sample points randomly and compute persistence diagrams from these sampled points. The resulting persistence diagrams identify regions in cytometry datasets of varying density and identify protruded structures such as 'elbows'. We compute Wasserstein distances between these persistence diagrams for random pairs of healthy controls and COVID-19 patients and find that systematic structur
Imaging flow cytometry (IFC) enables high-throughput single-cell analysis but largely relies on fluorescence labeling to obtain molecular specificity. Label-free vibrational imaging can provide intrinsic chemical contrast, yet coherent Raman-based methods interrogate only a limited axial volume, which restricts quantitative whole-cell analysis under flow. Mid-infrared photothermal (MIP) microscopy offers a promising route to overcome this limitation by combining linear mid-infrared (MIR) absorption-based chemical contrast with visible-light detection, allowing chemical imaging of a broader axial volume of each cell in a wide-field configuration. However, applying MIP microscopy to rapidly flowing cells has been difficult because conventional frame-sequential acquisition of MIR-ON and MIR-OFF images is highly susceptible to motion-induced subtraction artifacts. Here we demonstrate MIP-IFC, a label-free imaging flow cytometry platform based on single-shot nanosecond-dual-pulse MIP (SNAP-MIP) microscopy. SNAP-MIP encodes the MIR-ON and MIR-OFF states into separate holographic channels within a single camera exposure, reducing their temporal separation to 20 ns. This single-shot acquis
Ocean microbes are critical to both ocean ecosystems and the global climate. Flow cytometry, which measures cell optical properties in fluid samples, is routinely used in oceanographic research. Despite decades of accumulated data, identifying key microbial populations (a process known as "gating") remains a significant analytical challenge. To address this, we focus on gating multidimensional, high-frequency flow cytometry data collected continuously on board oceanographic research vessels, capturing time- and space-wise variations in the dynamic ocean. Our paper proposes a novel mixture-of-experts model in which both the gating function and the experts are given by trend filtering. The model leverages two key assumptions: (1) Each snapshot of flow cytometry data is a mixture of multivariate Gaussians and (2) the parameters of these Gaussians vary smoothly over time. Our method uses regularization and a constraint to ensure smoothness and that cluster means match biologically distinct microbe types. We demonstrate, using flow cytometry data from the North Pacific Ocean, that our proposed model accurately matches human-annotated gating and corrects significant errors.
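The two modelling assumptions named in the abstract above can be illustrated with a small sketch. It is not the authors' method: it uses a one-dimensional mixture instead of a multivariate one, and the function names (`responsibilities`, `difference_penalty`) and all parameter values are hypothetical. The first function computes the posterior probability that one measurement belongs to each Gaussian component; the second computes a trend-filtering-style roughness penalty, the sum of absolute discrete differences of a parameter sequence over time.

```python
import math


def gaussian_pdf(x, mean, sd):
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def responsibilities(x, weights, means, sds):
    """Posterior membership probabilities of x under a 1D Gaussian mixture.

    Assumption (1) of the abstract, reduced to one dimension for brevity.
    """
    dens = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, sds)]
    total = sum(dens)  # assumed > 0; far-out points would need log-space sums
    return [d / total for d in dens]


def difference_penalty(series, order=1):
    """Sum of absolute (order + 1)-th discrete differences of a sequence.

    order=0 penalises jumps (piecewise-constant fits); order=1 penalises
    changes in slope (piecewise-linear fits). This stands in for the
    smoothness requirement of assumption (2).
    """
    diffs = list(series)
    for _ in range(order + 1):
        diffs = [b - a for a, b in zip(diffs, diffs[1:])]
    return sum(abs(d) for d in diffs)
```

A cluster mean that drifts linearly over time incurs no penalty at `order=1`, whereas a mean that jumps is charged in proportion to the size of the jump, which is how such a penalty favours smoothly varying parameters.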
The ocean is filled with phytoplankton that contribute as much photosynthesis as all land plants combined, making them vital to the carbon cycle and climate system. Recent advances in flow cytometry allow oceanographers to measure the optical traits of individual cells along research cruise tracks, generating single-cell resolution microbial data. In marine microbial ecology, detecting locations of abrupt changes in the environmental response of cytometric plankton distributions is an important task. This manuscript proposes a latent space Gaussian mixture-of-experts model, facilitating change point detection in replicated and clustered phytoplankton observations. Change points are identified through shifts in prior means of low-dimensional representations, with piecewise-constant structure enforced by a group-fused LASSO penalty. The optimization problem is then solved via Alternating Direction Method of Multipliers. Applied to flow cytometry data, the proposed method identifies a scientifically important change point that aligns with a transition zone between two marine provinces.
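The group-fused LASSO penalty mentioned in the abstract above can be written down in a few lines. The sketch below is an illustration under stated assumptions, not the paper's implementation: the names `group_fused_lasso_penalty` and `change_points`, the weight `lam`, and the tolerance `tol` are all hypothetical, and the ADMM solver itself is not shown. The penalty sums the Euclidean norms of consecutive differences of a sequence of vectors, so a piecewise-constant sequence is charged only at its jumps.

```python
import math


def group_fused_lasso_penalty(means, lam=1.0):
    """lam * sum_t ||means[t+1] - means[t]||_2 for a sequence of vectors."""
    return lam * sum(math.dist(a, b) for a, b in zip(means, means[1:]))


def change_points(means, tol=1e-8):
    """Indices t at which means[t] differs from means[t-1] by more than tol."""
    return [
        t + 1
        for t, (a, b) in enumerate(zip(means, means[1:]))
        if math.dist(a, b) > tol
    ]


# Hypothetical latent means along a cruise track: constant, then one jump.
track = [(0.0, 0.0), (0.0, 0.0), (3.0, 4.0), (3.0, 4.0)]
print(group_fused_lasso_penalty(track))  # 5.0
print(change_points(track))              # [2]
```

Because the norm is taken over the whole difference vector, all coordinates are encouraged to change at the same positions, which is what makes the resulting change points shared across dimensions.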
We present ImmuVis, a family of efficient foundation models for imaging mass cytometry (IMC), a high-throughput multiplex imaging technology that handles molecular marker measurements as image channels and enables large-scale spatial tissue profiling. Unlike natural images, multiplex imaging lacks a fixed channel space, as real-world marker sets vary across studies, violating a core assumption of standard vision backbones. To address this, ImmuVis introduces marker-adaptive hyperconvolutions that generate convolutional kernels from learned marker embeddings, enabling a single model to operate on arbitrary measured marker subsets without retraining. We pretrain ImmuVis on the largest dataset to date, IMC17M (28 cohorts, 24,405 images, 265 markers, over 17M patches), using self-supervised masked reconstruction. ImmuVis outperforms state-of-the-art baselines and ablations in virtual staining and downstream classification tasks at substantially lower compute cost than transformer-based alternatives, and is the sole model that provides calibrated uncertainty via a heteroscedastic likelihood objective. These results position ImmuVis as a practical framework for real-world IMC modeling.
This work presents an optical neuromorphic imaging and processing cytometry system that integrates an excitable VCSEL-based time-delayed (TD) extreme learning machine with an event-based 2D camera. The proposed system is designed for the classification of Polymethyl Methacrylate (PMMA) particles of varying diameters moving at speeds between 0.01 and 0.1 m/s. The TD photonic scheme achieved a classification accuracy of 95.8% while encoding the original 2D images into a 1-bit spike stream containing a maximum of 96 spikes. Additionally, the binary representation of the synthetic frames enables a significant reduction in memory and hardware requirements, ranging from 98.4% to 99.5% and 50% to 84%, respectively. These findings demonstrate that the integration of neuromorphic computing with sensing can facilitate the development of low-power, low-latency applications optimized for resource-constrained environments.
Specialised pre-trained language models are becoming more frequent in NLP since they can potentially outperform models trained on generic texts. BioBERT and BioClinicalBERT are two examples of such models that have shown promise in medical NLP tasks. Many of these models are overparametrised and resource-intensive, but thanks to techniques like Knowledge Distillation (KD), it is possible to create smaller versions that perform almost as well as their larger counterparts. In this work, we specifically focus on the development of compact language models for processing clinical texts (e.g., progress notes and discharge summaries). We developed a number of efficient lightweight clinical transformers using knowledge distillation and continual learning, with the number of parameters ranging from 15 million to 65 million. These models performed comparably to larger models such as BioBERT and BioClinicalBERT and significantly outperformed other compact models trained on general or biomedical data. Our extensive evaluation was done across several standard datasets and covered a wide range of clinical text-mining tasks, including Natural Language Inference, Relation Extraction, Named Entity Reco
Flow cytometry is a valuable technique that measures the optical properties of particles at a single-cell resolution. When deployed in the ocean, flow cytometry allows oceanographers to study different types of photosynthetic microbes called phytoplankton. It is of great interest to study how phytoplankton properties change in response to environmental conditions. In our work, we develop a nonlinear mixture of experts model to estimate separate regression functions for each subpopulation utilizing random-weight neural networks. Our model allows one to flexibly estimate how cell properties and relative abundances depend on environmental covariates in each segment of a heterogeneous sample, without the computational burden of backpropagation. We show that the proposed model provides superior predictive performance in simulated examples compared to a mixture of linear experts. Also, applying our model to real data, we show that our model has (1) comparable out-of-sample prediction performance, and (2) more realistic estimates of phytoplankton behavior.
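The idea of a random-weight neural network, as referred to in the abstract above, can be sketched for a single expert. This is a generic illustration and not the authors' model: the hidden-layer weights are drawn once at random and never updated, and only the output layer is fitted by ordinary least squares, so no backpropagation is needed. The function names, the `tanh` activation, the hidden width and the synthetic data are all assumptions made for the example; the mixture-of-experts structure is omitted.

```python
import numpy as np


def hidden_features(X, W, b):
    """Fixed random hidden layer with tanh activation, plus a bias column."""
    return np.column_stack([np.tanh(X @ W + b), np.ones(len(X))])


def fit_random_weight_net(X, y, n_hidden=50, seed=0):
    """Draw hidden weights at random, then fit only the output layer."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # never trained
    b = rng.normal(size=n_hidden)                # never trained
    H = hidden_features(X, W, b)
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form least squares
    return W, b, beta


def predict(X, W, b, beta):
    """Apply the fixed hidden layer and the fitted output weights."""
    return hidden_features(X, W, b) @ beta


# Synthetic nonlinear relationship standing in for a covariate response.
rng = np.random.default_rng(1)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = np.sin(X[:, 0])
W, b, beta = fit_random_weight_net(X, y, n_hidden=30)
train_mse = float(np.mean((predict(X, W, b, beta) - y) ** 2))
```

Since the only fitted parameters enter linearly, the fit is a single linear solve, which is the source of the computational saving relative to gradient-based training.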
The contamination detection problem aims to determine whether a set of observations has been contaminated, i.e. whether it contains points drawn from a distribution different from the reference distribution. Here, we consider a supervised problem, where labeled samples drawn from both the reference distribution and the contamination distribution are available at training time. This problem is motivated by the detection of rare cells in flow cytometry. Compared to novelty detection problems or two-sample testing, where only samples from the reference distribution are available, the challenge lies in efficiently leveraging the observations from the contamination distribution to design more powerful tests. In this article, we introduce a test for the supervised contamination detection problem. We provide non-asymptotic guarantees on its Type I error, and characterize its detection rate. The test relies on estimating reference and contamination densities using histograms, and its power depends strongly on the choice of the corresponding partition. We present an algorithm for judiciously choosing the partition that results in a powerful test. Simulations illustrate the good empirical perfo
Image mass cytometry (IMC) enables high-dimensional spatial profiling by combining mass cytometry's analytical power with spatial distributions of cell phenotypes. Recent studies leverage large language models (LLMs) to extract cell states by translating gene or protein expression into biological context. However, existing single-cell LLMs face two major challenges: (1) Integration of spatial information: they struggle to generalize spatial coordinates and effectively encode spatial context as text, and (2) Treating each cell independently: they overlook cell-cell interactions, limiting their ability to capture biological relationships. To address these limitations, we propose Spatial2Sentence, a novel framework that integrates single-cell expression and spatial information into natural language using a multi-sentence approach. Spatial2Sentence constructs expression similarity and distance matrices, pairing spatially adjacent and expressionally similar cells as positive pairs while using distant and dissimilar cells as negatives. These multi-sentence representations enable LLMs to learn cellular interactions in both expression and spatial contexts. Equipped with multi-task learning
Circulating blood cell clusters (CCCs) containing red blood cells (RBCs), white blood cells (WBCs), and platelets are significant biomarkers linked to conditions like thrombosis, infection, and inflammation. Flow cytometry, paired with fluorescence staining, is commonly used to analyze these cell clusters, revealing cell morphology and protein profiles. While computational approaches based on machine learning have advanced the automatic analysis of single-cell flow cytometry images, there is a lack of effort to build tools to automatically analyze images containing CCCs. Unlike single cells, cell clusters often exhibit irregular shapes and sizes. In addition, these cell clusters often consist of heterogeneous cell types, which require multi-channel staining to identify the specific cell types within the clusters. This study introduces a new computational framework for analyzing CCC images and identifying cell types within clusters. Our framework uses a two-step analysis strategy. First, it categorizes images into cell cluster and non-cluster groups by fine-tuning the You Only Look Once (YOLOv11) model, which outperforms traditional convolutional neural networks (CNNs), Vision Transfo
Flow cytometry is widely used to identify cell populations in patient-derived fluids such as peripheral blood (PB) or cerebrospinal fluid (CSF). While ubiquitous in research and clinical practice, flow cytometry requires gating, i.e., cell type identification, which involves labor-intensive and error-prone manual adjustments. To facilitate this process, we designed GateNet, the first neural network architecture enabling full end-to-end automated gating without the need to correct for batch effects. We train GateNet with over 8,000,000 events based on N=127 PB and CSF samples which were manually labeled independently by four experts. We show that for novel, unseen samples, GateNet achieves human-level performance (F1 score ranging from 0.910 to 0.997). In addition, we apply GateNet to a publicly available dataset, confirming generalization with an F1 score of 0.936. As our implementation utilizes graphics processing units (GPUs), gating only needs 15 microseconds per event. Importantly, we also show that GateNet only requires ~10 samples to reach human-level performance, rendering it widely applicable in all domains of flow cytometry.
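The F1 scores quoted in the abstract above compare automated gating against expert labels event by event. As a minimal sketch of how such a score is computed for one gate, and not GateNet's evaluation code, the function below takes two equally long lists of binary labels (1 = event inside the gate). The name `f1_score` and the toy labels are hypothetical.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for binary event labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0  # no correct positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


# Toy example: expert labels versus automated gate for five events.
expert = [1, 1, 0, 0, 1]
automated = [1, 0, 0, 1, 1]
print(f1_score(expert, automated))  # 2 TP, 1 FP, 1 FN -> about 0.667
```

For scale, at the reported 15 microseconds per event, a sample of one million events corresponds to roughly 15 seconds of gating time.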
This paper presents FlowCyt, the first comprehensive benchmark for multi-class single-cell classification in flow cytometry data. The dataset comprises bone marrow samples from 30 patients, with each cell characterized by twelve markers. Ground truth labels identify five hematological cell types: T lymphocytes, B lymphocytes, Monocytes, Mast cells, and Hematopoietic Stem/Progenitor Cells (HSPCs). Experiments utilize supervised inductive learning and semi-supervised transductive learning on up to 1 million cells per patient. Baseline methods include Gaussian Mixture Models, XGBoost, Random Forests, Deep Neural Networks, and Graph Neural Networks (GNNs). GNNs demonstrate superior performance by exploiting spatial relationships in graph-encoded data. The benchmark allows standardized evaluation of clinically relevant classification tasks, along with exploratory analyses to gain insights into hematological cell phenotypes. This represents the first public flow cytometry benchmark with a richly annotated, heterogeneous dataset. It will empower the development and rigorous assessment of novel methodologies for single-cell analysis.
In cytometry, it is difficult to disentangle the contributions of population variance and instrument noise towards total measured variation. Fundamentally, this is due to the fact that one cannot measure the same particle multiple times. We propose a simple experiment that uses a cell sorter to distinguish instrument-specific variation. For a population of beads whose intensities are distributed around a single peak, the sorter is used to collect beads whose measured intensities lie below some threshold. This subset of particles is then remeasured. If the variation in the measured values is only due to the sample, the second set of measurements should also lie entirely below our threshold. Any 'spillover' is therefore due to instrument specific effects - we demonstrate how the distribution of the post-sort measurements is sufficient to extract an estimate of the cumulative variability induced by the instrument. A distinguishing feature of our work is that we do not make any assumptions about the sources of said noise. We then show how 'local affine transformations' let us transfer these estimates to cytometers not equipped with a sorter. We use our analysis to estimate noise for a
Flow cytometry is mainly used for detecting the characteristics of a number of biochemical substances based on the expression of specific markers in cells. It is particularly useful for detecting membrane surface receptors, antigens, ions, or during DNA/RNA expression. Not only can it be employed as a biomedical research tool for recognising distinctive types of cells in mixed populations, but it can also be used as a diagnostic tool for classifying abnormal cell populations connected with disease. Modern flow cytometers can rapidly analyse tens of thousands of cells at the same time while also measuring multiple parameters from a single cell. However, the rapid development of flow cytometers makes it challenging for conventional analysis methods to interpret flow cytometry data. Researchers need to be able to distinguish interesting-looking cell populations manually in multi-dimensional data collected from millions of cells. Thus, it is essential to find a robust approach for analysing flow cytometry data automatically, specifically for identifying cell populations. This thesis mainly concerns discovering the potential shortcomings of current automated-gating algorithms in b
Flow cytometry is a cornerstone technique in medical and biological research, providing crucial information about cell size and granularity through forward scatter (FSC) and side scatter (SSC) signals. Despite its widespread use, the precise relationship between these scatter signals and corresponding microscopic images remains underexplored. Here, we investigate this intrinsic relationship by utilizing scattering theory and holotomography, a three-dimensional quantitative phase imaging (QPI) technique. We demonstrate the extraction of FSC and SSC signals from individual, unlabeled cells by analyzing their three-dimensional refractive index distributions obtained through holotomography. Additionally, we introduce a method for digitally windowing SSC signals to facilitate effective segmentation and morphology-based cell type classification. Our approach bridges the gap between flow cytometry and microscopic imaging, offering a new perspective on analyzing cellular characteristics with high accuracy and without the need for labeling.
Advancements in cytometry technologies have led to a remarkable increase in the number of markers that can be analyzed simultaneously, presenting significant challenges in data analysis. Traditional approaches, such as dimensional reduction techniques and computational clustering, although popular, often face reproducibility challenges due to their heavy reliance on inherent data structures, preventing direct translation of their outputs into gating strategies to be used in downstream experiments. Here we propose the novel Gating Tree methodology, a pathfinding approach that investigates the multidimensional data landscape to unravel group-specific features without the use of dimensional reduction. This method employs novel measures, including enrichment scores and gating entropy, to effectively identify group-specific features within high-dimensional cytometric datasets. Our analysis, applied to both simulated and real cytometric datasets, demonstrates that the Gating Tree not only identifies group-specific features comprehensively but also produces outputs that are immediately usable as gating strategies for unequivocally identifying cell populations. In conclusion, the Gating Tr