Authors: Erell Gachon, Jérémie Bigot, Elsa Cazelles, Audrey Bidet, Jean-Philippe Vial, Pierre-Yves Dumas, Aguirre Mimoun
Journal: Cytometry Part A, volume 107, issue 2, pages 126-139
Publication date: 2025-03-03
DOI: 10.1002/cyto.a.24918
URL: https://onlinelibrary.wiley.com/doi/10.1002/cyto.a.24918
Citation count: 0
Low Dimensional Representation of Multi-Patient Flow Cytometry Datasets Using Optimal Transport for Measurable Residual Disease Detection in Leukemia
Representing and quantifying Measurable Residual Disease (MRD) in Acute Myeloid Leukemia (AML), a type of cancer that affects the blood and bone marrow, is essential in the prognosis and follow-up of AML patients. As traditional cytological analysis cannot detect leukemia cells below 5%, the analysis of flow cytometry datasets is expected to provide more reliable results. In this paper, we explore statistical learning methods based on optimal transport (OT) to achieve a relevant low-dimensional representation of multi-patient flow cytometry measurements (FCM) datasets considered as high-dimensional probability distributions. Using the framework of OT, we justify the use of the K-means algorithm for dimensionality reduction of multiple large-scale point clouds through mean measure quantization by merging all the data into a single point cloud. After this quantization step, the visualization of the intra- and inter-patient FCM variability is carried out by embedding low-dimensional quantized probability measures into a linear space using either Wasserstein Principal Component Analysis (PCA) through linearized OT or log-ratio PCA of compositional data. Using a publicly available FCM dataset and an FCM dataset from Bordeaux University Hospital, we demonstrate the benefits of our approach over the popular kernel mean embedding technique for statistical learning from multiple high-dimensional probability distributions. We also highlight the usefulness of our methodology for low-dimensional projection and for clustering patient measurements according to their level of MRD in AML from FCM. In particular, our OT-based approach allows a relevant and informative two-dimensional representation of the results of the FlowSOM algorithm, a state-of-the-art method for the detection of MRD in AML using multi-patient FCM.
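The quantization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy two-dimensional "patients", the helper names (`kmeans`, `nearest`, `quantize`, `clr`, `make_patient`), the choice of four centroids, and the zero-replacement constant `eps` are all assumptions made for illustration. The sketch pools all point clouds, runs a plain Lloyd-style K-means on the pooled data, represents each patient as a weight vector on the shared centroids, and applies a centered log-ratio transform, which is the usual preprocessing before log-ratio PCA of compositional data.

```python
import math
import random

def nearest(p, centroids):
    """Index of the centroid closest to p in squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))

def kmeans(points, k, n_iter=20, seed=0):
    """Plain Lloyd's algorithm; returns the list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centroids)].append(p)
        # Move each centroid to the mean of its cluster (keep it if empty).
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    return centroids

def quantize(cloud, centroids):
    """Weights of one patient's cloud on the shared centroids (sums to 1)."""
    counts = [0] * len(centroids)
    for p in cloud:
        counts[nearest(p, centroids)] += 1
    return [c / len(cloud) for c in counts]

def clr(weights, eps=1e-6):
    """Centered log-ratio transform; eps is an assumed zero replacement."""
    logs = [math.log(w + eps) for w in weights]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Toy data (hypothetical): each "patient" is a 2-D cloud mixing two blobs,
# with a patient-specific fraction of points in the second blob.
rng = random.Random(42)

def make_patient(frac_b, n=200):
    cloud = []
    for _ in range(n):
        center = 5.0 if rng.random() < frac_b else 0.0
        cloud.append((rng.gauss(center, 0.5), rng.gauss(center, 0.5)))
    return cloud

patients = [make_patient(f) for f in (0.05, 0.2, 0.5)]

# With equal cloud sizes, pooling all points gives the mean of the
# patients' empirical measures.
merged = [p for cloud in patients for p in cloud]
centroids = kmeans(merged, k=4)

# Each patient becomes a probability vector on the shared centroids,
# then a point in a linear space via the centered log-ratio transform.
weights = [quantize(cloud, centroids) for cloud in patients]
embedded = [clr(w) for w in weights]
```

An ordinary PCA applied to the rows of `embedded` would then give a log-ratio PCA of the quantized measures. The Wasserstein PCA route through linearized OT needs an optimal transport solver and is not sketched here.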
About the journal:
Cytometry Part A, the journal of quantitative single-cell analysis, features original research reports and reviews of innovative scientific studies employing quantitative single-cell measurement, separation, manipulation, and modeling techniques, as well as original articles on mechanisms of molecular and cellular functions obtained by cytometry techniques.
The journal welcomes submissions from multiple research fields that fully embrace the study of the cytome:
Biomedical Instrumentation Engineering
Biophotonics
Bioinformatics
Cell Biology
Computational Biology
Data Science
Immunology
Parasitology
Microbiology
Neuroscience
Cancer
Stem Cells
Tissue Regeneration.