MuSARCyto: Multi-Head Self-Attention-Based Representation Learning for Unsupervised Clustering of Cytometry Data
Authors: Anubha Gupta, Ritika Hooda, Sachin Motwani, Dikshant Sagar, Priya Aggarwal, Vinayak Abrol, Ritu Gupta
Journal: Cytometry Part A, 107(8), pp. 551-567 (Journal Article)
Published: 2025-08-11
DOI: 10.1002/cyto.a.24956
URL: https://onlinelibrary.wiley.com/doi/10.1002/cyto.a.24956
Impact factor: 2.1 (JCR Q3, Biochemical Research Methods)
Abstract
Cytometry enables simultaneous assessment of individual cellular characteristics, offering vital insights for the diagnosis, prognosis, and monitoring of various human diseases. Despite its significance, the process of manual cell clustering, or gating, remains labor-intensive, tedious, and highly subjective, which restricts its broader application in both research and clinical settings. Although automated clustering solutions have been developed, manual gating continues to be the clinical gold standard, possibly due to the suboptimal performance of automated solutions. We hypothesize that their performance can be improved via an appropriate representation of data from the clustering point of view. To this end, this work presents a novel unsupervised deep learning (DL) architecture wherein an efficient cytometry data representation is learned that helps discover cluster assignments. Specifically, we propose MuSARCyto, a multi-head self-attention-based representation learning network (RN) for the unsupervised clustering of cytometry data, utilizing a fully-connected representation network backbone. To benchmark MuSARCyto against the state-of-the-art cytometry clustering methods, we propose a cluster evaluation metric, the adjudicator score ($\mathrm{Ad}_n$), which is an ensemble of prevalent cluster evaluation metrics. Extensive experimentation demonstrates the superior performance of MuSARCyto against the existing state-of-the-art cytometry clustering methods across six publicly available mass and flow cytometry datasets. The proposed DL architectures are small and easily deployable in clinical settings. This work further suggests using DL methods for identifying meaningful clusters, particularly in the context of critical immunology applications.
About the journal:
Cytometry Part A, the journal of quantitative single-cell analysis, features original research reports and reviews of innovative scientific studies employing quantitative single-cell measurement, separation, manipulation, and modeling techniques, as well as original articles on mechanisms of molecular and cellular functions obtained by cytometry techniques.
The journal welcomes submissions from multiple research fields that fully embrace the study of the cytome:
Biomedical Instrumentation Engineering
Biophotonics
Bioinformatics
Cell Biology
Computational Biology
Data Science
Immunology
Parasitology
Microbiology
Neuroscience
Cancer
Stem Cells
Tissue Regeneration.