Comparative benchmarking of single-cell clustering algorithms for transcriptomic and proteomic data

IF 10.1; Zone 1 (Biology); JCR Q1, BIOTECHNOLOGY & APPLIED MICROBIOLOGY
Yu-Hang Yin, Fang Wang, Wei Li, Qiaoming Liu, Shengming Zhou, Murong Zhou, Zhongjun Jiang, Dong-Jun Yu, Guohua Wang
Journal: Genome Biology
Published: 2025-09-03
DOI: 10.1186/s13059-025-03719-y (https://doi.org/10.1186/s13059-025-03719-y)
Citations: 0

Abstract

Differences in data distribution, feature dimensions, and quality between single-cell modalities pose challenges for clustering. Although clustering algorithms have been developed for single-cell transcriptomic or proteomic data, their performance across different omics data types and integration scenarios remains poorly investigated, which limits both method selection and future method development. In this study, we conduct a systematic comparative benchmark of 28 computational algorithms on 10 paired transcriptomic and proteomic datasets, evaluating them on clustering performance, peak memory, and running time. We also discuss the impact of highly variable genes (HVGs) and cell type granularity on clustering performance. Additionally, the robustness of these clustering methods on the two omics types is evaluated using 30 simulated datasets. Furthermore, to explore the benefits of integrating omics information for clustering tasks, we integrate single-cell transcriptomic and proteomic data using 7 state-of-the-art integration methods and assess the performance of existing single-omics clustering schemes on the integrated features. Our findings reveal modality-specific strengths and limitations, highlight the complementary nature of existing methods, and provide actionable insights to guide the selection of appropriate clustering approaches for specific scenarios. Overall, for top performance across both omics, consider scAIDE, scDCC, and FlowSOM, with FlowSOM also offering excellent robustness. For users prioritizing memory efficiency, scDCC and scDeepCluster are recommended, while TSCAN, SHARP, and MarkovHC are recommended for users who prioritize time efficiency; community detection-based methods offer a balance.
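The three evaluation axes named in the abstract (clustering performance, peak memory, running time) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the adjusted Rand index stands in for the unspecified clustering metrics, `midpoint_split` is a hypothetical toy clusterer rather than one of the 28 benchmarked algorithms, and `benchmark` is a hypothetical helper name.

```python
import time
import tracemalloc
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same cells."""
    n = len(truth)
    if n < 2:
        return 1.0  # no pairs to compare; treat as trivially identical
    # Count pairs of cells that share a cluster in both, truth only, pred only.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(truth).values())
    sum_cols = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:  # degenerate labelings, e.g. one cluster each
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def benchmark(cluster_fn, data, truth):
    """Run one clustering function and record accuracy, time, and peak memory.

    tracemalloc only sees allocations made through Python's allocator and
    slows execution, so a real benchmark would more likely monitor the
    process from outside; this is an illustrative stand-in.
    """
    tracemalloc.start()
    start = time.perf_counter()
    labels = cluster_fn(data)
    seconds = time.perf_counter() - start
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "ari": adjusted_rand_index(truth, labels),
        "seconds": seconds,
        "peak_bytes": peak_bytes,
    }


def midpoint_split(values):
    """Hypothetical toy clusterer: split 1-D values at the min/max midpoint."""
    cut = (min(values) + max(values)) / 2
    return [int(v > cut) for v in values]


if __name__ == "__main__":
    data = [0.1, 0.2, 0.15, 5.0, 5.2, 4.9]
    truth = ["a", "a", "a", "b", "b", "b"]
    print(benchmark(midpoint_split, data, truth))
```

Looping `benchmark` over several clustering functions and datasets would give a table of the kind such a comparison relies on; how the paper aggregates and ranks its results is not stated in the abstract.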
Source journal: Genome Biology
Category: Biochemistry, Genetics and Molecular Biology-Genetics
CiteScore: 21.00
Self-citation rate: 3.30%
Publication volume: 241
Review time: 2 months
About the journal: Genome Biology stands as a premier platform for exceptional research across all domains of biology and biomedicine, explored through a genomic and post-genomic lens. With an impact factor of 12.3 (2022), the journal is the 3rd-ranked research journal in the Genetics and Heredity category and the 2nd-ranked research journal in the Biotechnology and Applied Microbiology category by Thomson Reuters. Notably, Genome Biology is the highest-ranked open-access journal in this category. Our dedicated team of highly trained in-house Editors collaborates closely with our Editorial Board of international experts, ensuring the journal remains at the forefront of scientific advances and community standards. Regular engagement with researchers at conferences and institute visits underscores our commitment to staying abreast of the latest developments in the field.