CHOIR improves significance-based detection of cell types and states from single-cell data

Impact factor: 31.7 · Tier 1 (Biology) · JCR Q1 (Genetics & Heredity)
Cathrine Sant, Lennart Mucke, M. Ryan Corces
Nature Genetics, volume 57, issue 5, pages 1309-1319. Published 2025-04-07. Journal article.
DOI: 10.1038/s41588-025-02148-8
https://www.nature.com/articles/s41588-025-02148-8

Abstract

Clustering is a critical step in the analysis of single-cell data, enabling the discovery and characterization of cell types and states. However, most popular clustering tools do not subject results to statistical inference testing, leading to risks of overclustering or underclustering data and often resulting in ineffective identification of cell types with widely differing prevalence. To address these challenges, we present CHOIR (cluster hierarchy optimization by iterative random forests), which applies a framework of random forest classifiers and permutation tests across a hierarchical clustering tree to statistically determine clusters representing distinct populations. We demonstrate the performance of CHOIR through extensive benchmarking against 15 existing clustering methods across 230 simulated and five real single-cell RNA sequencing, assay for transposase-accessible chromatin sequencing, spatial transcriptomic and multi-omic datasets. CHOIR can be applied to any single-cell data type and provides a flexible, scalable and robust solution to the challenge of identifying biologically relevant cell groupings within heterogeneous single-cell data.

Cluster hierarchy optimization by iterative random forests (CHOIR) offers a robust and accurate method to identify cell clusters across a variety of single-cell resolution data with statistical support.
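
The central idea in the abstract is to decide whether two candidate clusters are really distinct by asking whether a classifier separates them better than it separates randomly relabeled cells. That idea can be illustrated with a minimal sketch. This is not the CHOIR implementation: to stay dependency-free it substitutes a small ensemble of bootstrapped decision stumps for a real random forest, and every name (`make_cells`, `fit_stump`, `permutation_test`), parameter value, and the synthetic data are assumptions made for illustration only.

```python
import random

def make_cells(n, shift, rng, n_features=5):
    """Simulate n cells with Gaussian features centered at `shift` (synthetic data)."""
    return [[rng.gauss(shift, 1.0) for _ in range(n_features)] for _ in range(n)]

def fit_stump(X, y, rng):
    """Fit one stump on a bootstrap sample: pick a random feature and split it
    at the midpoint between the two class means."""
    rows = [rng.randrange(len(X)) for _ in X]           # bootstrap sample
    f = rng.randrange(len(X[0]))                        # random feature
    v0 = [X[i][f] for i in rows if y[i] == 0]
    v1 = [X[i][f] for i in rows if y[i] == 1]
    if not v0 or not v1:                                # only one class drawn
        only = 1 if v1 else 0
        return f, 0.0, only, only
    m0, m1 = sum(v0) / len(v0), sum(v1) / len(v1)
    thr = (m0 + m1) / 2
    # Tuple layout: (feature, threshold, label if <= threshold, label if > threshold)
    return (f, thr, 0, 1) if m0 <= m1 else (f, thr, 1, 0)

def predict(stumps, x):
    """Majority vote over all stumps."""
    votes = sum(high if x[f] > thr else low for f, thr, low, high in stumps)
    return 1 if 2 * votes > len(stumps) else 0

def holdout_accuracy(X, y, rng, n_stumps=25):
    """Train on a random half of the cells, report accuracy on the other half."""
    order = list(range(len(X)))
    rng.shuffle(order)
    half = len(order) // 2
    train, test = order[:half], order[half:]
    X_train = [X[i] for i in train]
    y_train = [y[i] for i in train]
    stumps = [fit_stump(X_train, y_train, rng) for _ in range(n_stumps)]
    correct = sum(1 for i in test if predict(stumps, X[i]) == y[i])
    return correct / len(test)

def permutation_test(X, y, n_perm=100, seed=0):
    """Compare accuracy on the true labels with accuracies on shuffled labels."""
    rng = random.Random(seed)
    observed = holdout_accuracy(X, y, rng)
    null = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        null.append(holdout_accuracy(X, shuffled, rng))
    exceed = sum(1 for a in null if a >= observed)
    # The +1 terms count the observed labeling as one of the permutations.
    return observed, (1 + exceed) / (1 + n_perm)

data_rng = random.Random(42)
labels = [0] * 40 + [1] * 40
# Two well-separated populations versus one population split arbitrarily in two.
distinct = make_cells(40, 0.0, data_rng) + make_cells(40, 4.0, data_rng)
same = make_cells(80, 0.0, data_rng)

acc_d, p_d = permutation_test(distinct, labels)
acc_s, p_s = permutation_test(same, labels)
print(f"separated populations: accuracy={acc_d:.2f}, p={p_d:.3f}")
print(f"single population:     accuracy={acc_s:.2f}, p={p_s:.3f}")
```

In a hierarchical setting, a test of this kind could be applied to pairs of neighboring clusters in the tree, keeping a split only when the permutation p-value is small. The abstract does not state thresholds or tree-traversal details, so none are assumed here.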


Source journal
Nature Genetics (Biology: Genetics)
CiteScore: 43.00
Self-citation rate: 2.60%
Articles published: 241
Review time: 3 months
Journal description: Nature Genetics publishes the very highest quality research in genetics. It encompasses genetic and functional genomic studies on human and plant traits and on other model organisms. Current emphasis is on the genetic basis for common and complex diseases and on the functional mechanism, architecture and evolution of gene networks, studied by experimental perturbation. Integrative genetic topics comprise, but are not limited to:
- Genes in the pathology of human disease
- Molecular analysis of simple and complex genetic traits
- Cancer genetics
- Agricultural genomics
- Developmental genetics
- Regulatory variation in gene expression
- Strategies and technologies for extracting function from genomic data
- Pharmacological genomics
- Genome evolution