On metrics for subpopulation detection in single-cell and spatial omics data
Siyuan Luo, Pierre-Luc Germain, Ferdinand von Meyenn, Mark D Robinson
Nucleic Acids Research, published 2025-09-27. DOI: 10.1093/nar/gkaf921 (https://doi.org/10.1093/nar/gkaf921)
Abstract
Benchmarks are crucial to understanding the strengths and weaknesses of the growing number of tools for single-cell and spatial omics analysis. A key task is to distinguish subpopulations within complex tissues, where evaluation typically relies on external clustering validation metrics. Different metrics often lead to inconsistencies between rankings, highlighting the importance of understanding the behavior and biological implications of each metric. In this work, we provide a framework for systematically understanding and selecting validation metrics for single-cell data analysis, addressing tasks such as creating cell embeddings, constructing graphs, clustering, and spatial domain detection. Our discussion centers on the desirable properties of metrics, focusing on biological relevance and potential biases. Using this framework, we not only analyze existing metrics but also develop novel ones. Delving into domain detection in spatial omics data, we develop new external metrics tailored to spatially aware measurements. Additionally, a Bioconductor R package, poem, implements all the metrics discussed. While we focus on single-cell omics, much of the discussion is of broader relevance to other types of high-dimensional data.
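To illustrate how two external validation metrics can rank the same clusterings differently, here is a minimal sketch in plain Python. It does not use or reproduce the poem package; the two metrics (purity and the adjusted Rand index) are standard choices picked for illustration, and the toy labels and function names are hypothetical.

```python
from collections import Counter
from math import comb


def purity(truth, pred):
    """Fraction of cells that carry the majority true label of their cluster."""
    majority = Counter()
    for (_, p), count in Counter(zip(truth, pred)).items():
        majority[p] = max(majority[p], count)
    return sum(majority.values()) / len(truth)


def adjusted_rand_index(truth, pred):
    """Pair-counting agreement between two partitions, corrected for chance."""
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = same_truth * same_pred / comb(len(truth), 2)
    max_index = (same_truth + same_pred) / 2
    if max_index == expected:
        # Degenerate case: both partitions are trivial, so they agree.
        return 1.0
    return (same_both - expected) / (max_index - expected)


# Hypothetical toy data: eight cells from two true subpopulations.
truth = ["A"] * 4 + ["B"] * 4
oversplit = list(range(8))               # every cell in its own cluster
one_error = [0, 0, 0, 0, 0, 1, 1, 1]     # two clusters, one misassigned cell

for name, pred in [("oversplit", oversplit), ("one_error", one_error)]:
    print(name, purity(truth, pred), round(adjusted_rand_index(truth, pred), 3))
```

On this toy input, purity prefers the over-split result, because singleton clusters are trivially pure, while the adjusted Rand index prefers the two-cluster result. This is one simple instance of the kind of metric bias and ranking inconsistency the abstract refers to.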
About the journal:
Nucleic Acids Research (NAR) is a scientific journal that publishes research on various aspects of nucleic acids and proteins involved in nucleic acid metabolism and interactions. It covers areas such as chemistry and synthetic biology, computational biology, gene regulation, chromatin and epigenetics, genome integrity, repair and replication, genomics, molecular biology, nucleic acid enzymes, RNA, and structural biology. The journal also includes a Survey and Summary section for brief reviews. Additionally, each year, the first issue is dedicated to biological databases, and an issue in July focuses on web-based software resources for the biological community. Nucleic Acids Research is indexed by several services including Abstracts on Hygiene and Communicable Diseases, Animal Breeding Abstracts, Agricultural Engineering Abstracts, Agbiotech News and Information, BIOSIS Previews, CAB Abstracts, and EMBASE.