Hang Zhang, Yi Zhang, Kai Ming Ting, Jie Zhang, Qiuran Zhao
{"title":"Kernel-bounded clustering for spatial transcriptomics enables scalable discovery of complex spatial domains","authors":"Hang Zhang, Yi Zhang, Kai Ming Ting, Jie Zhang, Qiuran Zhao","doi":"10.1101/gr.278983.124","DOIUrl":null,"url":null,"abstract":"Spatial transcriptomics are a collection of technologies that have enabled characterization of gene expression profiles and spatial information in tissue samples. Existing methods for clustering spatial transcriptomics data have primarily focused on data transformation techniques to represent the data suitably for subsequent clustering analysis, often using an existing clustering algorithm. These methods have limitations in handling complex data characteristics with varying densities, sizes, and shapes (in the transformed space on which clustering is performed), and they have high computational complexity, resulting in unsatisfactory clustering outcomes and slow execution time even with GPUs. Rather than focusing on data transformation techniques, we propose a new clustering algorithm called kernel-bounded clustering (KBC). It has two unique features: (1) It is the first clustering algorithm that employs a distributional kernel to recruit members of a cluster, enabling clusters of varying densities, sizes, and shapes to be discovered, and (2) it is a linear-time clustering algorithm that significantly enhances the speed of clustering analysis, enabling researchers to effectively handle large-scale spatial transcriptomics data sets. We show that (1) KBC works well with a simple data transformation technique called the Weisfeiler–Lehman scheme, and (2) a combination of KBC and the Weisfeiler–Lehman scheme produces good clustering outcomes, and it is faster and easier-to-use than many methods that employ existing clustering algorithms and data transformation techniques.","PeriodicalId":12678,"journal":{"name":"Genome research","volume":"40 1","pages":""},"PeriodicalIF":6.2000,"publicationDate":"2025-02-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Genome research","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1101/gr.278983.124","RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"BIOCHEMISTRY & MOLECULAR BIOLOGY","Score":null,"Total":0}
引用次数: 0
Abstract
Spatial transcriptomics are a collection of technologies that have enabled characterization of gene expression profiles and spatial information in tissue samples. Existing methods for clustering spatial transcriptomics data have primarily focused on data transformation techniques to represent the data suitably for subsequent clustering analysis, often using an existing clustering algorithm. These methods have limitations in handling complex data characteristics with varying densities, sizes, and shapes (in the transformed space on which clustering is performed), and they have high computational complexity, resulting in unsatisfactory clustering outcomes and slow execution time even with GPUs. Rather than focusing on data transformation techniques, we propose a new clustering algorithm called kernel-bounded clustering (KBC). It has two unique features: (1) It is the first clustering algorithm that employs a distributional kernel to recruit members of a cluster, enabling clusters of varying densities, sizes, and shapes to be discovered, and (2) it is a linear-time clustering algorithm that significantly enhances the speed of clustering analysis, enabling researchers to effectively handle large-scale spatial transcriptomics data sets. We show that (1) KBC works well with a simple data transformation technique called the Weisfeiler–Lehman scheme, and (2) a combination of KBC and the Weisfeiler–Lehman scheme produces good clustering outcomes, and it is faster and easier-to-use than many methods that employ existing clustering algorithms and data transformation techniques.
期刊介绍:
Launched in 1995, Genome Research is an international, continuously published, peer-reviewed journal that focuses on research that provides novel insights into the genome biology of all organisms, including advances in genomic medicine.
Among the topics considered by the journal are genome structure and function, comparative genomics, molecular evolution, genome-scale quantitative and population genetics, proteomics, epigenomics, and systems biology. The journal also features exciting gene discoveries and reports of cutting-edge computational biology and high-throughput methodologies.
New data in these areas are published as research papers, or methods and resource reports that provide novel information on technologies or tools that will be of interest to a broad readership. Complete data sets are presented electronically on the journal''s web site where appropriate. The journal also provides Reviews, Perspectives, and Insight/Outlook articles, which present commentary on the latest advances published both here and elsewhere, placing such progress in its broader biological context.