scGHSOM: Hierarchical clustering and visualization of single-cell and CRISPR data using growing hierarchical SOM

Shang-Jung Wen, Jia-Ming Chang, Fang Yu
arXiv:2407.16984 (arXiv - QuanBio - Genomics), published 2024-07-24
Citations: 0

Abstract

High-dimensional single-cell data pose significant challenges for identifying underlying biological patterns because of the complexity and heterogeneity of cellular states. We propose a comprehensive visualization of gene-cell dependencies built on an unsupervised clustering method, the Growing Hierarchical Self-Organizing Map (GHSOM), designed for analyzing high-dimensional single-cell data such as single-cell sequencing data and CRISPR screens. GHSOM clusters samples into a hierarchy whose self-growing structure satisfies the required variation both between and within clusters. We also propose a novel Significant Attributes Identification Algorithm that identifies the features distinguishing clusters: attributes with minimal variation within a cluster but substantial variation between clusters. These key attributes can then be used for targeted data retrieval and downstream analysis. Furthermore, we present two new visualization tools, the Cluster Feature Map and the Cluster Distribution Map. The Cluster Feature Map highlights how specific features are distributed across the hierarchical structure of GHSOM clusters, allowing rapid visual assessment of cluster uniqueness based on the chosen features. The Cluster Distribution Map depicts leaf clusters as circles on the GHSOM grid; circle size reflects the number of data points in each cluster, and color can be customized to show features such as cell type or other attributes. We apply our analysis to three single-cell datasets and one CRISPR dataset (a cell-gene database) and evaluate clustering methods with an internal score, the Calinski-Harabasz (CH) index, and an external score, the Adjusted Rand Index (ARI). GHSOM performs well: it is the best performer in the internal evaluation (CH = 4.2) and ranks third among all methods in the external evaluation.
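The abstract describes the Significant Attributes Identification Algorithm only qualitatively (low within-cluster variation, high between-cluster variation) and does not give its exact formula. The sketch below is therefore a hypothetical illustration of that criterion, not the authors' implementation: for each cluster it scores every feature by the gap between the cluster's mean and the mean of all other samples, divided by the cluster's own standard deviation, and keeps the top-ranked features. The function name `significant_attributes`, the ratio-based score, and the `top_k` and `eps` parameters are all assumptions introduced for illustration; the cluster labels are assumed to come from GHSOM leaf clusters.

```python
from statistics import mean, pstdev


def significant_attributes(rows, labels, feature_names, top_k=2, eps=1e-9):
    """Hypothetical per-cluster feature ranking (not the paper's exact method).

    A feature scores high for cluster c when it varies little inside c
    (small within-cluster standard deviation) but c's mean lies far from
    the mean of all remaining samples (large between-cluster gap).
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("need at least two clusters to compare")
    result = {}
    for c in clusters:
        inside = [r for r, lab in zip(rows, labels) if lab == c]
        outside = [r for r, lab in zip(rows, labels) if lab != c]
        scores = []
        for j, name in enumerate(feature_names):
            col_in = [r[j] for r in inside]
            col_out = [r[j] for r in outside]
            within = pstdev(col_in)                       # spread inside cluster c
            between = abs(mean(col_in) - mean(col_out))   # gap to the other clusters
            scores.append((between / (within + eps), name))
        # Highest score first; equal scores keep their original feature order.
        scores.sort(key=lambda t: t[0], reverse=True)
        result[c] = [name for _, name in scores[:top_k]]
    return result


# Toy data (hypothetical): 6 cells x 3 genes, two leaf clusters.
rows = [
    [10.0, 1.0, 5.0],
    [10.2, 3.0, 5.1],
    [9.8, 2.0, 4.9],
    [1.0, 2.5, 5.0],
    [1.2, 0.5, 5.2],
    [0.8, 1.5, 4.8],
]
labels = [0, 0, 0, 1, 1, 1]
names = ["gene_A", "gene_B", "gene_C"]
print(significant_attributes(rows, labels, names, top_k=1))
# prints {0: ['gene_A'], 1: ['gene_A']}
```

In the toy data, `gene_A` is tight within each cluster and far apart between them, so it ranks first; `gene_B` differs slightly between clusters but is noisy, and `gene_C` is identical on average in both clusters, so it scores zero. A real implementation might instead use variance ratios, statistical tests, or thresholds rather than a fixed `top_k`.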