scHNTL: single-cell RNA-seq data clustering augmented by high-order neighbors and triplet loss.

Hua Meng, Chuan Qin, Zhiguo Long
Bioinformatics (Oxford, England), published 2025-01-29. DOI: 10.1093/bioinformatics/btaf044 (https://doi.org/10.1093/bioinformatics/btaf044)

Abstract

Motivation: The rapid development of single-cell RNA sequencing (scRNA-seq) has significantly advanced biomedical research. Clustering analysis, which is crucial for scRNA-seq data, faces challenges including data sparsity, high dimensionality, and variable gene expression. Good low-dimensional embeddings of such data should preserve intrinsic information while placing similar cells close together and dissimilar cells far apart. However, existing neural-network methods typically focus on minimizing reconstruction loss and preserving similarity among the embeddings of directly related cells. They do not account for dissimilarity, so their embeddings lack separability, which limits clustering performance.

Results: We propose a novel clustering algorithm called scHNTL (scRNA-seq data clustering augmented by high-order neighbors and triplet loss). It first constructs an auxiliary similarity graph and uses a Graph Attentional Autoencoder to learn initial embeddings of cells. It then identifies similar and dissimilar cells by exploring high-order structures of the similarity graph, and applies a triplet loss from contrastive learning so that the embeddings better preserve structural information by separating dissimilar pairs. Finally, this embedding refinement and the clustering objective are combined in a self-optimizing clustering framework to obtain the clusters. Experimental evaluations on 16 real-world datasets demonstrate that scHNTL outperforms state-of-the-art single-cell clustering algorithms.
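
The idea of choosing similar and dissimilar cells from high-order graph neighborhoods and then applying a triplet loss can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the graph is built, how high-order neighbors are defined, or how triplets are sampled. The sketch assumes a symmetric k-nearest-neighbor graph, treats cells within `order` hops as similar and all other cells as dissimilar, and samples one triplet per cell. The function names (`knn_graph`, `within_hops`, `triplet_loss`) and all parameters are hypothetical, and the embedding matrix `Z` stands in for the output of the Graph Attentional Autoencoder, which is not modeled here.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric 0/1 k-nearest-neighbor adjacency from Euclidean distances (assumed graph)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # a cell is not its own neighbor
    idx = np.argsort(d, axis=1)[:, :k]     # indices of the k closest cells per row
    A = np.zeros(d.shape, dtype=int)
    A[np.repeat(np.arange(len(X)), k), idx.ravel()] = 1
    return np.maximum(A, A.T)              # symmetrize

def within_hops(A, order):
    """Boolean matrix that is True where cell j is reachable from cell i in <= `order` hops."""
    n = len(A)
    reach = np.eye(n, dtype=int)
    step = np.eye(n, dtype=int)
    for _ in range(order):
        step = (step @ A > 0).astype(int)  # walks that are one hop longer
        reach = np.maximum(reach, step)
    return reach.astype(bool)

def triplet_loss(Z, reach, margin=1.0, seed=0):
    """Mean hinge loss max(d(a, p) - d(a, n) + margin, 0) with one sampled triplet per anchor."""
    rng = np.random.default_rng(seed)
    cells = np.arange(len(Z))
    losses = []
    for i in cells:
        pos = np.flatnonzero(reach[i] & (cells != i))  # similar: high-order neighbors
        neg = np.flatnonzero(~reach[i])                # dissimilar: outside the neighborhood
        if len(pos) == 0 or len(neg) == 0:
            continue
        p, n = rng.choice(pos), rng.choice(neg)
        d_pos = np.sum((Z[i] - Z[p]) ** 2)
        d_neg = np.sum((Z[i] - Z[n]) ** 2)
        losses.append(max(d_pos - d_neg + margin, 0.0))
    return float(np.mean(losses)) if losses else 0.0

# Toy data: two tight groups of five "cells" placed far apart.
base = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
X = np.vstack([base, base + 10.0])
A = knn_graph(X, k=2)
reach = within_hops(A, order=2)

loss_separated = triplet_loss(X, reach)                 # embeddings already well separated
loss_collapsed = triplet_loss(np.zeros_like(X), reach)  # all cells embedded at one point
```

On this toy data the loss is zero when the embeddings already separate the two groups by more than the margin, and equals the margin when all cells collapse to a single point. That is the behavior a triplet term is meant to penalize: embeddings that keep similar cells close but fail to push dissimilar cells apart.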

Availability and implementation: A Python implementation of scHNTL is available on Figshare (https://doi.org/10.6084/m9.figshare.27001090) and GitHub (https://github.com/SWJTU-ML/scHNTL-code).

Supplementary information: Supplementary data are available at Bioinformatics online.
