scHNTL: single-cell RNA-seq data clustering augmented by high-order neighbors and triplet loss.

Bioinformatics (Oxford, England). Pub Date: 2025-01-29. DOI: 10.1093/bioinformatics/btaf044

Hua Meng, Chuan Qin, Zhiguo Long



Abstract

Motivation: The rapid development of single-cell RNA sequencing (scRNA-seq) has significantly advanced biomedical research. Clustering analysis, which is crucial for scRNA-seq data, faces challenges including data sparsity, high dimensionality, and variable gene expression. Good low-dimensional embeddings of such complex data should preserve intrinsic information while placing similar data points close together and dissimilar ones far apart. However, existing neural-network-based methods typically focus on minimizing reconstruction loss and maintaining similarity among the embeddings of directly related cells; they do not account for dissimilarity, so their embeddings lack separability, which limits clustering performance.

Results: We propose a novel clustering algorithm called scHNTL (scRNA-seq data clustering augmented by high-order neighbors and triplet loss). It first constructs an auxiliary similarity graph and uses a Graph Attentional Autoencoder to learn initial embeddings of cells. It then identifies similar and dissimilar cells by exploring high-order structures of the similarity graph and applies a triplet loss from contrastive learning, which improves how well the embeddings preserve structural information by separating dissimilar pairs. Finally, this embedding refinement and the clustering objective are fused in a self-optimizing clustering framework to obtain the clusters. Experimental evaluations on 16 real-world datasets demonstrate that scHNTL outperforms state-of-the-art single-cell clustering algorithms.
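The idea of combining high-order graph neighbors with a triplet loss can be illustrated with a minimal sketch. This is not the scHNTL implementation: the abstract does not specify how similar and dissimilar cells are chosen, so the hop-count rule below (`split_by_hops`, `max_hops`), the k-nearest-neighbor graph construction, the margin value, and the toy 2-D points standing in for learned embeddings are all assumptions made for illustration only.

```python
import math
from collections import deque


def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph as a list of adjacency sets.

    Assumption: Euclidean distance on toy points stands in for whatever
    similarity measure the real method uses on expression profiles.
    """
    n = len(points)
    adj = [set() for _ in range(n)]
    for i in range(n):
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        for j in others[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj


def hop_distances(adj, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def split_by_hops(adj, source, max_hops):
    """Hypothetical rule: nodes within max_hops are 'similar', all others 'dissimilar'."""
    dist = hop_distances(adj, source)
    similar = {v for v, d in dist.items() if 0 < d <= max_hops}
    dissimilar = set(range(len(adj))) - similar - {source}
    return similar, dissimilar


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)


# Two well-separated toy groups of "cells" in a 2-D embedding space.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
adj = knn_graph(points, k=2)
similar, dissimilar = split_by_hops(adj, source=0, max_hops=2)

# A well-separated triplet incurs no loss; a swapped one is penalized.
good = triplet_loss(points[0], points[1], points[3])
bad = triplet_loss(points[0], points[3], points[1])
```

In this toy setting the anchor's graph neighborhood supplies the positives and everything outside it supplies the negatives, so the loss is zero only when positives sit closer to the anchor than negatives by at least the margin. In a trainable model the same quantity would be computed on learned embeddings and minimized by gradient descent.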

Availability and implementation: A Python implementation of scHNTL is available at Figshare (https://doi.org/10.6084/m9.figshare.27001090) and GitHub (https://github.com/SWJTU-ML/scHNTL-code).

Supplementary information: Supplementary data are available at Bioinformatics online.
