Single-cell Curriculum Learning-based Deep Graph Embedding Clustering

arXiv - QuanBio - Genomics. Pub Date: 2024-08-20. DOI: arxiv-2408.10511

Huifa Li, Jie Fu, Xinpeng Ling, Zhiyu Sun, Kuncan Wang, Zhili Chen



Abstract

Rapid advances in single-cell RNA sequencing (scRNA-seq) technologies enable the investigation of tissue heterogeneity at the cellular level. Cell annotation contributes significantly to the extensive downstream analysis of scRNA-seq data. However, analyzing scRNA-seq data for biological inference is challenging because the data distribution is intricate and indeterminate, characterized by a substantial volume and a high frequency of dropout events. Furthermore, the quality of training samples varies greatly, and the performance of graph neural networks (GNNs), a popular solution for scRNA-seq data clustering, can be harmed by two types of low-quality training nodes: 1) nodes on the boundary; 2) nodes that contribute little additional information to the graph. To address these problems, we propose single-cell curriculum learning-based deep graph embedding clustering (scCLG). We first propose a Chebyshev graph convolutional autoencoder with multiple decoders (ChebAE) that combines three optimization objectives, one per decoder: a topology reconstruction loss on the cell graph, a zero-inflated negative binomial (ZINB) loss, and a clustering loss, to learn cell-cell topology representation. Meanwhile, we employ a selective training strategy that trains the GNN based on the features and entropy of nodes and prunes difficult nodes according to their difficulty scores to keep a high-quality graph. Empirical results on a variety of gene expression datasets show that our model outperforms state-of-the-art methods.
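Two ingredients named in the abstract can be illustrated with a minimal sketch: the per-count ZINB negative log-likelihood, and an entropy-based difficulty score used to prune hard nodes. The abstract does not give the paper's exact formulas, so everything here is an assumption made for illustration: the function names (`nb_log_pmf`, `zinb_nll`, `entropy`, `prune_hardest`) are hypothetical, the ZINB form is the standard mean/dispersion parameterization, and the difficulty score uses only the entropy of a node's soft cluster assignment, whereas scCLG also uses node features.

```python
import math


def nb_log_pmf(x, mu, theta):
    """Log-probability of count x under a negative binomial with
    mean mu > 0 and dispersion theta > 0 (standard parameterization)."""
    return (math.lgamma(x + theta) - math.lgamma(theta) - math.lgamma(x + 1)
            + theta * math.log(theta / (theta + mu))
            + x * math.log(mu / (theta + mu)))


def zinb_nll(x, mu, theta, pi):
    """Negative log-likelihood of one count under ZINB.
    pi is the probability of an extra (dropout) zero."""
    nb = math.exp(nb_log_pmf(x, mu, theta))
    # Zero inflation adds point mass pi at x == 0 only.
    p = (1.0 - pi) * nb + (pi if x == 0 else 0.0)
    return -math.log(p)


def entropy(probs):
    """Shannon entropy (in nats) of a soft cluster assignment."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def prune_hardest(assignments, frac):
    """Return sorted indices of nodes kept after dropping the fraction
    `frac` with the highest assignment entropy.
    Assumption: entropy alone stands in for the difficulty score."""
    scores = [entropy(q) for q in assignments]
    n_drop = int(len(scores) * frac)
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # easy -> hard
    return sorted(order[:len(scores) - n_drop])


if __name__ == "__main__":
    # An observed zero is more likely (lower NLL) when dropout is allowed.
    print(zinb_nll(0, mu=2.0, theta=1.5, pi=0.0))
    print(zinb_nll(0, mu=2.0, theta=1.5, pi=0.5))
    # Four nodes with soft assignments over two clusters; drop the hardest 25%.
    q = [[0.9, 0.1], [0.5, 0.5], [1.0, 0.0], [0.6, 0.4]]
    print(prune_hardest(q, 0.25))
```

In this sketch a node whose assignment is close to uniform (high entropy) is treated as a boundary node and removed first; a curriculum would typically re-score and prune over several training rounds, which is omitted here.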


