{"title":"基于单细胞课程学习的深度图嵌入式聚类","authors":"Huifa Li, Jie Fu, Xinpeng Ling, Zhiyu Sun, Kuncan Wang, Zhili Chen","doi":"arxiv-2408.10511","DOIUrl":null,"url":null,"abstract":"The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies\nenables the investigation of cellular-level tissue heterogeneity. Cell\nannotation significantly contributes to the extensive downstream analysis of\nscRNA-seq data. However, The analysis of scRNA-seq for biological inference\npresents challenges owing to its intricate and indeterminate data distribution,\ncharacterized by a substantial volume and a high frequency of dropout events.\nFurthermore, the quality of training samples varies greatly, and the\nperformance of the popular scRNA-seq data clustering solution GNN could be\nharmed by two types of low-quality training nodes: 1) nodes on the boundary; 2)\nnodes that contribute little additional information to the graph. To address\nthese problems, we propose a single-cell curriculum learning-based deep graph\nembedding clustering (scCLG). We first propose a Chebyshev graph convolutional\nautoencoder with multi-decoder (ChebAE) that combines three optimization\nobjectives corresponding to three decoders, including topology reconstruction\nloss of cell graphs, zero-inflated negative binomial (ZINB) loss, and\nclustering loss, to learn cell-cell topology representation. Meanwhile, we\nemploy a selective training strategy to train GNN based on the features and\nentropy of nodes and prune the difficult nodes based on the difficulty scores\nto keep the high-quality graph. Empirical results on a variety of gene\nexpression datasets show that our model outperforms state-of-the-art methods.","PeriodicalId":501070,"journal":{"name":"arXiv - QuanBio - Genomics","volume":"30 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Single-cell Curriculum Learning-based Deep Graph Embedding Clustering\",\"authors\":\"Huifa Li, Jie Fu, Xinpeng Ling, Zhiyu Sun, Kuncan Wang, Zhili Chen\",\"doi\":\"arxiv-2408.10511\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies\\nenables the investigation of cellular-level tissue heterogeneity. Cell\\nannotation significantly contributes to the extensive downstream analysis of\\nscRNA-seq data. However, The analysis of scRNA-seq for biological inference\\npresents challenges owing to its intricate and indeterminate data distribution,\\ncharacterized by a substantial volume and a high frequency of dropout events.\\nFurthermore, the quality of training samples varies greatly, and the\\nperformance of the popular scRNA-seq data clustering solution GNN could be\\nharmed by two types of low-quality training nodes: 1) nodes on the boundary; 2)\\nnodes that contribute little additional information to the graph. To address\\nthese problems, we propose a single-cell curriculum learning-based deep graph\\nembedding clustering (scCLG). We first propose a Chebyshev graph convolutional\\nautoencoder with multi-decoder (ChebAE) that combines three optimization\\nobjectives corresponding to three decoders, including topology reconstruction\\nloss of cell graphs, zero-inflated negative binomial (ZINB) loss, and\\nclustering loss, to learn cell-cell topology representation. 
Meanwhile, we\\nemploy a selective training strategy to train GNN based on the features and\\nentropy of nodes and prune the difficult nodes based on the difficulty scores\\nto keep the high-quality graph. Empirical results on a variety of gene\\nexpression datasets show that our model outperforms state-of-the-art methods.\",\"PeriodicalId\":501070,\"journal\":{\"name\":\"arXiv - QuanBio - Genomics\",\"volume\":\"30 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - QuanBio - Genomics\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2408.10511\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuanBio - Genomics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.10511","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Single-cell Curriculum Learning-based Deep Graph Embedding Clustering
The swift advancement of single-cell RNA sequencing (scRNA-seq) technologies
enables the investigation of cellular-level tissue heterogeneity. Cell
annotation significantly contributes to the extensive downstream analysis of
scRNA-seq data. However, the analysis of scRNA-seq data for biological inference
presents challenges owing to its intricate and indeterminate data distribution,
characterized by a substantial volume and a high frequency of dropout events.
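
As background for the zero-inflated negative binomial (ZINB) decoder loss introduced below, dropout-heavy scRNA-seq counts are commonly modeled with a ZINB distribution. The parameterization shown here, with mean \(\mu\), dispersion \(\theta\), and dropout probability \(\pi\), is standard notation and is not taken from this paper:

```latex
P_{\mathrm{NB}}(x \mid \mu, \theta)
  = \frac{\Gamma(x + \theta)}{x!\,\Gamma(\theta)}
    \left(\frac{\theta}{\theta + \mu}\right)^{\theta}
    \left(\frac{\mu}{\theta + \mu}\right)^{x},
\qquad
P_{\mathrm{ZINB}}(x \mid \pi, \mu, \theta)
  = \pi\,\delta_{0}(x) + (1 - \pi)\,P_{\mathrm{NB}}(x \mid \mu, \theta).
```

The ZINB reconstruction loss is then the negative log-likelihood of the observed counts under the decoder's predicted \((\pi, \mu, \theta)\).
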
Furthermore, the quality of training samples varies greatly, and the
performance of graph neural networks (GNNs), a popular solution for clustering
scRNA-seq data, can be harmed by two types of low-quality training nodes:
1) nodes on the boundary; 2) nodes that contribute little additional
information to the graph. To address
these problems, we propose a single-cell curriculum learning-based deep graph
embedding clustering method (scCLG). We first propose a Chebyshev graph
convolutional autoencoder with multiple decoders (ChebAE), which combines three
optimization objectives, one per decoder: a topology reconstruction loss on the
cell graph, a zero-inflated negative binomial (ZINB) loss, and a clustering
loss, to learn cell-cell topology representations. Meanwhile, we employ a
selective training strategy that trains the GNN based on node features and
entropy, pruning difficult nodes according to their difficulty scores to
maintain a high-quality graph. Empirical results on a variety of gene
expression datasets show that our model outperforms state-of-the-art methods.
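
A plausible form of the multi-decoder objective sketched above combines the three decoder losses with weighting coefficients; the weights \(\alpha, \beta, \gamma\) and the exact combination below are an illustrative assumption, not the paper's stated formulation:

```latex
\mathcal{L} = \alpha\,\mathcal{L}_{\mathrm{topo}}
            + \beta\,\mathcal{L}_{\mathrm{ZINB}}
            + \gamma\,\mathcal{L}_{\mathrm{clu}},
```

where \(\mathcal{L}_{\mathrm{topo}}\) reconstructs the cell-graph topology, \(\mathcal{L}_{\mathrm{ZINB}}\) is the negative ZINB log-likelihood of the expression counts, and \(\mathcal{L}_{\mathrm{clu}}\) is a clustering loss such as the KL divergence between soft assignments and a sharpened target distribution, as in deep embedded clustering.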
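
The selective training strategy is described only at a high level here; the sketch below shows one way an entropy-based difficulty score could drive node pruning during training. The function names, the use of assignment entropy alone as the difficulty score, and the pruning ratio are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def node_difficulty(soft_assignments: np.ndarray) -> np.ndarray:
    """Per-node difficulty from the entropy of its soft cluster assignments.

    soft_assignments: (n_nodes, n_clusters) array whose rows sum to 1.
    Higher entropy means the model is less certain about the node, i.e. harder.
    """
    eps = 1e-12
    return -np.sum(soft_assignments * np.log(soft_assignments + eps), axis=1)

def prune_hard_nodes(adjacency: np.ndarray, scores: np.ndarray, keep_ratio: float = 0.9):
    """Keep the easiest `keep_ratio` fraction of nodes and drop the rest.

    Returns the kept node indices and the induced subgraph adjacency,
    which would be used for the next round of GNN training.
    """
    n_keep = max(1, int(keep_ratio * len(scores)))
    keep = np.sort(np.argsort(scores)[:n_keep])   # lowest-entropy (easiest) nodes
    return keep, adjacency[np.ix_(keep, keep)]

# Toy example: 5 cells, 3 clusters; prune the hardest ~20% of cells.
q = np.array([[0.90, 0.05, 0.05],
              [0.40, 0.30, 0.30],
              [0.80, 0.10, 0.10],
              [0.34, 0.33, 0.33],
              [0.70, 0.20, 0.10]])
rng = np.random.default_rng(0)
upper = np.triu((rng.random((5, 5)) > 0.5).astype(float), 1)
A = upper + upper.T                               # symmetric cell graph, no self-loops
kept, A_sub = prune_hard_nodes(A, node_difficulty(q), keep_ratio=0.8)
```

Dropping the highest-entropy nodes each round keeps training focused on confidently clustered cells, mirroring the easy-to-hard ordering of curriculum learning.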