scCDCG: Efficient Deep Structural Clustering for single-cell RNA-seq via Deep Cut-informed Graph Embedding
Ping Xu, Zhiyuan Ning, Meng Xiao, Guihai Feng, Xin Li, Yuanchun Zhou, Pengfei Wang
arXiv - QuanBio - Genomics, 2024-04-09
arXiv:2404.06167
Citations: 0
Abstract
Single-cell RNA sequencing (scRNA-seq) is essential for unraveling cellular
heterogeneity and diversity, offering invaluable insights for bioinformatics
advancements. Despite its potential, traditional clustering methods in
scRNA-seq data analysis often neglect the structural information embedded in
gene expression profiles, which is crucial for understanding cellular
correlations and dependencies. Existing strategies, including graph neural
networks, are inefficient because scRNA-seq data are intrinsically
high-dimensional and highly sparse. To address these limitations, we introduce
scCDCG (single-cell RNA-seq Clustering via Deep Cut-informed Graph), a novel
framework designed for efficient and accurate clustering of scRNA-seq data that
simultaneously utilizes intercellular high-order structural information. scCDCG
comprises three main components: (i) A graph embedding module utilizing deep
cut-informed techniques, which effectively captures intercellular high-order
structural information, overcoming the over-smoothing and inefficiency issues
prevalent in prior graph neural network methods. (ii) A self-supervised
learning module guided by optimal transport, tailored to accommodate the unique
complexities of scRNA-seq data, specifically its high dimensionality and
high sparsity. (iii) An autoencoder-based feature learning module that
simplifies model complexity through effective dimension reduction and feature
extraction. Our extensive experiments on 6 datasets demonstrate scCDCG's
superior performance and efficiency compared to 7 established models,
underscoring scCDCG's potential as a transformative tool in scRNA-seq data
analysis. Our code is available at: https://github.com/XPgogogo/scCDCG.
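
The abstract does not say how optimal transport guides the self-supervised module. As an illustration only, one common pattern in deep clustering is to turn cell-to-cluster scores into balanced soft targets with Sinkhorn-Knopp iterations, so that no cluster collapses or absorbs every cell. The sketch below assumes that pattern. The function name `sinkhorn_targets`, the parameters `n_iters` and `epsilon`, and the toy scores are all hypothetical and are not taken from the scCDCG code.

```python
import math


def sinkhorn_targets(scores, n_iters=50, epsilon=0.5):
    """Turn cell-to-cluster scores into balanced soft targets.

    Illustrative Sinkhorn-Knopp sketch, not the scCDCG implementation.
    scores: list of rows, one row of cluster scores per cell.
    """
    n_cells, n_clusters = len(scores), len(scores[0])
    # Shift by the global maximum so the exponentials cannot overflow.
    top = max(max(row) for row in scores)
    kernel = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    u = [1.0] * n_cells
    v = [1.0] * n_clusters
    for _ in range(n_iters):
        # Scale columns so each cluster receives total mass 1 / n_clusters.
        for j in range(n_clusters):
            col = sum(kernel[i][j] * u[i] for i in range(n_cells))
            v[j] = (1.0 / n_clusters) / col
        # Scale rows so each cell carries total mass 1 / n_cells.
        for i in range(n_cells):
            row = sum(kernel[i][j] * v[j] for j in range(n_clusters))
            u[i] = (1.0 / n_cells) / row
    # Multiply by n_cells so each row is a distribution over clusters.
    return [
        [n_cells * u[i] * kernel[i][j] * v[j] for j in range(n_clusters)]
        for i in range(n_cells)
    ]


# Toy scores for 4 cells and 2 clusters (made-up numbers).
scores = [[2.0, 0.1], [1.8, 0.3], [1.9, 0.2], [0.2, 1.5]]
targets = sinkhorn_targets(scores)
```

In a training loop, targets of this kind would typically serve as the reference distribution in a cross-entropy or KL loss against the model's own soft assignments. Whether scCDCG uses this exact formulation is not stated in the abstract.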