Soft graph clustering for single-cell RNA sequencing data.

IF 3.3 3区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS
Ping Xu, Pengfei Wang, Zhiyuan Ning, Meng Xiao, Min Wu, Yuanchun Zhou
{"title":"Soft graph clustering for single-cell RNA sequencing data.","authors":"Ping Xu, Pengfei Wang, Zhiyuan Ning, Meng Xiao, Min Wu, Yuanchun Zhou","doi":"10.1186/s12859-025-06231-z","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Clustering analysis is fundamental in single-cell RNA sequencing (scRNA-seq) data analysis for elucidating cellular heterogeneity and diversity. Recent graph-based scRNA-seq clustering methods, particularly graph neural networks (GNNs), have significantly improved in tackling the challenges of high-dimension, high-sparsity, and frequent dropout events that lead to ambiguous cell population boundaries. However, one major challenge for GNN-based methods is their reliance on hard graph constructions derived from similarity matrices. These constructions introduce difficulties when applied to scRNA-seq data due to: (i) The simplification of intercellular relationships into binary edges (0 or 1) by applying thresholds, which restricts the capture of continuous similarity features among cells and leads to significant information loss. (ii) The presence of significant inter-cluster connections within hard graphs, which can confuse GNN methods that rely heavily on graph structures, potentially causing erroneous message propagation and biased clustering outcomes.</p><p><strong>Results: </strong>To tackle these challenges, we introduce scSGC, a Soft Graph Clustering for single-cell RNA sequencing data, which aims to more accurately characterize continuous similarities among cells through non-binary edge weights, thereby mitigating the limitations of rigid data structures. The scSGC framework comprises three core components: (i) a zero-inflated negative binomial (ZINB)-based feature autoencoder designed to effectively handle the sparsity and dropout issues in scRNA-seq data; (ii) a dual-channel cut-informed soft graph embedding module, constructed through deep graph-cut information, capturing continuous similarities between cells while preserving the intrinsic data structures of scRNA-seq; and (iii) an optimal transport-based clustering optimization module, achieving optimal delineation of cell populations while maintaining high biological relevance.</p><p><strong>Conclusion: </strong>By integrating dual-channel cut-informed soft graph representation learning, a ZINB-based feature autoencoder, and optimal transport-driven clustering optimization, scSGC effectively overcomes the challenges associated with traditional hard graph constructions in GNN methods. Extensive experiments across ten datasets demonstrate that scSGC outperforms 13 state-of-the-art clustering models in clustering accuracy, cell type annotation, and computational efficiency. These results highlight its substantial potential to advance scRNA-seq data analysis and deepen our understanding of cellular heterogeneity.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"195"},"PeriodicalIF":3.3000,"publicationDate":"2025-07-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"BMC Bioinformatics","FirstCategoryId":"99","ListUrlMain":"https://doi.org/10.1186/s12859-025-06231-z","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Background: Clustering analysis is fundamental in single-cell RNA sequencing (scRNA-seq) data analysis for elucidating cellular heterogeneity and diversity. Recent graph-based scRNA-seq clustering methods, particularly graph neural networks (GNNs), have significantly improved in tackling the challenges of high-dimension, high-sparsity, and frequent dropout events that lead to ambiguous cell population boundaries. However, one major challenge for GNN-based methods is their reliance on hard graph constructions derived from similarity matrices. These constructions introduce difficulties when applied to scRNA-seq data due to: (i) The simplification of intercellular relationships into binary edges (0 or 1) by applying thresholds, which restricts the capture of continuous similarity features among cells and leads to significant information loss. (ii) The presence of significant inter-cluster connections within hard graphs, which can confuse GNN methods that rely heavily on graph structures, potentially causing erroneous message propagation and biased clustering outcomes.

Results: To tackle these challenges, we introduce scSGC, a Soft Graph Clustering for single-cell RNA sequencing data, which aims to more accurately characterize continuous similarities among cells through non-binary edge weights, thereby mitigating the limitations of rigid data structures. The scSGC framework comprises three core components: (i) a zero-inflated negative binomial (ZINB)-based feature autoencoder designed to effectively handle the sparsity and dropout issues in scRNA-seq data; (ii) a dual-channel cut-informed soft graph embedding module, constructed through deep graph-cut information, capturing continuous similarities between cells while preserving the intrinsic data structures of scRNA-seq; and (iii) an optimal transport-based clustering optimization module, achieving optimal delineation of cell populations while maintaining high biological relevance.

Conclusion: By integrating dual-channel cut-informed soft graph representation learning, a ZINB-based feature autoencoder, and optimal transport-driven clustering optimization, scSGC effectively overcomes the challenges associated with traditional hard graph constructions in GNN methods. Extensive experiments across ten datasets demonstrate that scSGC outperforms 13 state-of-the-art clustering models in clustering accuracy, cell type annotation, and computational efficiency. These results highlight its substantial potential to advance scRNA-seq data analysis and deepen our understanding of cellular heterogeneity.

单细胞RNA测序数据的软图聚类。
背景:聚类分析是单细胞RNA测序(scRNA-seq)数据分析的基础,用于阐明细胞异质性和多样性。最近基于图的scRNA-seq聚类方法,特别是图神经网络(gnn),在解决高维、高稀疏和频繁的辍学事件(导致细胞种群边界模糊)的挑战方面取得了显著进步。然而,基于gnn的方法的一个主要挑战是它们依赖于从相似矩阵派生的硬图结构。这些结构在应用于scRNA-seq数据时带来了困难,因为:(i)通过应用阈值将细胞间关系简化为二值边(0或1),这限制了细胞间连续相似性特征的捕获,并导致严重的信息丢失。(ii)硬图中存在显著的簇间连接,这可能会混淆严重依赖图结构的GNN方法,可能导致错误的消息传播和有偏差的聚类结果。结果:为了解决这些挑战,我们引入了scSGC,一种针对单细胞RNA测序数据的软图聚类,旨在通过非二进制边缘权重更准确地表征细胞之间的连续相似性,从而减轻刚性数据结构的局限性。scSGC框架包括三个核心组件:(i)基于零膨胀负二项(ZINB)的特征自编码器,旨在有效处理scRNA-seq数据的稀疏性和丢弃问题;(ii)双通道切割通知软图嵌入模块,通过深度图切割信息构建,在保留scRNA-seq固有数据结构的同时捕获细胞之间的连续相似性;(iii)基于转运的最佳聚类优化模块,在保持高生物学相关性的同时实现细胞群体的最佳描绘。结论:通过集成双通道cut-informed软图表示学习、基于zinb的特征自编码器和最优传输驱动的聚类优化,scSGC有效地克服了GNN方法中传统硬图构建的挑战。在10个数据集上进行的大量实验表明,scSGC在聚类精度、细胞类型注释和计算效率方面优于13种最先进的聚类模型。这些结果突出了其推进scRNA-seq数据分析的巨大潜力,并加深了我们对细胞异质性的理解。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
BMC Bioinformatics
BMC Bioinformatics 生物-生化研究方法
CiteScore
5.70
自引率
3.30%
发文量
506
审稿时长
4.3 months
期刊介绍: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信