scGMM-VGAE: a Gaussian mixture model-based variational graph autoencoder algorithm for clustering single-cell RNA-seq data

IF 6.3 2区 物理与天体物理 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE
Eric W Lin, Boyuan Liu, L. Lac, Daryl L. X. Fung, C. Leung, P. Hu
{"title":"scGMM-VGAE: a Gaussian mixture model-based variational graph autoencoder algorithm for clustering single-cell RNA-seq data","authors":"Eric W Lin, Boyuan Liu, L. Lac, Daryl L. X. Fung, C. Leung, P. Hu","doi":"10.1088/2632-2153/acd7c3","DOIUrl":null,"url":null,"abstract":"Cell type identification using single-cell RNA sequencing data is critical for understanding disease mechanisms and drug discovery. Cell clustering analysis has been widely studied in health research for rare tumor cell detection. In this study, we propose a Gaussian mixture model-based variational graph autoencoder on scRNA-seq data (scGMM-VGAE) that integrates a statistical clustering model to a deep learning algorithm to significantly improve the cell clustering performance. This model feeds a cell-cell graph adjacency matrix and a gene feature matrix into a graph variational autoencoder (VGAE) to generate latent data. These data are then used for cell clustering by the Gaussian mixture model (GMM) module. To optimize the algorithm, a designed loss function is derived by combining parameter estimates from the GMM and VGAE. We test the proposed method on four publicly available and three simulated datasets which contain many biological and technical zeros. The scGMM-VGAE outperforms four selected baseline methods on three evaluation metrics in cell clustering. By successfully incorporating GMM into deep learning VGAE on scRNA-seq data, the proposed method shows higher accuracy in cell clustering on scRNA-seq data. This improvement has a significant impact on detecting rare cell types in health research. All source codes used in this study can be found at https://github.com/ericlin1230/scGMM-VGAE.","PeriodicalId":33757,"journal":{"name":"Machine Learning Science and Technology","volume":" ","pages":""},"PeriodicalIF":6.3000,"publicationDate":"2023-05-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Machine Learning Science and Technology","FirstCategoryId":"101","ListUrlMain":"https://doi.org/10.1088/2632-2153/acd7c3","RegionNum":2,"RegionCategory":"物理与天体物理","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 2

Abstract

Cell type identification using single-cell RNA sequencing data is critical for understanding disease mechanisms and drug discovery. Cell clustering analysis has been widely studied in health research for rare tumor cell detection. In this study, we propose a Gaussian mixture model-based variational graph autoencoder on scRNA-seq data (scGMM-VGAE) that integrates a statistical clustering model to a deep learning algorithm to significantly improve the cell clustering performance. This model feeds a cell-cell graph adjacency matrix and a gene feature matrix into a graph variational autoencoder (VGAE) to generate latent data. These data are then used for cell clustering by the Gaussian mixture model (GMM) module. To optimize the algorithm, a designed loss function is derived by combining parameter estimates from the GMM and VGAE. We test the proposed method on four publicly available and three simulated datasets which contain many biological and technical zeros. The scGMM-VGAE outperforms four selected baseline methods on three evaluation metrics in cell clustering. By successfully incorporating GMM into deep learning VGAE on scRNA-seq data, the proposed method shows higher accuracy in cell clustering on scRNA-seq data. This improvement has a significant impact on detecting rare cell types in health research. All source codes used in this study can be found at https://github.com/ericlin1230/scGMM-VGAE.
scGMM-VGAE:一种基于高斯混合模型的单细胞RNA-seq数据聚类变分图自编码器算法
使用单细胞RNA测序数据进行细胞类型鉴定对于理解疾病机制和药物发现至关重要。细胞聚类分析在罕见肿瘤细胞检测的健康研究中得到了广泛的研究。在本研究中,我们提出了一种基于高斯混合模型的scRNA-seq数据变分图自动编码器(scGMM-VGAE),该编码器将统计聚类模型与深度学习算法相结合,以显著提高细胞聚类性能。该模型将细胞-细胞图邻接矩阵和基因特征矩阵输入到图变分自动编码器(VGAE)中以生成潜在数据。然后,这些数据被高斯混合模型(GMM)模块用于细胞聚类。为了优化算法,结合GMM和VGAE的参数估计,导出了设计的损失函数。我们在四个公开可用的数据集和三个模拟数据集上测试了所提出的方法,这些数据集包含许多生物学和技术零点。scGMM-VGAE在细胞聚类的三个评估指标上优于四种选定的基线方法。通过成功地将GMM结合到scRNA-seq数据的深度学习VGAE中,所提出的方法在scRNA-seq数据的细胞聚类中显示出更高的准确性。这一改进对健康研究中检测稀有细胞类型具有重大影响。本研究中使用的所有源代码均可在https://github.com/ericlin1230/scGMM-VGAE.
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Machine Learning Science and Technology
Machine Learning Science and Technology Computer Science-Artificial Intelligence
CiteScore
9.10
自引率
4.40%
发文量
86
审稿时长
5 weeks
期刊介绍: Machine Learning Science and Technology is a multidisciplinary open access journal that bridges the application of machine learning across the sciences with advances in machine learning methods and theory as motivated by physical insights. Specifically, articles must fall into one of the following categories: advance the state of machine learning-driven applications in the sciences or make conceptual, methodological or theoretical advances in machine learning with applications to, inspiration from, or motivated by scientific problems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信