Using Multi-Encoder Semi-Implicit Graph Variational Autoencoder to Analyze Single-Cell RNA Sequencing Data
Authors: Shengwen Tian; Cunmei Ji; Jiancheng Ni; Yutian Wang; Chunhou Zheng
Journal: IEEE/ACM Transactions on Computational Biology and Bioinformatics, vol. 21, no. 6, pp. 2280-2291 (Journal Article)
Published: 2024-09-10
DOI: 10.1109/TCBB.2024.3458170
URL: https://ieeexplore.ieee.org/document/10675446/
Impact factor: 3.6; JCR: Q2 (Biochemical Research Methods)
Citations: 0
Abstract
Rapid advances in single-cell RNA sequencing (scRNA-seq) have made it possible to characterize cell states at high resolution across large-scale libraries. scRNA-seq data contain a wealth of biological information that is mainly used to discover cell subtypes and track cell development. However, traditional methods struggle with the high dimensionality and high sparsity of scRNA-seq data. To analyze scRNA-seq data more effectively, we propose a new framework called MSVGAE, based on the variational graph auto-encoder and graph attention networks. Specifically, we introduce multiple encoders to learn features at different scales and to control for uninformative features. Moreover, different noise is added to the encoders to promote the propagation of graph structural information and distribution uncertainty, which allows our model to capture complex posterior distributions. MSVGAE maps high-dimensional, noisy scRNA-seq data into a low-dimensional latent space, which benefits downstream tasks. In particular, MSVGAE can handle extremely sparse data. For the experiments, we create 24 simulated datasets covering various biological scenarios and collect 8 real-world datasets. The experimental results on clustering, visualization, and marker gene analysis indicate that the MSVGAE model achieves excellent accuracy and robustness in analyzing scRNA-seq data.
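The abstract gives no implementation details, so the following is only a rough illustrative sketch of the general idea it describes: several encoders of different widths, each with noise injected at its input so that the mean and variance of the latent distribution are themselves random (the "semi-implicit" part), followed by concatenation of the latent codes. Everything here is an assumption made for illustration: the function names, the layer sizes, the untrained random weights, the inner-product decoder, and the use of simple normalized-adjacency propagation in place of the graph attention layers the paper mentions. It is not the authors' MSVGAE code.

```python
import numpy as np


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a cell-cell graph."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def noisy_encoder(x, a_norm, hidden_dim, latent_dim, noise_dim, rng):
    """One encoder with input noise injection (hypothetical design).

    Because random noise is concatenated to the features, mu and logvar
    are random variables themselves, not deterministic functions of x.
    Weights are random and untrained; this only shows the data flow.
    """
    n_cells, n_genes = x.shape
    eps = rng.standard_normal((n_cells, noise_dim))
    h_in = np.concatenate([x, eps], axis=1)

    w_hidden = rng.standard_normal((n_genes + noise_dim, hidden_dim)) * 0.1
    w_mu = rng.standard_normal((hidden_dim, latent_dim)) * 0.1
    w_logvar = rng.standard_normal((hidden_dim, latent_dim)) * 0.1

    # Graph propagation followed by ReLU, then two linear heads.
    h = np.maximum(a_norm @ h_in @ w_hidden, 0.0)
    mu = a_norm @ h @ w_mu
    logvar = a_norm @ h @ w_logvar

    # Reparameterization: z = mu + sigma * standard normal noise.
    return mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)


def multi_encoder_latent(x, adj, hidden_dims=(64, 32), latent_dim=8,
                         noise_dim=4, seed=0):
    """Run one encoder per hidden width and concatenate the latent codes."""
    rng = np.random.default_rng(seed)
    a_norm = normalize_adjacency(adj)
    codes = [noisy_encoder(x, a_norm, h, latent_dim, noise_dim, rng)
             for h in hidden_dims]
    return np.concatenate(codes, axis=1)


def inner_product_decoder(z):
    """Reconstruct edge probabilities as sigmoid(z z^T)."""
    return 1.0 / (1.0 + np.exp(-(z @ z.T)))
```

A typical call would pass a log-transformed cell-by-gene count matrix and a symmetric cell-cell adjacency matrix (for example, from a k-nearest-neighbor graph), then feed the concatenated latent matrix to a clustering or visualization step. Training, which would require a reconstruction loss and a KL term, is deliberately omitted.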
About the journal:
IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical, and computational methods that are central to bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; important biological results obtained from the use of these methods, programs, and databases; and the emerging field of systems biology, where many forms of data are used to create a computer-based model of a complex biological system.