Enhancing Single-Cell RNA-Seq Data Completeness With a Graph Learning Framework.

IF 3.4 3区 生物学 Q2 BIOCHEMICAL RESEARCH METHODS
Snehalika Lall, Sumanta Ray, Sanghamitra Bandyopadhyay
{"title":"Enhancing Single-Cell RNA-Seq Data Completeness With a Graph Learning Framework.","authors":"Snehalika Lall, Sumanta Ray, Sanghamitra Bandyopadhyay","doi":"10.1109/TCBB.2024.3492384","DOIUrl":null,"url":null,"abstract":"<p><p>Single cell RNA sequencing (scRNA-seq) is a powerful tool to capture gene expression snapshots in individual cells. However, a low amount of RNA in the individual cells results in dropout events, which introduce huge zero counts in the single cell expression matrix. We have developed VAImpute, a variational graph autoencoder based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data leveraging copula correlation ($Ccor$) among cells/genes. The trained model is utilized to predict the dropouts events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets, comparing it to established single-cell imputation methods. VAImpute yields significant improvements to detect dropouts, thereby achieving superior performance in cell clustering, detecting rare cells, and differential expression.</p>","PeriodicalId":13344,"journal":{"name":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","volume":"PP ","pages":"64-72"},"PeriodicalIF":3.4000,"publicationDate":"2025-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE/ACM Transactions on Computational Biology and Bioinformatics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1109/TCBB.2024.3492384","RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
引用次数: 0

Abstract

Single cell RNA sequencing (scRNA-seq) is a powerful tool to capture gene expression snapshots in individual cells. However, a low amount of RNA in the individual cells results in dropout events, which introduce huge zero counts in the single cell expression matrix. We have developed VAImpute, a variational graph autoencoder based imputation technique that learns the inherent distribution of a large network/graph constructed from the scRNA-seq data leveraging copula correlation ($Ccor$) among cells/genes. The trained model is utilized to predict the dropouts events by computing the probability of all non-edges (cell-gene) in the network. We devise an algorithm to impute the missing expression values of the detected dropouts. The performance of the proposed model is assessed on both simulated and real scRNA-seq datasets, comparing it to established single-cell imputation methods. VAImpute yields significant improvements to detect dropouts, thereby achieving superior performance in cell clustering, detecting rare cells, and differential expression.

利用图形学习框架提高单细胞 RNA-seq 数据的完整性。
单细胞 RNA 测序(scRNA-seq)是捕捉单个细胞基因表达快照的强大工具。然而,由于单个细胞中的 RNA 含量较低,因此会出现丢失事件,从而在单细胞表达矩阵中引入大量零计数。我们开发的 VAImpute 是一种基于变异图自动编码器的估算技术,它利用细胞/基因间的 copula correlation ( Ccor) 学习由 scRNA-seq 数据构建的大型网络/图的固有分布。通过计算网络中所有非边(细胞-基因)的概率,利用训练好的模型预测掉线事件。我们还设计了一种算法,对检测到的缺失表达值进行补偿。我们在模拟和真实的 scRNA-seq 数据集上评估了拟议模型的性能,并将其与已有的单细胞估算方法进行了比较。VAImpute 在检测缺失方面有显著改进,因此在细胞聚类、检测稀有细胞和差异表达方面表现出色。所有代码和数据集都在 github 链接中提供:https://github.com/sumantaray/VAImputeAvailability。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
CiteScore
7.50
自引率
6.70%
发文量
479
审稿时长
3 months
期刊介绍: IEEE/ACM Transactions on Computational Biology and Bioinformatics emphasizes the algorithmic, mathematical, statistical and computational methods that are central in bioinformatics and computational biology; the development and testing of effective computer programs in bioinformatics; the development of biological databases; and important biological results that are obtained from the use of these methods, programs and databases; the emerging field of Systems Biology, where many forms of data are used to create a computer-based model of a complex biological system
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信