Missing Value Recovery for Single Cell RNA Sequencing Data

Wenjuan Zhang, William Yang, J. Talburt, S. Weissman, Mary Q. Yang
Published in: 2021 International Conference on Computational Science and Computational Intelligence (CSCI), 2021-12-01
DOI: 10.1109/CSCI54926.2021.00129 (https://doi.org/10.1109/CSCI54926.2021.00129)
Citations: 1

Abstract

The emergence of single-cell sequencing technologies has enabled the production of high-resolution data at the individual cell level, providing unprecedented opportunities to capture cell population diversity and dissect the cellular heterogeneity of complex diseases. At the same time, relatively high biological and technical noise poses new challenges for single-cell data analysis. Single-cell RNA sequencing (scRNA-seq) data often contain substantial missing values due to gene dropout events. Here, we developed a convolutional neural network-based model to recover missing values in scRNA-seq data. We first calculated the probability of dropout using a gamma-normal expectation-maximization algorithm. Unlike most existing approaches, our model recovered only those expression values whose dropout probability exceeded a threshold. The mean squared error and the Pearson correlation coefficient were used to assess the accuracy of the predicted expression values. Purity and entropy were computed to measure the homogeneity of cell clusters obtained from the imputed gene expression profiles. Across various scRNA-seq datasets, our model demonstrated robust performance and achieved results comparable to or better than those of other imputation methods.
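The selective recovery step and the four evaluation measures named in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the dropout probabilities and the predicted values are taken as given inputs (the gamma-normal mixture fit and the convolutional network are not reproduced), the default threshold of 0.5 is a placeholder, and purity and entropy follow the commonly used definitions with a base-2 logarithm, which may differ from the exact formulas in the paper. All function names are hypothetical.

```python
import numpy as np


def selective_impute(observed, predicted, dropout_prob, threshold=0.5):
    """Replace only entries whose dropout probability exceeds the threshold.

    The threshold default is an assumption made for illustration.
    Returns the imputed matrix and the boolean mask of replaced entries.
    """
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mask = np.asarray(dropout_prob, dtype=float) > threshold
    return np.where(mask, predicted, observed), mask


def mse(truth, estimate):
    """Mean squared error over all entries."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.mean((truth - estimate) ** 2))


def pearson(truth, estimate):
    """Pearson correlation coefficient between the flattened arrays."""
    truth = np.asarray(truth, dtype=float).ravel()
    estimate = np.asarray(estimate, dtype=float).ravel()
    return float(np.corrcoef(truth, estimate)[0, 1])


def contingency(cluster_ids, class_labels):
    """Count matrix: rows are clusters, columns are reference cell types."""
    _, c = np.unique(np.asarray(cluster_ids).ravel(), return_inverse=True)
    _, t = np.unique(np.asarray(class_labels).ravel(), return_inverse=True)
    counts = np.zeros((c.max() + 1, t.max() + 1))
    np.add.at(counts, (c, t), 1)
    return counts


def purity(cluster_ids, class_labels):
    """Fraction of cells that belong to the majority type of their cluster."""
    counts = contingency(cluster_ids, class_labels)
    return float(counts.max(axis=1).sum() / counts.sum())


def entropy(cluster_ids, class_labels):
    """Size-weighted average of per-cluster label entropy (base 2)."""
    counts = contingency(cluster_ids, class_labels)
    sizes = counts.sum(axis=1, keepdims=True)
    p = counts / sizes
    # Entries with p == 0 keep the value 0, so 0 * log(0) contributes 0.
    logp = np.log2(p, out=np.zeros_like(p), where=p > 0)
    per_cluster = -(p * logp).sum(axis=1)
    weights = sizes.ravel() / counts.sum()
    return float(np.sum(weights * per_cluster))
```

Under these definitions, higher purity and lower entropy both indicate more homogeneous clusters; a clustering that matches the reference cell types exactly gives purity 1 and entropy 0.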