Stability of scRNA-Seq Analysis Workflows is Susceptible to Preprocessing and is Mitigated by Regularized or Supervised Approaches.

IF 1.7, Zone 4 (Biology), JCR Q4 (Evolutionary Biology)
Arda Durmaz, Jacob G Scott
Evolutionary Bioinformatics, volume 18, article 11769343221123050. Published 2022-01-01. Journal Article.
DOI: 10.1177/11769343221123050 (https://doi.org/10.1177/11769343221123050)
PDF: https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/07/96/10.1177_11769343221123050.PMC9527995.pdf
Citations: 1

Abstract

Background: Statistical methods developed to address various questions in single-cell datasets show increased variability across different parameter regimes. To further delineate the robustness of commonly used methods for single-cell RNA-Seq, we aimed to comprehensively review scRNA-Seq analysis workflows in the setting of dimension reduction, clustering, and trajectory inference.

Methods: We used datasets with temporal single-cell transcriptomics profiles from public repositories. Combining multiple methods at each level of the workflow, we performed over 6k analyses and evaluated the results of clustering and pseudotime estimation using the adjusted Rand index and rank correlation metrics. We further integrated neural network methods to assess whether models with increased complexity can show an increased bias/variance trade-off.
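
The two evaluation metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' code: the function names, the workflow names, and the toy label and pseudotime vectors below are all hypothetical, chosen only to show how agreement between the outputs of two workflows might be scored with the adjusted Rand index (for cluster labels) and a Spearman rank correlation (for pseudotime orderings). It uses only the Python standard library.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two cells are required")
    # Pairs of cells co-clustered in both partitions (contingency table cells).
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs of cells co-clustered within each partition separately.
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)  # agreement expected by chance
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_correlation(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical cluster labels for six cells from three made-up workflows.
toy_clusterings = {
    "workflow_a": [0, 0, 1, 1, 2, 2],
    "workflow_b": [1, 1, 0, 0, 2, 2],  # same partition as workflow_a, relabeled
    "workflow_c": [0, 0, 0, 1, 1, 1],
}
pairwise_ari = {
    (u, v): adjusted_rand_index(toy_clusterings[u], toy_clusterings[v])
    for u, v in combinations(toy_clusterings, 2)
}

# Hypothetical pseudotime estimates for four cells from two made-up workflows.
pseudotime_a = [0.1, 0.4, 0.5, 0.9]
pseudotime_b = [0.9, 0.5, 0.4, 0.1]  # reversed ordering
pseudotime_agreement = spearman_correlation(pseudotime_a, pseudotime_b)
```

Because the adjusted Rand index is invariant to relabeling, two workflows that recover the same partition under different cluster names score as fully concordant. The rank correlation likewise depends only on the ordering of cells along pseudotime, not on the scale of the estimates.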

Results: Combinatorial workflows showed that non-linear dimension reduction techniques such as t-SNE and UMAP are sensitive to initial preprocessing steps; hence, clustering results obtained in the dimension-reduced space of single-cell datasets should be interpreted carefully. Similarly, pseudotime estimation methods that depend on prior non-linear dimension reduction steps can produce highly variable trajectories. In contrast, methods that avoid non-linearity, such as WOT, can yield repeatable inferences of temporal gene expression dynamics. Furthermore, imputation methods do not substantially improve the repeatability of clustering or trajectory inference results. The choice of normalization method, however, has a larger effect on downstream analysis, with ScTransform reducing variability overall.

Source journal: Evolutionary Bioinformatics (Biology: Evolutionary Biology)
CiteScore: 4.20
Self-citation rate: 0.00%
Articles published: 25
Review time: 12 months
Journal description: Evolutionary Bioinformatics is an open access, peer-reviewed international journal focusing on evolutionary bioinformatics. The journal aims to support understanding of organismal form and function through the use of molecular, genetic, genomic, and proteomic data by giving due consideration to its evolutionary context.