A toolkit for enhanced reproducibility of RNASeq analysis for synthetic biologists.

Synthetic Biology (Oxford, England) · IF 2.6 · Q2 (Biochemical Research Methods)
Pub date: 2022-08-23 · eCollection date: 2022-01-01 · DOI: 10.1093/synbio/ysac012
Benjamin J Garcia, Joshua Urrutia, George Zheng, Diveena Becker, Carolyn Corbet, Paul Maschhoff, Alexander Cristofaro, Niall Gaffney, Matthew Vaughn, Uma Saxena, Yi-Pei Chen, D Benjamin Gordon, Mohammed Eslami
{"title":"A toolkit for enhanced reproducibility of RNASeq analysis for synthetic biologists.","authors":"Benjamin J Garcia,&nbsp;Joshua Urrutia,&nbsp;George Zheng,&nbsp;Diveena Becker,&nbsp;Carolyn Corbet,&nbsp;Paul Maschhoff,&nbsp;Alexander Cristofaro,&nbsp;Niall Gaffney,&nbsp;Matthew Vaughn,&nbsp;Uma Saxena,&nbsp;Yi-Pei Chen,&nbsp;D Benjamin Gordon,&nbsp;Mohammed Eslami","doi":"10.1093/synbio/ysac012","DOIUrl":null,"url":null,"abstract":"<p><p>Sequencing technologies, in particular RNASeq, have become critical tools in the design, build, test and learn cycle of synthetic biology. They provide a better understanding of synthetic designs, and they help identify ways to improve and select designs. While these data are beneficial to design, their collection and analysis is a complex, multistep process that has implications on both discovery and reproducibility of experiments. Additionally, tool parameters, experimental metadata, normalization of data and standardization of file formats present challenges that are computationally intensive. This calls for high-throughput pipelines expressly designed to handle the combinatorial and longitudinal nature of synthetic biology. In this paper, we present a pipeline to maximize the analytical reproducibility of RNASeq for synthetic biologists. We also explore the impact of reproducibility on the validation of machine learning models. We present the design of a pipeline that combines traditional RNASeq data processing tools with structured metadata tracking to allow for the exploration of the combinatorial design in a high-throughput and reproducible manner. We then demonstrate utility via two different experiments: a control comparison experiment and a machine learning model experiment. The first experiment compares datasets collected from identical biological controls across multiple days for two different organisms. It shows that a reproducible experimental protocol for one organism does not guarantee reproducibility in another. The second experiment quantifies the differences in experimental runs from multiple perspectives. It shows that the lack of reproducibility from these different perspectives can place an upper bound on the validation of machine learning models trained on RNASeq data. Graphical Abstract.</p>","PeriodicalId":74902,"journal":{"name":"Synthetic biology (Oxford, England)","volume":" ","pages":"ysac012"},"PeriodicalIF":2.6000,"publicationDate":"2022-08-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/4f/a8/ysac012.PMC9408027.pdf","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Synthetic biology (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/synbio/ysac012","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2022/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"BIOCHEMICAL RESEARCH METHODS","Score":null,"Total":0}
Citations: 1

Abstract

Sequencing technologies, in particular RNASeq, have become critical tools in the design-build-test-learn cycle of synthetic biology. They provide a better understanding of synthetic designs, and they help identify ways to improve and select designs. While these data are beneficial to design, their collection and analysis form a complex, multistep process with implications for both the discovery and the reproducibility of experiments. Additionally, tool parameters, experimental metadata, normalization of data and standardization of file formats present computationally intensive challenges. This calls for high-throughput pipelines expressly designed to handle the combinatorial and longitudinal nature of synthetic biology. In this paper, we present a pipeline that maximizes the analytical reproducibility of RNASeq for synthetic biologists, and we explore the impact of reproducibility on the validation of machine learning models. The pipeline combines traditional RNASeq data processing tools with structured metadata tracking, allowing combinatorial designs to be explored in a high-throughput and reproducible manner. We then demonstrate its utility via two experiments: a control comparison experiment and a machine learning model experiment. The first compares datasets collected from identical biological controls across multiple days for two different organisms, and shows that an experimental protocol that is reproducible for one organism does not guarantee reproducibility in another. The second quantifies the differences between experimental runs from multiple perspectives, and shows that the lack of reproducibility seen from these perspectives can place an upper bound on the validation of machine learning models trained on RNASeq data.
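
To make "structured metadata tracking" concrete, here is a minimal sketch of what recording one pipeline step could look like. This is an illustration under assumptions, not the authors' actual pipeline: the helper names (sha256_of, run_step) and the record fields are hypothetical. The idea is that each processing step persists its tool, parameters, input hashes, platform and timestamps in a machine-readable record, so runs collected on different days can be compared like-for-like.

# Hypothetical sketch of per-step metadata tracking for reproducibility.
# Not the authors' pipeline; names and record fields are illustrative.
import hashlib
import json
import platform
import subprocess
from datetime import datetime, timezone
from pathlib import Path

def sha256_of(path: Path) -> str:
    # Hash each input so later runs can verify they processed identical data.
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def run_step(tool: str, args: list[str], inputs: list[Path], record: Path) -> int:
    # Run one external tool and persist a structured record of the invocation.
    meta = {
        "tool": tool,
        "args": args,
        "inputs": {str(p): sha256_of(p) for p in inputs},
        "platform": platform.platform(),
        "started": datetime.now(timezone.utc).isoformat(),
    }
    result = subprocess.run([tool, *args], capture_output=True, text=True)
    meta["returncode"] = result.returncode
    meta["finished"] = datetime.now(timezone.utc).isoformat()
    record.write_text(json.dumps(meta, indent=2))
    return result.returncode

# Example (hypothetical): quantify single-end reads with salmon, keeping the record.
# run_step("salmon",
#          ["quant", "-i", "idx", "-l", "A", "-r", "reads.fastq", "-o", "quant_out"],
#          [Path("reads.fastq")], Path("quant_out.run.json"))

A record like this is the kind of artifact that would let identical biological controls processed on different days be compared on equal footing, which is the setting of the paper's first experiment.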
