ArrOW: Experiencing a Parallel Cloud-Based De Novo Assembler Workflow

Kary A. C. S. Ocaña, Thaylon Guedes, Daniel de Oliveira
Published in: 2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Publication date: 2019-05-01
DOI: 10.1109/IPDPSW.2019.00039 (https://doi.org/10.1109/IPDPSW.2019.00039)

Abstract

Advances in next-generation sequencing technologies have resulted in an unprecedented volume of sequence data. DNA segments are combined into a reconstruction of the original genome using computer software called genome assemblers. Assembly therefore presents new challenges in data management, querying, and analysis, due to the huge number of read sequences and the compute-intensive, CPU- and memory-bound algorithms involved. This restriction reduces the chance of uniformly covering the space of options to explore, such as statistics, k-mer values, assembly software, or eukaryotic genome assemblies. To address these issues, we present ArrOW, a cloud-based de novo Assembly clOud Workflow that explores the potential of provenance analytics and parallel computation provided by scientific workflow management systems such as SciCumulus. We evaluate the overall performance of ArrOW using up to 256 cores in the Amazon AWS cloud. ArrOW achieves improvements of up to 88.3% when executing 1,000 reads of genomics datasets. We also highlight how data provenance analytics improved the efficiency of recovering assembly features of genomes.
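To make the idea of exploring k-mer values in parallel concrete, the following is a minimal sketch of a parameter sweep in which each k value is handled as an independent task. It is not ArrOW's implementation and does not use the SciCumulus API: the function names `count_kmers` and `sweep_k`, the toy reads, and the use of a local thread pool in place of cloud virtual machines are all assumptions made for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        # A read shorter than k contributes no k-mers (empty range).
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def sweep_k(reads, k_values, max_workers=4):
    """Run one independent k-mer count per k value.

    Hypothetical helper: a thread pool stands in for the parallel
    workers that a workflow system would schedule on cloud cores.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = pool.map(lambda k: count_kmers(reads, k), k_values)
        return dict(zip(k_values, results))


# Toy reads, invented for illustration only.
reads = ["ACGTAC", "GTACGT"]
spectra = sweep_k(reads, [3, 4])
```

Because the tasks share no state, the same decomposition could be dispatched to separate processes or machines; threads are used here only to keep the sketch self-contained, and they would not by themselves speed up CPU-bound counting in CPython.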