Accelerating Genome Sequence Alignment on Hadoop on Lustre Environment

2017 IEEE 13th International Conference on e-Science (e-Science) | Pub Date: 2017-10-01 | DOI: 10.1109/eScience.2017.59

Eun-Kyu Byun, Junehawk Lee, S. Yu, J. Kwak, Soonwook Hwang


Citations: 1

Abstract

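Genome sequence alignment is one of the basic steps of the genome sequencing analysis pipeline and also one of the most time-consuming. A number of tools, including BigBWA, have been proposed to accelerate genome sequence alignment by parallelizing the computation with Hadoop technologies. However, HDFS incurs considerable I/O overhead. In this research, we propose a new sequence alignment tool that adopts Hadoop on Lustre. Building on BigBWA, we removed the data transfer overhead caused by HDFS and parallelized all I/O steps. Experimental results show that our solution is five times faster than the original BigBWA on a ten-node Lustre-based Hadoop cluster.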
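The abstract does not describe how the I/O steps were parallelized. As a rough illustration only, and not the authors' implementation, the sketch below shows one way workers on a shared POSIX file system such as Lustre could each read their own byte range of a FASTQ input directly, with no copy into HDFS. The function names (`next_record_start`, `chunk_ranges`, `read_chunk`) are hypothetical, and the code assumes uncompressed FASTQ with exactly four lines per record.

```python
import os


def next_record_start(f, offset, size):
    """Return the byte offset of the first FASTQ record at or after `offset`.

    Assumes four-line records. A quality line may also begin with '@', so a
    candidate header is accepted only if the line two below it starts with '+'.
    """
    if offset <= 0:
        return 0
    if offset >= size:
        return size
    f.seek(offset - 1)
    f.readline()  # finish the current line, so we now sit at a line start
    while True:
        pos = f.tell()
        line = f.readline()
        if not line:
            return size
        if line.startswith(b"@"):
            f.readline()  # skip the presumed sequence line
            if f.readline().startswith(b"+"):
                return pos
            f.seek(pos + len(line))  # false hit: resume at the next line


def chunk_ranges(path, n_chunks):
    """Split a FASTQ file into up to n_chunks record-aligned byte ranges."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        cuts = [next_record_start(f, size * i // n_chunks, size)
                for i in range(n_chunks)]
    cuts.append(size)
    # Drop empty ranges, which can occur when the file has few records.
    return [(cuts[i], cuts[i + 1])
            for i in range(n_chunks) if cuts[i] < cuts[i + 1]]


def read_chunk(path, start, end):
    """Read one byte range; each worker would call this on the shared path."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)
```

In this sketch every worker opens the same path and reads only its own range, so the input never has to be staged into a separate distributed file system. How each chunk is then passed to the aligner is left out, since the abstract gives no detail on that step.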



