Accelerating Genome Sequence Alignment on Hadoop on Lustre Environment

2017 IEEE 13th International Conference on e-Science (e-Science), Pub Date: 2017-10-01, DOI: 10.1109/eScience.2017.59

Eun-Kyu Byun, Junehawk Lee, S. Yu, J. Kwak, Soonwook Hwang

Citations: 1

Abstract

Genome sequence alignment is one of the basic procedures of the genome sequencing analysis pipeline and also one of its most time-consuming parts. A number of tools, including BigBWA, have been proposed to accelerate genome sequence alignment by parallelizing the computation with Hadoop technologies. However, HDFS incurs considerable I/O overhead. In this research, we propose a new sequence alignment tool that adopts Hadoop on Lustre. Building on BigBWA, we removed the data transfer overhead caused by HDFS and parallelized all of the I/O steps. Experimental results show that our solution is five times faster than the original BigBWA on a ten-node Lustre-based Hadoop cluster.
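The abstract does not describe how the I/O steps are parallelized, so the following is only a minimal sketch of the general idea, not the authors' or BigBWA's code. It assumes the input FASTQ file sits on a POSIX-mounted shared filesystem such as Lustre, so every worker can open the same path directly instead of first copying the data into HDFS. The file is split into byte ranges, each range start is moved forward to the next FASTQ record header, and the ranges are read concurrently. The function names and the record-boundary heuristic (a header line starts with `@` and the line two below it starts with `+`) are hypothetical illustrations. The alignment step itself (BWA) is omitted.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Assumption: `path` is on a shared POSIX mount (e.g. Lustre) visible to all
# workers, so no staging copy into HDFS is needed. All names are hypothetical.


def _is_record_start(lines, i):
    """Heuristic: line i opens a FASTQ record if it starts with '@' and the
    separator line two positions later starts with '+'."""
    return (i + 2 < len(lines)
            and lines[i].startswith(b"@")
            and lines[i + 2].startswith(b"+"))


def _align_to_record(f, offset, file_size):
    """Move a raw byte offset forward to the start of the next FASTQ record."""
    if offset == 0:
        return 0
    f.seek(offset)
    f.readline()  # discard the (possibly partial) line the offset landed in
    lines, starts = [], []
    for _ in range(8):  # 8 lines of look-ahead always reach a full header
        starts.append(f.tell())
        line = f.readline()
        if not line:
            break
        lines.append(line)
    for i in range(len(lines)):
        if _is_record_start(lines, i):
            return starts[i]
    return file_size  # no further header: the remaining bytes belong to the previous chunk


def chunk_boundaries(path, n_chunks):
    """Split the file into n_chunks byte ranges aligned to record starts."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        raw = [size * k // n_chunks for k in range(n_chunks)]
        bounds = [_align_to_record(f, off, size) for off in raw]
    bounds.append(size)
    return bounds


def read_records(path, start, end):
    """Read every 4-line FASTQ record whose header begins in [start, end)."""
    records = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            rec = [f.readline() for _ in range(4)]
            if not rec[0]:
                break
            records.append(tuple(line.rstrip(b"\n").decode() for line in rec))
    return records


def parallel_read(path, n_chunks):
    """Read all chunks concurrently and concatenate them in file order."""
    bounds = chunk_boundaries(path, n_chunks)
    ranges = list(zip(bounds[:-1], bounds[1:]))
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        parts = pool.map(lambda r: read_records(path, *r), ranges)
    return [rec for part in parts for rec in part]
```

On a real cluster each byte range would be handed to a separate node or map task rather than a local thread. Threads are used here only to keep the sketch self-contained. Because every worker reads its own slice of the shared file, no single process has to read the whole input before the parallel stage begins.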
