SWAP-Assembler 2: Optimization of De Novo Genome Assembler at Extreme Scale

2016 45th International Conference on Parallel Processing (ICPP). Pub Date: 2016-08-01. DOI: 10.1109/ICPP.2016.29

Jintao Meng, Sangmin Seo, P. Balaji, Yanjie Wei, Bingqiang Wang, Shengzhong Feng


Citations: 17

Abstract

In this paper, we analyze and optimize the most time-consuming steps of SWAP-Assembler, a parallel genome assembler, so that it can scale to a large number of cores for huge genomes with sequencing data ranging from terabytes to petabytes. Performance analysis shows that the most time-consuming steps are input parallelization, k-mer graph construction, and graph simplification (edge merging). For input parallelization, the input data is divided into virtual fragments of nearly equal size, and the start and end positions of each fragment are automatically aligned to the beginning of a read. In k-mer graph construction, to improve communication efficiency, the message size between any two processes is kept constant by increasing the number of nucleotides handled in the input parallelization step of each round in proportion to the number of processes. Memory usage also decreases because only a small part of the input data is processed in each round. For graph simplification, the optimized communication protocol reduces the number of communication loops from four to two and decreases idle communication time. The optimized assembler is denoted SWAP-Assembler 2 (SWAP2). In our experiments on the supercomputer Mira with a 4-terabyte dataset from the 1000 Genomes project (the largest dataset ever used for assembly), SWAP2 scales to 131,072 cores with an efficiency of 40%. We also compared our work with both the HipMer assembler and the original SWAP-Assembler. On the 300-gigabyte Yanhuang dataset, SWAP2 shows a 3X speedup and 4X better scalability compared with HipMer and is 45 times faster than SWAP-Assembler. The SWAP2 software is available at https://sourceforge.net/projects/swapassembler.
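The input parallelization step described in the abstract can be illustrated with a minimal Python sketch. It is not the SWAP2 implementation: it assumes a FASTA-like file in which every read begins with a header line starting with `>`, and the names `fragment_bounds` and `_next_read_start` are hypothetical. The file is cut into near-equal byte ranges, and each cut is then moved forward to the start of the next read so that no read is split between two processes.

```python
import os


def _next_read_start(f, pos, size):
    """Return the smallest offset >= pos where a '>' header line begins.

    Assumption for this sketch: reads are stored FASTA-style, one header
    line starting with '>' per read.
    """
    if pos == 0:
        f.seek(0)
    else:
        # Step back one byte so that a cut landing exactly on a line
        # start is kept rather than skipped.
        f.seek(pos - 1)
        f.readline()  # consume the rest of the line containing byte pos-1
    while True:
        here = f.tell()
        line = f.readline()
        if not line:          # reached end of file: the fragment is empty
            return size
        if line.startswith(b">"):
            return here


def fragment_bounds(path, num_procs):
    """Split a read file into num_procs virtual fragments.

    Returns a list of (start, end) byte offsets, one pair per process.
    Fragments are nearly equal in size, contiguous, and each non-empty
    fragment starts at the beginning of a read.
    """
    size = os.path.getsize(path)
    # Nominal equal-size cut points before alignment to read boundaries.
    cuts = [size * i // num_procs for i in range(num_procs)]
    with open(path, "rb") as f:
        starts = [_next_read_start(f, cut, size) for cut in cuts]
    # Each fragment ends where the next one starts; the last ends at EOF.
    ends = starts[1:] + [size]
    return list(zip(starts, ends))
```

In a parallel run, process `i` would read only the byte range `fragment_bounds(path, p)[i]`. The fragments are called virtual because nothing is copied; only offsets are computed. A fragment can come out empty when a single read is longer than the nominal fragment size.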


