HPMA: High-performance metagenomic alignment tool, on a large-scale GPU cluster

2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). Pub Date: 2015-11-09. DOI: 10.1109/BIBM.2015.7359757

I. Savran, J. Rose


Citations: 0

Abstract




In this paper, we present HPMA, a graphics processing unit (GPU) accelerated metagenome sequence alignment algorithm for collections of DNA sequences. The algorithm supports all-to-all pairwise local alignment on NVIDIA GPUs. HPMA builds on a GPU alignment algorithm that we developed earlier and adds a filter module. We designed and developed this new kernel function based on the suffix array data structure. The filter module improves performance by identifying the subset of sequences that meet a user-defined similarity threshold and should therefore be considered for alignment. HPMA can also balance the workload between CPU and GPU. HPMA allows us to preprocess very large metagenomes in a reasonable amount of time, in response to the increasing speed of next-generation sequencing (NGS) sequencers. We evaluated HPMA on a cluster of Kepler-based NVIDIA Tesla K20 GPUs in the Stampede supercomputer at the Texas Advanced Computing Center (Austin, TX, USA), using a variety of short DNA sequence datasets organized into four test sets. The first two test sets comprise 10 simulated datasets in which read length varies from 72 to 750 base pairs (bp). The third test set is designed to allow a comparison with published results for GSWABE, a competing GPU alignment tool. The fourth test set is an actual metagenome of over 2 million sequences with an average length of 270 bp. Running on a cluster of 10 NVIDIA K20 GPUs, HPMA aligns 2 million simulated metagenome sequences of length 300 bp in 160 seconds. On the real metagenomic data, HPMA aligns 2,038,516 sequences with an average length of 270 bp in 60 seconds.
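The abstract does not describe the filter or the alignment kernel in detail, so the following is only a minimal CPU-side sketch of the general filter-then-align idea, not HPMA's implementation. It assumes, purely for illustration, that two reads count as "similar enough" when they share at least one exact k-mer, found by scanning a naive suffix array, and that surviving pairs are scored with a plain Smith-Waterman local alignment. All function names, the seed length `k`, and the scoring parameters are hypothetical.

```python
from itertools import combinations


def build_suffix_array(text):
    """Naive suffix array: start positions sorted by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def candidate_pairs(seqs, k):
    """Return index pairs (i, j), i < j, of reads sharing an exact k-mer.

    Assumption: a shared k-mer stands in for the user-defined similarity
    threshold mentioned in the abstract.
    """
    # Concatenate reads with a separator that is not a DNA letter.
    text, owner = "", []
    for idx, s in enumerate(seqs):
        text += s + "#"
        owner.extend([idx] * (len(s) + 1))

    pairs, run_kmer, run_ids = set(), None, set()
    # Suffixes that start with the same k-mer are contiguous in the array.
    for pos in build_suffix_array(text):
        kmer = text[pos:pos + k]
        if len(kmer) < k or "#" in kmer:
            continue  # the k-mer would run past the end of its read
        if kmer != run_kmer:
            pairs.update(combinations(sorted(run_ids), 2))
            run_kmer, run_ids = kmer, set()
        run_ids.add(owner[pos])
    pairs.update(combinations(sorted(run_ids), 2))
    return pairs


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score with a linear gap penalty."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def filtered_all_to_all(seqs, k=8):
    """Score only the pairs that pass the k-mer filter."""
    return {
        (i, j): smith_waterman_score(seqs[i], seqs[j])
        for i, j in sorted(candidate_pairs(seqs, k))
    }
```

For three hypothetical reads `["ACGTACGTGG", "TTACGTACGTCC", "GGGGGGGGGG"]` with `k=8`, only the first two share an 8-mer, so the filter passes a single pair and two of the three possible alignments are skipped. In an all-to-all setting the number of pairs grows quadratically with the number of reads, which is why discarding dissimilar pairs before alignment can matter so much.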
