加速gpu上的BWA-MEM读映射。

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing Pub Date : 2023-06-01 DOI:10.1145/3577193.3593703

Minh Pham, Yicheng Tu, Xiaoyi Lv

{"title":"加速gpu上的BWA-MEM读映射。","authors":"Minh Pham, Yicheng Tu, Xiaoyi Lv","doi":"10.1145/3577193.3593703","DOIUrl":null,"url":null,"abstract":"Advancements in Next-Generation Sequencing (NGS) have significantly reduced the cost of generating DNA sequence data and increased the speed of data production. However, such high-throughput data production has increased the need for efficient data analysis programs. One of the most computationally demanding steps in analyzing sequencing data is mapping short reads produced by NGS to a reference DNA sequence, such as a human genome. The mapping program BWA-MEM and its newer version BWA-MEM2, optimized for CPUs, are some of the most popular choices for this task. In this study, we discuss the implementation of BWA-MEM on GPUs. This is a challenging task because many algorithms and data structures in BWA-MEM do not execute efficiently on the GPU architecture. This paper identifies major challenges in developing efficient GPU code on all major stages of the BWA-MEM program, including seeding, seed chaining, Smith-Waterman alignment, memory management, and I/O handling. We conduct comparison experiments against BWA-MEM and BWA-MEM2 running on a 64-thread CPU. The results show that our implementation achieved up to 3.2x speedup over BWA-MEM2 and up to 5.8x over BWA-MEM when using an NVIDIA A40. Using an NVIDIA A6000 and an NVIDIA A100, we achieved a wall-time speedup of up to 3.4x/3.8x over BWA-MEM2 and up to 6.1x/6.8x over BWA-MEM, respectively. In stage-wise comparison, the A40/A6000/A100 GPUs respectively achieved up to 3.7/3.8/4x, 2/2.3/2.5x, and 3.1/5/7.9x speedup on the three major stages of BWA-MEM: seeding and seed chaining, Smith-Waterman, and making SAM output. To the best of our knowledge, this is the first study that attempts to implement the entire BWA-MEM program on GPUs.","PeriodicalId":73273,"journal":{"name":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","volume":"2023 ","pages":"155-166"},"PeriodicalIF":0.0000,"publicationDate":"2023-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10425913/pdf/nihms-1917800.pdf","citationCount":"0","resultStr":"{\"title\":\"Accelerating BWA-MEM Read Mapping on GPUs.\",\"authors\":\"Minh Pham, Yicheng Tu, Xiaoyi Lv\",\"doi\":\"10.1145/3577193.3593703\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Advancements in Next-Generation Sequencing (NGS) have significantly reduced the cost of generating DNA sequence data and increased the speed of data production. However, such high-throughput data production has increased the need for efficient data analysis programs. One of the most computationally demanding steps in analyzing sequencing data is mapping short reads produced by NGS to a reference DNA sequence, such as a human genome. The mapping program BWA-MEM and its newer version BWA-MEM2, optimized for CPUs, are some of the most popular choices for this task. In this study, we discuss the implementation of BWA-MEM on GPUs. This is a challenging task because many algorithms and data structures in BWA-MEM do not execute efficiently on the GPU architecture. This paper identifies major challenges in developing efficient GPU code on all major stages of the BWA-MEM program, including seeding, seed chaining, Smith-Waterman alignment, memory management, and I/O handling. We conduct comparison experiments against BWA-MEM and BWA-MEM2 running on a 64-thread CPU. The results show that our implementation achieved up to 3.2x speedup over BWA-MEM2 and up to 5.8x over BWA-MEM when using an NVIDIA A40. Using an NVIDIA A6000 and an NVIDIA A100, we achieved a wall-time speedup of up to 3.4x/3.8x over BWA-MEM2 and up to 6.1x/6.8x over BWA-MEM, respectively. In stage-wise comparison, the A40/A6000/A100 GPUs respectively achieved up to 3.7/3.8/4x, 2/2.3/2.5x, and 3.1/5/7.9x speedup on the three major stages of BWA-MEM: seeding and seed chaining, Smith-Waterman, and making SAM output. To the best of our knowledge, this is the first study that attempts to implement the entire BWA-MEM program on GPUs.\",\"PeriodicalId\":73273,\"journal\":{\"name\":\"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing\",\"volume\":\"2023 \",\"pages\":\"155-166\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-06-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10425913/pdf/nihms-1917800.pdf\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3577193.3593703\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3577193.3593703","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

新一代测序技术(NGS)的进步大大降低了生成DNA序列数据的成本，提高了数据生成的速度。然而，这种高吞吐量的数据生产增加了对高效数据分析程序的需求。在分析测序数据中，最需要计算量的步骤之一是将NGS产生的短读段映射到参考DNA序列，如人类基因组。映射程序BWA-MEM及其更新版本的BWA-MEM2针对cpu进行了优化，是完成此任务的一些最受欢迎的选择。在本研究中，我们讨论了BWA-MEM在gpu上的实现。这是一项具有挑战性的任务，因为BWA-MEM中的许多算法和数据结构在GPU架构上不能有效地执行。本文确定了在BWA-MEM程序的所有主要阶段开发高效GPU代码的主要挑战，包括播种，种子链，Smith-Waterman对齐，内存管理和I/O处理。我们对运行在64线程CPU上的bwa - memm和BWA-MEM2进行了对比实验。结果表明，当使用NVIDIA A40时，我们的实现比BWA-MEM2实现了高达3.2倍的加速，比BWA-MEM实现了高达5.8倍的加速。使用NVIDIA A6000和NVIDIA A100，我们分别实现了比BWA-MEM2高3.4倍/3.8倍和比BWA-MEM高6.1倍/6.8倍的全时加速。分阶段比较，A40/A6000/A100 gpu在BWA-MEM的播种和种子链、Smith-Waterman和SAM输出三个主要阶段分别实现了3.7/3.8/4倍、2/2.3/2.5倍和3.1/5/7.9倍的加速。据我们所知，这是第一个试图在gpu上实现整个BWA-MEM程序的研究。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Accelerating BWA-MEM Read Mapping on GPUs.

Advancements in Next-Generation Sequencing (NGS) have significantly reduced the cost of generating DNA sequence data and increased the speed of data production. However, such high-throughput data production has increased the need for efficient data analysis programs. One of the most computationally demanding steps in analyzing sequencing data is mapping short reads produced by NGS to a reference DNA sequence, such as a human genome. The mapping program BWA-MEM and its newer version BWA-MEM2, optimized for CPUs, are some of the most popular choices for this task. In this study, we discuss the implementation of BWA-MEM on GPUs. This is a challenging task because many algorithms and data structures in BWA-MEM do not execute efficiently on the GPU architecture. This paper identifies major challenges in developing efficient GPU code on all major stages of the BWA-MEM program, including seeding, seed chaining, Smith-Waterman alignment, memory management, and I/O handling. We conduct comparison experiments against BWA-MEM and BWA-MEM2 running on a 64-thread CPU. The results show that our implementation achieved up to 3.2x speedup over BWA-MEM2 and up to 5.8x over BWA-MEM when using an NVIDIA A40. Using an NVIDIA A6000 and an NVIDIA A100, we achieved a wall-time speedup of up to 3.4x/3.8x over BWA-MEM2 and up to 6.1x/6.8x over BWA-MEM, respectively. In stage-wise comparison, the A40/A6000/A100 GPUs respectively achieved up to 3.7/3.8/4x, 2/2.3/2.5x, and 3.1/5/7.9x speedup on the three major stages of BWA-MEM: seeding and seed chaining, Smith-Waterman, and making SAM output. To the best of our knowledge, this is the first study that attempts to implement the entire BWA-MEM program on GPUs.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ICS ... : proceedings of the ... ACM International Conference on Supercomputing. International Conference on Supercomputing

自引率

0.00%

发文量