BWA-MEM-SCALE: Accelerating Genome Sequence Mapping on Commodity Servers

Changdae Kim, Kwangwon Koh, Taehoon Kim, Daegyu Han, Jiwon Seo
Published in: Proceedings of the 51st International Conference on Parallel Processing, 2022-08-29
DOI: 10.1145/3545008.3545033 (https://doi.org/10.1145/3545008.3545033)

Abstract

As advances in Next-Generation Sequencing make genome sequence data faster and cheaper to generate, accelerating the mapping of genome sequences to a reference genome becomes an increasingly important problem, and much effort has gone into improving the performance of the sequence mapping process. In this paper, we propose BWA-MEM-SCALE, which offers software-based acceleration techniques that fully utilize system resources to speed up genome sequence mapping. BWA-MEM-SCALE has two optimization mechanisms that exploit system memory. The Exact Match Filter (EMF) finds the input reads that match the reference genome over their full length, so that the expensive mapping process is bypassed for those reads. The FM-index Accelerator (FMA) uses pre-assembled data to skip the prefix of sequences during seed matching. Moreover, we fully utilize the CPU cores in the system by carefully pipelining the mapping process and using an in-memory index store. We implement the proposed mechanisms on BWA-MEM2, the state-of-the-art sequence mapping software. The evaluation shows that BWA-MEM-SCALE achieves a substantial speedup over BWA-MEM2 when the system has sufficient resources. For example, with an additional 104 GB of memory, BWA-MEM-SCALE gives up to a 3.32x speedup over BWA-MEM2. Because the acceleration techniques can be deployed partially, BWA-MEM-SCALE speeds up mapping in proportion to the available system resources.

Source code: https://github.com/etri/bwa-mem-scale
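The idea behind the Exact Match Filter can be illustrated with a small Python sketch. This is not the BWA-MEM-SCALE implementation: the function names, the plain dictionary used as the lookup table, and the fixed read length are all assumptions made for illustration, and the sketch ignores reverse-complement matches, repeated positions, and the memory-efficient layout a genome-scale table would need. It only shows how reads that match the reference over their full length can be separated from reads that still need the regular mapping path.

```python
from typing import Dict, List, Tuple


def build_exact_match_table(reference: str, read_len: int) -> Dict[str, int]:
    """Map each read_len-long substring of the reference to its first position.

    Hypothetical helper: a toy stand-in for a precomputed in-memory filter.
    """
    table: Dict[str, int] = {}
    for pos in range(len(reference) - read_len + 1):
        # setdefault keeps the first position when a substring repeats.
        table.setdefault(reference[pos:pos + read_len], pos)
    return table


def split_reads(
    reads: List[str], table: Dict[str, int], read_len: int
) -> Tuple[List[Tuple[str, int]], List[str]]:
    """Separate full-length exact matches from reads that still need mapping."""
    exact: List[Tuple[str, int]] = []
    remaining: List[str] = []
    for read in reads:
        pos = table.get(read) if len(read) == read_len else None
        if pos is not None:
            exact.append((read, pos))   # mapping can be bypassed for this read
        else:
            remaining.append(read)      # falls through to the normal mapper
    return exact, remaining


if __name__ == "__main__":
    reference = "ACGTACGGTC"
    table = build_exact_match_table(reference, read_len=4)
    exact, remaining = split_reads(["ACGG", "TTTT", "GTAC"], table, read_len=4)
    print(exact)
    print(remaining)
```

In this toy setting, only the reads left in `remaining` would be handed to the seed-and-extend mapper, which is where the saving from a filter of this kind would come from.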