Performance improvement of BWA MEM algorithm using data-parallel with concurrent parallelization
N. Kathiresan, M. Temanni, Rashid J. Al-Ali
2014 International Conference on Parallel, Distributed and Grid Computing. Published 2014-12-01. DOI: 10.1109/PDGC.2014.7030780 (https://doi.org/10.1109/PDGC.2014.7030780)
The Burrows-Wheeler Transform (BWT) is a widely used data compression technique in next-generation sequencing (NGS) analysis. As NGS technology has advanced, genome data sizes have grown rapidly, and these larger volumes of genome data need to be processed in parallel. NGS data are generally processed with traditional parallel processing approaches such as (i) thread parallelization, (ii) data parallelization, and (iii) concurrent parallelization, each of which has its own performance bottleneck: thread scalability, scattering/gathering of data, and memory bandwidth limitations, respectively. To eliminate these drawbacks, we introduce a hybrid parallelization approach called “data-parallel with concurrent parallelization” for genome alignment. We use the BWA MEM algorithm to align human genome sequences; the alignment is dominated by memory-intensive operations, and its performance is limited by cache/TLB misses. To eliminate cache/TLB misses, the genome data is partitioned into multiple pieces (i.e., the read size is reduced) using data parallelization, and these pieces are processed concurrently within the same cache/memory hierarchy. As a result, the proposed data-parallel with concurrent parallelization performs 45% better than the traditional parallelization approaches. Additionally, we provide a proof of concept for processing higher volumes of genome data with the BWA MEM algorithm on high-end desktop machines.
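
The partition-then-process-concurrently idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the round-robin split, and the thread pool are assumptions made for illustration, and `align_chunk` is a hypothetical stand-in that only counts bases, so the sketch runs without BWA installed.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def read_fastq_records(lines):
    """Group an iterable of FASTQ lines into 4-line records."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        yield record


def partition_records(records, n_parts):
    """Data-parallel step: deal records round-robin into n_parts chunks."""
    chunks = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        chunks[i % n_parts].append(record)
    return chunks


def align_chunk(chunk):
    """Hypothetical stand-in for one aligner instance working on one chunk.

    A real pipeline would hand the chunk to a separate BWA MEM process;
    here we only count the bases in the chunk.
    """
    return sum(len(record[1].strip()) for record in chunk)


def run_concurrently(chunks, worker, max_workers):
    """Concurrent step: process all chunks at the same time on one machine."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        return list(pool.map(worker, chunks))


# Three tiny reads (assumed example data, not from the paper).
fastq = [
    "@r1", "ACGT", "+", "IIII",
    "@r2", "GG", "+", "II",
    "@r3", "TTTAA", "+", "IIIII",
]
chunks = partition_records(read_fastq_records(fastq), n_parts=2)
per_chunk_bases = run_concurrently(chunks, align_chunk, max_workers=2)
```

In this sketch each chunk is smaller than the full input, which mirrors the abstract's point that smaller pieces processed side by side can share the same cache/memory hierarchy. How many pieces to use and how many to run at once are tuning choices the abstract does not specify.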