为下一代测序加速大规模短读段映射(仅摘要)

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays Pub Date : 2014-02-26 DOI:10.1145/2554688.2554707

Chunming Zhang, Wen Tang, Guangming Tan

{"title":"为下一代测序加速大规模短读段映射(仅摘要)","authors":"Chunming Zhang, Wen Tang, Guangming Tan","doi":"10.1145/2554688.2554707","DOIUrl":null,"url":null,"abstract":"Due to the explosion of gene sequencing data with over one billion reads per run, the data-intensive computations of Next Generation Sequencing (NGS) applications pose great challenges to current computing capability. In this paper we investigate both algorithmic and architectural accelerating strategies to a typical NGS analysis algorithm -- short reads mapping -- on a commodity multicore and customizable FPGA coprocessor architecture, respectively. First, we propose a hash buckets reorder algorithm that increases shared cache parallelism during the course of searching hash index. The algorithmic strategy achieves 122Gbp/day throughput by exploiting shared-cache parallelism, that leads to performance improvement of 2 times on an 8-core Intel Xeon processor. Second, we develop a FPGA coprocessor that leverages both bit-level and word-level parallelism with scatter-gather memory mechanism to speedup inherent irregular memory access operations by increasing effective memory bandwidth. Our customized FPGA coprocessor achieves 947Gbp per day throughput, that is 189 times higher than current mapping tools on single CPU core, and above 2 times higher than a 64-core multi-processor system. The coprocessor's power efficiency is 29 times higher than a conventional 64-core multi-processor. The results indicate that the customized FPGA coprocessor architecture, that is configured with scatter-gather memory's word-level access, appeals to data intensive applications.","PeriodicalId":390562,"journal":{"name":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-02-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Accelerating massive short reads mapping for next generation sequencing (abstract only)\",\"authors\":\"Chunming Zhang, Wen Tang, Guangming Tan\",\"doi\":\"10.1145/2554688.2554707\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Due to the explosion of gene sequencing data with over one billion reads per run, the data-intensive computations of Next Generation Sequencing (NGS) applications pose great challenges to current computing capability. In this paper we investigate both algorithmic and architectural accelerating strategies to a typical NGS analysis algorithm -- short reads mapping -- on a commodity multicore and customizable FPGA coprocessor architecture, respectively. First, we propose a hash buckets reorder algorithm that increases shared cache parallelism during the course of searching hash index. The algorithmic strategy achieves 122Gbp/day throughput by exploiting shared-cache parallelism, that leads to performance improvement of 2 times on an 8-core Intel Xeon processor. Second, we develop a FPGA coprocessor that leverages both bit-level and word-level parallelism with scatter-gather memory mechanism to speedup inherent irregular memory access operations by increasing effective memory bandwidth. Our customized FPGA coprocessor achieves 947Gbp per day throughput, that is 189 times higher than current mapping tools on single CPU core, and above 2 times higher than a 64-core multi-processor system. The coprocessor's power efficiency is 29 times higher than a conventional 64-core multi-processor. The results indicate that the customized FPGA coprocessor architecture, that is configured with scatter-gather memory's word-level access, appeals to data intensive applications.\",\"PeriodicalId\":390562,\"journal\":{\"name\":\"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays\",\"volume\":\"24 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-02-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2554688.2554707\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2554688.2554707","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

由于基因测序数据的爆炸式增长，每次运行读取量超过10亿次，下一代测序(NGS)应用程序的数据密集型计算对当前的计算能力提出了巨大挑战。在本文中，我们分别在商品多核和可定制的FPGA协处理器架构上研究了典型NGS分析算法(短读取映射)的算法和架构加速策略。首先，我们提出了一种哈希桶重排序算法，该算法在搜索哈希索引的过程中增加了共享缓存的并行性。该算法策略通过利用共享缓存并行性实现了122Gbp/天的吞吐量，这使得在8核英特尔至强处理器上的性能提高了2倍。其次，我们开发了一种FPGA协处理器，利用位级和字级并行性以及散射-收集存储器机制，通过增加有效存储器带宽来加速固有的不规则存储器访问操作。我们定制的FPGA协处理器实现了每天947Gbp的吞吐量，是目前单核CPU映射工具的189倍，是64核多处理器系统的2倍以上。协处理器的能效是传统64核多处理器的29倍。结果表明，配置了散集存储器字级访问的定制FPGA协处理器架构适合于数据密集型应用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Accelerating massive short reads mapping for next generation sequencing (abstract only)

Due to the explosion of gene sequencing data with over one billion reads per run, the data-intensive computations of Next Generation Sequencing (NGS) applications pose great challenges to current computing capability. In this paper we investigate both algorithmic and architectural accelerating strategies to a typical NGS analysis algorithm -- short reads mapping -- on a commodity multicore and customizable FPGA coprocessor architecture, respectively. First, we propose a hash buckets reorder algorithm that increases shared cache parallelism during the course of searching hash index. The algorithmic strategy achieves 122Gbp/day throughput by exploiting shared-cache parallelism, that leads to performance improvement of 2 times on an 8-core Intel Xeon processor. Second, we develop a FPGA coprocessor that leverages both bit-level and word-level parallelism with scatter-gather memory mechanism to speedup inherent irregular memory access operations by increasing effective memory bandwidth. Our customized FPGA coprocessor achieves 947Gbp per day throughput, that is 189 times higher than current mapping tools on single CPU core, and above 2 times higher than a 64-core multi-processor system. The coprocessor's power efficiency is 29 times higher than a conventional 64-core multi-processor. The results indicate that the customized FPGA coprocessor architecture, that is configured with scatter-gather memory's word-level access, appeals to data intensive applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 2014 ACM/SIGDA international symposium on Field-programmable gate arrays

自引率

0.00%

发文量