基因组信息学的加速并行动态规划

2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT) Pub Date : 2018-03-01 DOI:10.1109/CONECCT.2018.8482378

S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy

{"title":"基因组信息学的加速并行动态规划","authors":"S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy","doi":"10.1109/CONECCT.2018.8482378","DOIUrl":null,"url":null,"abstract":"Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.","PeriodicalId":430389,"journal":{"name":"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"ReneGENE-DP: Accelerated Parallel Dynamic Programming for Genome Informatics\",\"authors\":\"S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy\",\"doi\":\"10.1109/CONECCT.2018.8482378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.\",\"PeriodicalId\":430389,\"journal\":{\"name\":\"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CONECCT.2018.8482378\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONECCT.2018.8482378","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

解析一个非常长的基因组字符串(人类基因组通常有30亿个字符长)抽象了整个生物计算的复杂性。近似字符串匹配(ASM)是捕获基因组生物复杂性的最合适的计算范式，将各种来源的生物信息集成到可处理的概率模型中。尽管计算复杂，动态规划(DP)方法被证明是非常有效的ASM，在严重的遗传数据的进化噪声中区分大量的相似性。虽然DP算法中的大量计算在多个平台上得到了加速，但不太复杂的回溯步骤仍然在主机上执行，存在显著的内存和输入/输出瓶颈。由于分析下一代测序(NGS)平台的基因组大数据需要数十亿次这样的比对，这一瓶颈可能严重影响系统性能。本文介绍了ReneGENE-DP，我们在硬件加速器上实现的DP计算，其新颖之处在于在FPGA和GPU上实现硬件回溯与分析时的前向扫描并行。最快的FPGA实现比最快的GPU实现ReneGENE-DP快43.63倍左右，而后者又比参考设计快380.85倍，参考设计是基于GPU的DP算法，在主机上进行回溯。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

ReneGENE-DP: Accelerated Parallel Dynamic Programming for Genome Informatics

Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)

自引率

0.00%

发文量