基因组信息学的加速并行动态规划

S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy
{"title":"基因组信息学的加速并行动态规划","authors":"S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy","doi":"10.1109/CONECCT.2018.8482378","DOIUrl":null,"url":null,"abstract":"Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.","PeriodicalId":430389,"journal":{"name":"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-03-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"ReneGENE-DP: Accelerated Parallel Dynamic Programming for Genome Informatics\",\"authors\":\"S. Natarajan, N. Krishnakumar, M. Pavan, D. Pal, S. Nandy\",\"doi\":\"10.1109/CONECCT.2018.8482378\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.\",\"PeriodicalId\":430389,\"journal\":{\"name\":\"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)\",\"volume\":\"2 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-03-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CONECCT.2018.8482378\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE International Conference on Electronics, Computing and Communication Technologies (CONECCT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CONECCT.2018.8482378","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

解析一个非常长的基因组字符串(人类基因组通常有30亿个字符长)抽象了整个生物计算的复杂性。近似字符串匹配(ASM)是捕获基因组生物复杂性的最合适的计算范式,将各种来源的生物信息集成到可处理的概率模型中。尽管计算复杂,动态规划(DP)方法被证明是非常有效的ASM,在严重的遗传数据的进化噪声中区分大量的相似性。虽然DP算法中的大量计算在多个平台上得到了加速,但不太复杂的回溯步骤仍然在主机上执行,存在显著的内存和输入/输出瓶颈。由于分析下一代测序(NGS)平台的基因组大数据需要数十亿次这样的比对,这一瓶颈可能严重影响系统性能。本文介绍了ReneGENE-DP,我们在硬件加速器上实现的DP计算,其新颖之处在于在FPGA和GPU上实现硬件回溯与分析时的前向扫描并行。最快的FPGA实现比最快的GPU实现ReneGENE-DP快43.63倍左右,而后者又比参考设计快380.85倍,参考设计是基于GPU的DP算法,在主机上进行回溯。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
ReneGENE-DP: Accelerated Parallel Dynamic Programming for Genome Informatics
Parsing a very long genomic string (human genome is typically 3 billion characters long) abstracts the whole complexity of biocomputing. Approximate String Matching (ASM) is the most eligible computing paradigm that captures the biological complexity of the genome, integrating various sources of biological information into tractable probabilistic models. Though computationally complex, the Dynamic Programming (DP) methodology proves to be very efficient for ASM, in discriminating substantial similarities amongst severe noise in genetic data presented by evolution. Though a significant amount of computations in the DP algorithms are accelerated on multiple platforms, the less complex traceback step is still performed in the host, presenting significant memory and Input/Output bottleneck. With billions of such alignments required to analyse the genomic big data from the Next Generation Sequencing (NGS) Platforms, this bottleneck can severely affect system performance. This paper presents ReneGENE-DP, our implementations of the DP computations on hardware accelerators, with the novelty of realizing traceback in hardware in parallel with the forward scan during analysis, on both FPGA and GPU. The fastest FPGA implementation is around 43.63x better than the fastest GPU implementation of ReneGENE-DP, which in turn, is 380.85x faster than the reference design, which is a GPU based DP algorithm with traceback on host.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信