Gigabyte-scale alignment acceleration of biological sequences via Ethernet streaming

T. Moorthy, S. Gopalakrishnan
Published in: 2014 International Conference on Field-Programmable Technology (FPT), pp. 227-230
Publication date: 2014-12-01
DOI: 10.1109/FPT.2014.7082781 (https://doi.org/10.1109/FPT.2014.7082781)
Citations: 1

Abstract

We describe the design of a PC-to-FPGA data streaming platform that enables hardware acceleration on gigabyte-scale input data. Specifically, the accelerator is an FPGA implementation of the Dialign algorithm, which performs both global and local alignment of query biological sequences against considerably larger reference sequences. Earlier implementations of this algorithm could not scale to gigabyte-length reference sequences or megabyte-length query sequences because of the limited memory and logic available on single-FPGA platforms. We address both limits: an Ethernet channel streams the reference sequence to the FPGA, and SATA-based solid-state drives (SSDs) are used in a novel way to time-multiplex the FPGA logic so that it can handle larger query sequences. In doing so, this paper also presents a general method for achieving gigabyte-depth FIFOs on commercially available FPGA development boards, which benefits data-intensive acceleration outside the bioinformatics domain as well. By developing our acceleration logic and carefully coupling it to the required I/O peripherals, we have demonstrated a processing time of 28.61 minutes for a 200 base-pair query sequence aligned against a 1 GB reference sequence, a rate that is limited only by SATA 2 SSD write speeds. This runtime is a 38× speedup (18.36 hours down to 28.61 minutes) over standalone PC-based processing.
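The headline figures in the abstract can be cross-checked with a few lines of arithmetic. The sketch below recomputes the speedup from the two quoted runtimes and derives the average rate at which the reference sequence is consumed. The function names are illustrative only, and reading "1 GB" as 10^9 bytes is an assumption; the derived rate describes reference data processed per second, not a measured SSD write bandwidth.

```python
# Figures quoted in the abstract.
PC_RUNTIME_HOURS = 18.36       # standalone PC processing time
FPGA_RUNTIME_MINUTES = 28.61   # FPGA platform processing time
REFERENCE_BYTES = 10**9        # assumption: "1 GB" read as 10^9 bytes


def speedup(pc_hours: float, fpga_minutes: float) -> float:
    """Ratio of PC runtime to FPGA runtime, both converted to minutes."""
    return (pc_hours * 60.0) / fpga_minutes


def reference_rate_mb_per_s(n_bytes: int, minutes: float) -> float:
    """Average reference data consumed per second, in MB/s (1 MB = 10^6 bytes)."""
    return (n_bytes / 1e6) / (minutes * 60.0)


if __name__ == "__main__":
    s = speedup(PC_RUNTIME_HOURS, FPGA_RUNTIME_MINUTES)
    r = reference_rate_mb_per_s(REFERENCE_BYTES, FPGA_RUNTIME_MINUTES)
    print(f"speedup: {s:.1f}x")
    print(f"average reference rate: {r:.3f} MB/s")
```

The computed ratio comes out at roughly 38.5, consistent with the 38× quoted in the abstract.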