Gigabyte-scale alignment acceleration of biological sequences via Ethernet streaming

2014 International Conference on Field-Programmable Technology (FPT) Pub Date : 2014-12-01 DOI:10.1109/FPT.2014.7082781

T. Moorthy, S. Gopalakrishnan


Citations: 1

Abstract

We describe the design of a PC-to-FPGA data streaming platform that enables hardware acceleration of gigabyte-scale input data. Specifically, the accelerator is an FPGA implementation of the Dialign algorithm, which performs both global and local alignment of query biological sequences against comparatively larger reference strands of biological sequences. Earlier implementations of this algorithm could not be scaled to handle gigabyte-length reference sequences or megabyte-length query sequences, owing to the inherent limits on available memory and logic on single-FPGA platforms. We solve these issues by designing an Ethernet channel to stream the reference sequence, and we describe a novel use of SATA-based solid-state drives (SSDs) to time-multiplex the FPGA logic so that it can handle larger query sequences as well. In doing so, this paper also presents a general method for achieving gigabyte-depth FIFOs on commercially available FPGA development boards, which benefits data-intensive acceleration even outside the bioinformatics application domain. Through the development of our acceleration logic and careful coupling of the required I/O peripherals, we have demonstrated a processing time of 28.61 minutes for a 200 base-pair query sequence aligned against a 1 GB reference sequence, a rate that is limited only by SATA 2 SSD write speeds. The present runtime offers a 38× speedup (18.36 hours down to 28.61 minutes) compared to standalone PC-based processing.
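The runtime figures quoted in the abstract can be sanity-checked with a few lines of arithmetic. The sketch below is not taken from the paper: it only recomputes the speedup from the two stated runtimes and derives an end-to-end reference throughput. It assumes that "1 GB" means 10^9 bytes, and the function names are hypothetical. The throughput is an average over the whole run, not the Ethernet link rate or the SSD write rate.

```python
# Figures quoted in the abstract.
PC_RUNTIME_HOURS = 18.36
FPGA_RUNTIME_MINUTES = 28.61

# Assumption (not stated in the abstract): "1 GB" is taken as 10**9 bytes.
REFERENCE_BYTES = 10**9


def speedup(pc_hours: float, fpga_minutes: float) -> float:
    """Ratio of PC runtime to FPGA runtime, both converted to minutes."""
    return pc_hours * 60.0 / fpga_minutes


def effective_throughput_mb_per_s(num_bytes: int, minutes: float) -> float:
    """Average reference bytes consumed per second, in MB/s (10**6 bytes)."""
    return num_bytes / (minutes * 60.0) / 1e6


if __name__ == "__main__":
    s = speedup(PC_RUNTIME_HOURS, FPGA_RUNTIME_MINUTES)
    t = effective_throughput_mb_per_s(REFERENCE_BYTES, FPGA_RUNTIME_MINUTES)
    print(f"speedup: {s:.1f}x")
    print(f"end-to-end reference throughput: {t:.3f} MB/s")
```

The computed ratio is about 38.5, consistent with the 38× speedup reported above.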

