Two-Hit Filter Synthesis for Genomic Database Search

2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)
Pub Date: 2016-05-01
DOI: 10.1109/FCCM.2016.24

Jordan A. Bradshaw, Rasha Karakchi, J. Bakos


Citations: 2

Abstract

Advances in genomic sequencing technology are causing genomic database growth to outpace Moore's Law. This keeps genomic database search a difficult problem and a popular target for emerging processing technologies. The de facto software tool for genomic database search is NCBI BLAST, which transforms each query into a filter that is then applied to the database. This requires a database scan for every query, so its performance is fundamentally limited by I/O bandwidth. In this paper we present a functionally equivalent variation of the NCBI BLAST algorithm that maps more suitably to an FPGA implementation. This variation attempts to reduce the I/O requirement by leveraging FPGA-specific capabilities, such as high pattern-matching throughput and explicit on-chip memory structure and allocation. Our algorithm transforms the database, not the query, into a filter that is stored as a hierarchical arrangement of three tables, the first two stored on chip and the third off chip. Our results show that, while performance is data dependent, speedups of up to 8X are possible, based on the relative reduction in I/O of our approach versus that of NCBI BLAST. More importantly, performance relative to NCBI BLAST improves as database size and query workload grow.
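The abstract does not describe the table layout in detail, so the following is only a minimal Python sketch of the general idea under stated assumptions, not the authors' method. It builds a filter from the database once (here a single dict of length-k words, a hypothetical stand-in for the three-table on-chip/off-chip hierarchy) and then streams queries through it, applying a simplified version of BLAST's two-hit seeding rule, which the title presumably refers to: a seed is reported only when two non-overlapping word hits fall on the same diagonal within a given distance. The names `build_index` and `two_hit_seeds` and the parameters `k` and `window` are hypothetical, and words are matched exactly rather than via BLAST's scored neighborhood words.

```python
from collections import defaultdict


def build_index(database, k):
    """Turn the database (not the query) into a lookup filter.

    Hypothetical stand-in for the paper's three-table hierarchy: a single
    dict mapping each length-k word to every (sequence id, offset) where it occurs.
    """
    index = defaultdict(list)
    for seq_id, seq in enumerate(database):
        for off in range(len(seq) - k + 1):
            index[seq[off:off + k]].append((seq_id, off))
    return dict(index)


def two_hit_seeds(query, index, k, window):
    """Stream one query through the database index and report two-hit seeds.

    A seed is emitted when a word hit lands on the same diagonal
    (db_offset - query_offset) of the same sequence as an earlier,
    non-overlapping hit at most `window` query positions back.
    Returns (seq_id, query_offset, db_offset) of each second hit.
    """
    last_hit = {}  # (seq_id, diagonal) -> query offset of the retained hit
    seeds = []
    for q_off in range(len(query) - k + 1):
        for seq_id, db_off in index.get(query[q_off:q_off + k], ()):
            key = (seq_id, db_off - q_off)
            prev = last_hit.get(key)
            if prev is None or q_off - prev > window:
                last_hit[key] = q_off  # first hit, or previous hit too far back
            elif q_off - prev >= k:
                seeds.append((seq_id, q_off, db_off))  # second, non-overlapping hit
                last_hit[key] = q_off
            # otherwise the hit overlaps the retained one and is ignored


    return seeds


# Build the filter once, then reuse it for every query.
index = build_index(["ACGTACGT"], k=3)
print(two_hit_seeds("ACGTACGT", index, k=3, window=10))  # [(0, 3, 3)]
```

In this sketch the database is read once to build the index and every subsequent query only touches the index, which mirrors the abstract's argument against a per-query database scan; how the real design partitions that index across on-chip and off-chip tables is not modeled here.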



