Two-Hit Filter Synthesis for Genomic Database Search

Jordan A. Bradshaw, Rasha Karakchi, J. Bakos
{"title":"Two-Hit Filter Synthesis for Genomic Database Search","authors":"Jordan A. Bradshaw, Rasha Karakchi, J. Bakos","doi":"10.1109/FCCM.2016.24","DOIUrl":null,"url":null,"abstract":"Advancements in genomic sequencing technology is causing genomic database growth to outpace Moore's Law. This continues to make genomic database search a difficult problem and a popular target for emerging processing technologies. The de facto software tool for genomic database search is NCBI BLAST, which operates by transforming each database query into a filter that is subsequently applied to the database. This requires a database scan for every query, fundamentally limiting its performance by I/O bandwidth. In this paper we present a functionally-equivalent variation on the NCBI BLAST algorithm that maps more suitably to an FPGA implementation. This variation of the algorithm attempts to reduce the I/O requirement by leveraging FPGA-specific capabilities, such as high pattern matching throughput and explicit on chip memory structure and allocation. Our algorithm transforms the database -- not the query -- into a filter that is stored as a hierarchical arrangement of three tables, the first two of which are stored on chip and the third off chip. Our results show that -- while performance is data dependent -- it is possible to achieve speedups of up to 8X based on the relative reduction in I/O of our approach versus that of NCBI BLAST. More importantly, the performance relative to NCBI BLAST improves with larger databases and query workload sizes.","PeriodicalId":113498,"journal":{"name":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","volume":"2020 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 24th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FCCM.2016.24","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Advancements in genomic sequencing technology is causing genomic database growth to outpace Moore's Law. This continues to make genomic database search a difficult problem and a popular target for emerging processing technologies. The de facto software tool for genomic database search is NCBI BLAST, which operates by transforming each database query into a filter that is subsequently applied to the database. This requires a database scan for every query, fundamentally limiting its performance by I/O bandwidth. In this paper we present a functionally-equivalent variation on the NCBI BLAST algorithm that maps more suitably to an FPGA implementation. This variation of the algorithm attempts to reduce the I/O requirement by leveraging FPGA-specific capabilities, such as high pattern matching throughput and explicit on chip memory structure and allocation. Our algorithm transforms the database -- not the query -- into a filter that is stored as a hierarchical arrangement of three tables, the first two of which are stored on chip and the third off chip. Our results show that -- while performance is data dependent -- it is possible to achieve speedups of up to 8X based on the relative reduction in I/O of our approach versus that of NCBI BLAST. More importantly, the performance relative to NCBI BLAST improves with larger databases and query workload sizes.
基因组数据库搜索的双命中滤波器合成
基因组测序技术的进步使得基因组数据库的增长速度超过了摩尔定律。这继续使基因组数据库搜索成为一个难题,也是新兴处理技术的热门目标。事实上,基因组数据库搜索的软件工具是NCBI BLAST,它通过将每个数据库查询转换为随后应用于数据库的过滤器来操作。这需要对每个查询进行数据库扫描,从根本上限制了I/O带宽的性能。在本文中,我们提出了NCBI BLAST算法的功能等效变体,该变体更适合于FPGA实现。这种算法的变体试图通过利用fpga特定的功能来减少I/O需求,例如高模式匹配吞吐量和明确的芯片内存结构和分配。我们的算法将数据库(而不是查询)转换为一个过滤器,该过滤器存储为三个表的分层排列,其中前两个表存储在芯片上,第三个表存储在芯片外。我们的研究结果表明,虽然性能取决于数据,但与NCBI BLAST相比,基于我们的方法相对减少的I/O,有可能实现高达8倍的加速。更重要的是,相对于NCBI BLAST,更大的数据库和查询工作负载会提高性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信