BunchBloomer: Cost-Effective Bloom Filter Accelerator for Genomics Applications

2022 32nd International Conference on Field-Programmable Logic and Applications (FPL). Pub Date: 2022-08-01. DOI: 10.1109/FPL57034.2022.00014

Seongyoung Kang, Tarun Sai Ganesh Nerella, Shashank Uppoor, S. Jun


Citations: 1

Abstract

Bloom filters are an important tool for many applications, including genomics, where they serve as a compact data structure for counting k-mers, representing de Bruijn graphs, and more. Because their accesses are random and genomics workloads require large filters, Bloom filters for genomics can easily become bound by the random-access performance of off-chip memory. This is especially true for accelerators such as FPGAs and GPUs, which can easily remove the computation overhead of the multiple hash functions. As a result, Bloom filter accelerators have typically either focused on small filters that fit in fast on-chip memory or required a fast off-chip memory fabric such as Hybrid Memory Cubes. In this work, we present BunchBloomer, which improves the cost-effectiveness of FPGA Bloom filter accelerators by making better use of cheaper, lower-power DDR memory. BunchBloomer uses a multi-layer radix sorter to group table updates into bursts directed to the same 8 KiB memory region, which can be efficiently cached in on-chip memory. A single BunchBloomer device outperforms a costly 12-core server by over 2×, demonstrating an order of magnitude better power efficiency. It even achieves better power efficiency than published FPGA Bloom filter accelerators equipped with Hybrid Memory Cubes.
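The grouping idea in the abstract can be illustrated in software. The following is a minimal sketch, not the authors' FPGA design: it assumes double hashing over SHA-256 to derive bit positions, a configurable radix width per layer, and a plain byte-array copy standing in for the on-chip cache. The names bit_positions, radix_partition, group_by_region, insert_batch, and contains are hypothetical. Only the 8 KiB region size and the use of multi-layer radix partitioning come from the abstract.

```python
import hashlib
from collections import defaultdict

REGION_BYTES = 8 * 1024          # 8 KiB region size, taken from the abstract
REGION_BITS = REGION_BYTES * 8   # 65,536 filter bits per region


def bit_positions(key: bytes, num_bits: int, num_hashes: int) -> list[int]:
    """Derive bit positions by double hashing (an assumption of this sketch)."""
    digest = hashlib.sha256(key).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd stride
    return [(h1 + i * h2) % num_bits for i in range(num_hashes)]


def radix_partition(positions, shift, radix_bits):
    """One radix layer: bucket positions by radix_bits bits starting at shift."""
    mask = (1 << radix_bits) - 1
    buckets = defaultdict(list)
    for pos in positions:
        buckets[(pos >> shift) & mask].append(pos)
    return buckets


def group_by_region(positions, num_bits, radix_bits=4):
    """Apply radix layers, most significant digit first, until every group
    holds positions from a single 8 KiB region."""
    region_shift = REGION_BITS.bit_length() - 1        # log2(65536) = 16
    num_regions = -(-num_bits // REGION_BITS)          # ceiling division
    remaining = max(1, (num_regions - 1).bit_length()) # bits in a region id
    groups = [list(positions)]
    while remaining > 0:
        step = min(radix_bits, remaining)
        remaining -= step
        next_groups = []
        for group in groups:
            buckets = radix_partition(group, region_shift + remaining, step)
            next_groups.extend(buckets[d] for d in sorted(buckets))
        groups = next_groups
    return groups


def insert_batch(filter_bits: bytearray, keys, num_hashes=4, radix_bits=4):
    """Insert a batch of keys, touching each memory region once per batch."""
    num_bits = len(filter_bits) * 8
    positions = [p for k in keys for p in bit_positions(k, num_bits, num_hashes)]
    regions_touched = 0
    for group in group_by_region(positions, num_bits, radix_bits):
        base = (group[0] // REGION_BITS) * REGION_BYTES
        # Stand-in for the on-chip cache: read the region, update it, write back.
        region = bytearray(filter_bits[base:base + REGION_BYTES])
        for pos in group:
            offset = pos - base * 8
            region[offset >> 3] |= 1 << (offset & 7)
        filter_bits[base:base + REGION_BYTES] = region
        regions_touched += 1
    return regions_touched


def contains(filter_bits: bytearray, key: bytes, num_hashes=4) -> bool:
    """Standard Bloom filter membership test: all derived bits must be set."""
    num_bits = len(filter_bits) * 8
    return all((filter_bits[p >> 3] >> (p & 7)) & 1
               for p in bit_positions(key, num_bits, num_hashes))
```

In this sketch, a batch of updates causes one region read and one region write per distinct region rather than one random access per bit, which is the access pattern the abstract attributes to burst grouping. How the actual hardware sizes its radix layers and buffers is not stated in the abstract.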


