A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queries

IF 2.6 | Zone 4, Computer Science | JCR Q3, Computer Science, Hardware & Architecture

Microprocessors and Microsystems, Volume 105, Article 104992 Pub Date: 2024-01-03 DOI: 10.1016/j.micpro.2023.104992

Guy Even, Gabriel Marques Domingues

Citations: 0

Abstract

We present the first hardware design that supports operations over the Fano–Elias encoding (FE-encoding). Our design is a combinational circuit (i.e., single clock cycle) that supports insertions, deletions, and queries. FE-encoding allows one to store $f$ binary strings, each of length $\ell + \log m$, using a string that is $m + f + f\ell$ bits long (rather than $f(\ell + \log m)$). The asymptotic gate count of the circuit is $\Theta((m + f) \cdot \lg m + f \cdot \ell)$. The asymptotic delay is $\Theta(\lg m + \lg f + \lg \ell)$. We implemented our design on an FPGA with four combinations of parameters in which the FE-encoding fits in 512 or 1024 bits.
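As a rough illustration of the space bound above, the following is a minimal software sketch of an FE-style bin. It assumes the usual layout: a unary header of $m + f$ bits (for each of the $m$ quotients, one `1` per stored element followed by a `0`) plus $f$ remainders of $\ell$ bits each. The class `FEBin`, its methods, and the helpers `fe_bits` and `naive_bits` are hypothetical names chosen for this sketch. The abstract does not give the paper's exact bit layout, and its combinational circuit is not modeled here.

```python
import math


class FEBin:
    """Software model of one FE-encoded bin (assumed layout, not the paper's circuit).

    header: for each quotient q in 0..m-1, count(q) ones followed by one zero.
    body:   the ell-bit remainders, grouped by quotient in the same order.
    """

    def __init__(self, m, ell, capacity):
        self.m, self.ell, self.capacity = m, ell, capacity
        self.header = [0] * m  # empty bin: m zeros and no ones
        self.body = []

    def _bucket(self, q):
        """Return (start, end) indices into body for quotient q."""
        zeros = ones = start = 0
        for bit in self.header:
            if bit == 1:
                ones += 1
            else:
                if zeros == q:
                    return start, ones
                zeros += 1
                start = ones
        raise ValueError("quotient out of range")

    def query(self, q, r):
        start, end = self._bucket(q)
        return r in self.body[start:end]

    def insert(self, q, r):
        if not 0 <= r < (1 << self.ell):
            raise ValueError("remainder out of range")
        if len(self.body) >= self.capacity:
            return False  # bin is full: insertion failure
        _, end = self._bucket(q)
        # The zero that terminates q's run sits at index end + q
        # (end ones and q zeros precede it); put a new one before it.
        self.header.insert(end + q, 1)
        self.body.insert(end, r)
        return True

    def delete(self, q, r):
        start, end = self._bucket(q)
        for i in range(start, end):
            if self.body[i] == r:
                del self.body[i]
                del self.header[start + q]  # first one of q's run
                return True
        return False

    def bit_length(self):
        return len(self.header) + self.ell * len(self.body)


def fe_bits(m, f, ell):
    """Size of a full FE-encoding: m + f + f*ell bits."""
    return m + f + f * ell


def naive_bits(m, f, ell):
    """Size when each string is stored explicitly: f * (ell + log m) bits."""
    return f * (ell + math.ceil(math.log2(m)))
```

With the illustrative (not paper-reported) parameters $m = f = 64$ and $\ell = 6$, `fe_bits(64, 64, 6)` gives 512 bits, whereas `naive_bits(64, 64, 6)` gives 768 bits.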

We present the first hardware design for a dynamic filter that maintains a set subject to insertions, deletions, and approximate membership queries. The design contains four main blocks: two memory banks that store FE-encodings and two combinational circuits for FE-encoding. Additional logic deals with double buffering and forwarding.

We implemented the dynamic filter on an FPGA with the following parameters: (1) Elements in the dataset are 32-bit strings. (2) The supported dataset can contain up to $n_{\max} = 45 \cdot 2^{14} = 737{,}280$ elements. (3) The latency is 2-4 clock cycles. (4) Fixed (i.e., constant and stable) throughput. A new operation can be issued every clock cycle. (5) We prove that the probability of a false-positive error is bounded by $0.385 \cdot 10^{-2}$. (6) We prove that the expected number of insertion failures is less than 1 for every 75 million insertions.
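To make the stated bounds concrete, here is a small arithmetic sketch. The constants are the figures quoted above. The function names are hypothetical, and treating item (6) as a per-insertion rate that scales linearly is an assumption of this sketch, not a claim from the paper.

```python
N_MAX = 45 * 2**14                  # maximum dataset size, item (2)
FP_BOUND = 0.385e-2                 # false-positive probability bound, item (5)
INSERTS_PER_FAILURE = 75_000_000    # item (6): fewer than 1 failure per 75M inserts


def expected_false_positives(negative_queries):
    """Upper bound on expected false positives, by linearity of expectation."""
    return FP_BOUND * negative_queries


def expected_insert_failures(insertions):
    """Upper bound on expected failures, ASSUMING the bound scales linearly."""
    return insertions / INSERTS_PER_FAILURE


if __name__ == "__main__":
    print(N_MAX)
    print(expected_false_positives(1_000_000))
    print(expected_insert_failures(N_MAX))
```

Under these assumptions, one million queries for absent elements yield at most about 3,850 expected false positives, and filling the filter to $n_{\max}$ yields fewer than about 0.01 expected insertion failures.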

Synthesis of our filter on a Xilinx Alveo U250 FPGA achieves a clock rate of 100 MHz (the critical path is due to the memory access). We measure a fixed throughput of 97.7 million operations per second (the loss of 2.3% in the throughput is due to instabilities in the bandwidth of the AXI4-Lite I/O channel).
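The throughput and latency figures can be cross-checked with a short calculation. This sketch uses only numbers quoted in the abstract (100 MHz clock, one operation issued per cycle, 97.7 million measured operations per second, 2-4 cycles of latency). The function names are hypothetical, and the fill-time figure is a derived illustration rather than a reported measurement.

```python
CLOCK_HZ = 100e6             # synthesized clock rate
MEASURED_OPS_PER_S = 97.7e6  # measured throughput
N_MAX = 45 * 2**14           # maximum dataset size


def peak_ops_per_s(clock_hz=CLOCK_HZ, ops_per_cycle=1):
    """Ideal throughput when one operation is issued every clock cycle."""
    return clock_hz * ops_per_cycle


def throughput_loss(measured=MEASURED_OPS_PER_S):
    """Fraction of peak throughput lost (attributed to the I/O channel)."""
    return 1 - measured / peak_ops_per_s()


def latency_ns(cycles, clock_hz=CLOCK_HZ):
    """Latency in nanoseconds for a given number of clock cycles."""
    return cycles * 1e9 / clock_hz


def fill_time_ms(n=N_MAX, measured=MEASURED_OPS_PER_S):
    """Time to issue n insertions at the measured throughput, in milliseconds."""
    return n / measured * 1e3


if __name__ == "__main__":
    print(f"loss: {throughput_loss():.1%}")
    print(f"latency: {latency_ns(2):.0f} to {latency_ns(4):.0f} ns")
    print(f"time to insert n_max elements: {fill_time_ms():.2f} ms")
```

The computed loss of 2.3% matches the figure in the text, and a latency of 2-4 cycles at 100 MHz corresponds to 20-40 ns.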

A unique feature of our filter implementation is that the throughput is stable and constant for all benchmarks and loads. Namely, the combination of operations does not influence the throughput, and the throughput does not depend on the number of elements in the dataset (as long as the cardinality of the dataset is bounded by $n_{\max}$). Previous dynamic filter implementations in software (implemented on x86 or GPUs) do not exhibit stable and constant throughputs.


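Source journal

Microprocessors and Microsystems (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.90

Self-citation rate: 3.80%

Articles published: 204

Review time: 172 days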

Journal description: Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects related to embedded systems hardware. This includes different embedded system hardware platforms, ranging from custom hardware via reconfigurable systems and application-specific processors to general-purpose embedded processors. Special emphasis is put on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC), and multi-processor systems on a chip (MPSoC), as well as their memory and communication methods and structures, such as network-on-chip (NoC). Design automation of such systems, including methodologies, techniques, flows, and tools for their design, as well as novel designs of hardware components, falls within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central to this journal. While software is not the main focus of this journal, methods of hardware/software co-design, as well as application restructuring and mapping to embedded hardware platforms that consider the interplay between software and hardware components with an emphasis on hardware, are also within the journal's scope.