Title: A Micro-architecture that supports the Fano–Elias encoding and a hardware accelerator for approximate membership queries
Authors: Guy Even, Gabriel Marques Domingues
Journal: Microprocessors and Microsystems, volume 105, Article 104992
Published: 2024-01-03
DOI: 10.1016/j.micpro.2023.104992
URL: https://www.sciencedirect.com/science/article/pii/S0141933123002375
Abstract
We present the first hardware design that supports operations over the Fano–Elias encoding (FE-encoding). Our design is a combinational circuit (i.e., single clock cycle) that supports insertions, deletions, and queries. FE-encoding allows one to store $f$ binary strings, each of length $\ell + \log m$, using a string that is $m + f + f\ell$ bits long (rather than $f(\ell + \log m)$). The asymptotic gate-count of the circuit is $\Theta((m + f) \cdot \lg m + f \cdot \ell)$. The asymptotic delay is $\Theta(\lg m + \lg f + \lg \ell)$. We implemented our design on an FPGA with four combinations of parameters in which the FE-encoding fits in 512 or 1024 bits.
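The space bound above can be illustrated with a small software sketch. This is not the authors' circuit; it is a minimal Python model of a Fano–Elias style encoding under assumed conventions: each string is split into a quotient in [0, m) and an ℓ-bit remainder, the header stores bucket occupancies in unary (one 1-bit per element, one 0-bit closing each bucket), and the body stores the remainders in quotient order. The function names (fe_encode, bucket_range, fe_query, fe_size_bits, naive_size_bits) are hypothetical.

```python
def fe_encode(items, m):
    """Encode a multiset of (quotient, remainder) pairs, quotient in [0, m).

    Returns (header, body). The header has m + f bits (unary bucket
    occupancies); the body holds the f remainders in quotient order.
    """
    items = sorted(items)
    header, body = [], []
    i = 0
    for q in range(m):
        while i < len(items) and items[i][0] == q:
            header.append(1)        # one 1-bit per element in bucket q
            body.append(items[i][1])
            i += 1
        header.append(0)            # a 0-bit closes bucket q
    return header, body


def bucket_range(header, q):
    """Return (start, end) indices in the body of the remainders of bucket q."""
    zeros = ones = 0
    start = None
    for bit in header:
        if zeros == q and start is None:
            start = ones            # elements before bucket q
        if bit == 1:
            ones += 1
        else:
            if zeros == q:
                return start, ones
            zeros += 1
    raise ValueError("quotient out of range")


def fe_query(header, body, q, r):
    """Return True if the pair (q, r) is stored in the encoding."""
    start, end = bucket_range(header, q)
    return r in body[start:end]


def fe_size_bits(m, f, ell):
    """Size of the encoding: m + f header bits plus f * ell body bits."""
    return m + f + f * ell


def naive_size_bits(m, f, ell):
    """Size when each string is stored explicitly: f * (ell + ceil(log2 m))."""
    return f * (ell + (m - 1).bit_length())
```

For example, with m = 16, f = 16, and ℓ = 4, the encoding takes 96 bits, whereas storing the strings explicitly takes 128 bits. The encoding saves space only when m + f < f·log m, that is, when the number of stored strings is not too small relative to m.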
We present the first hardware design for a dynamic filter that maintains a set subject to insertions, deletions, and approximate membership queries. The design contains four main blocks: two memory banks that store FE-encodings and two combinational circuits for FE-encoding. Additional logic deals with double buffering and forwarding.
We implemented the dynamic filter on an FPGA with the following parameters: (1) Elements in the dataset are 32-bit strings. (2) The supported dataset can contain up to $n_{max} = 45 \cdot 2^{14} = 737{,}280$ elements. (3) The latency is 2 to 4 clock cycles. (4) Fixed (i.e., constant and stable) throughput. A new operation can be issued every clock cycle. (5) We prove that the probability of a false-positive error is bounded by $0.385 \cdot 10^{-2}$. (6) We prove that the expected number of insertion failures is less than 1 for every 75 million insertions.
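To make the quoted parameters concrete, the following sketch recomputes the dataset capacity and turns the two probabilistic bounds into expected counts. The constant and function names are hypothetical. Scaling the insertion-failure bound linearly with the number of insertions is an assumption made here for illustration; the abstract states the bound only per 75 million insertions.

```python
# Parameters quoted in the abstract.
N_MAX = 45 * 2**14              # maximum number of elements in the dataset
FP_BOUND = 0.385e-2             # upper bound on the false-positive probability
INSERTIONS_PER_FAILURE = 75e6   # fewer than 1 expected failure per this many insertions


def expected_false_positives(negative_queries):
    """Upper bound on the expected number of false positives
    among queries for elements that are not in the dataset."""
    return FP_BOUND * negative_queries


def expected_insert_failures(insertions):
    """Upper bound on the expected number of insertion failures,
    ASSUMING the stated bound scales linearly with the insertion count."""
    return insertions / INSERTIONS_PER_FAILURE
```

Under these bounds, one million queries for absent elements yield at most about 3,850 false positives in expectation.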
Synthesis of our filter on a Xilinx Alveo U250 FPGA achieves a clock rate of 100 MHz (the critical path is due to the memory access). We measure a fixed throughput of 97.7 million operations per second (the loss of 2.3% in the throughput is due to instabilities in the bandwidth of the AXI4-Lite I/O channel).
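The 2.3% figure follows from comparing the measured throughput with the peak rate of one operation per clock cycle. A minimal check, with hypothetical names:

```python
CLOCK_HZ = 100e6        # synthesized clock rate
MEASURED_OPS = 97.7e6   # measured operations per second


def throughput_loss(clock_hz, measured_ops):
    """Fraction of the peak rate (one operation per cycle) that is lost."""
    return 1 - measured_ops / clock_hz


loss = throughput_loss(CLOCK_HZ, MEASURED_OPS)  # about 0.023, i.e. 2.3%
```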
A unique feature of our filter implementation is that the throughput is stable and constant for all benchmarks and loads. Namely, the combination of operations does not influence the throughput and the throughput does not depend on the number of elements in the dataset (as long as the cardinality of the dataset is bounded by $n_{max}$). Previous dynamic filter implementations in software (implemented on x86 or GPUs) do not exhibit stable and constant throughputs.
About the journal:
Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects related to embedded systems hardware. This includes different embedded system hardware platforms ranging from custom hardware via reconfigurable systems and application specific processors to general purpose embedded processors. Special emphasis is put on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC) and multi-processor systems on a chip (MPSoC), as well as their memory and communication methods and structures, such as network-on-chip (NoC).
Design automation of such systems, including methodologies, techniques, flows and tools for their design, as well as novel designs of hardware components, falls within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central in this journal. While software is not the main focus of this journal, methods of hardware/software co-design, as well as application restructuring and mapping to embedded hardware platforms that consider the interplay between software and hardware components with emphasis on hardware, are also in the journal scope.