Fast approximation of the top-k items in data streams using FPGAs

IF 0.8, Zone 4 (Computer Science), Q4 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IET Computers and Digital Techniques, published 2023-02-19, DOI: 10.1049/cdt2.12053

Ali Ebrahim, Jalal Khalifat

Vol. 17, No. 2, pp. 60-73

Citations: 1

Abstract

Two methods are presented for finding the top-k items in data streams using Field Programmable Gate Arrays (FPGAs). These methods deploy two variants of a novel accelerator architecture that extracts an approximate list of the most frequently occurring items in a single pass over the input stream, without requiring random access. The first variant implements the well-known Probabilistic sampling algorithm by mapping its main processing stages onto a hardware architecture consisting of two custom systolic arrays. The proposed architecture retains all the properties of this algorithm, including its ability to work even when the stream size is unknown at run time, and it scales better than architectures based on other streaming algorithms. Experimental results on both synthetic and real datasets, with the accelerator implemented on an Intel Arria 10 GX 1150 FPGA device, showed high accuracy and significant throughput gains over existing software and hardware-accelerated solutions. The second variant is tailored to applications that require higher accuracy, provided that the stream size is known at run time. This variant uses the FPGA's embedded memory resources to implement a sketch-based filter that precedes the main systolic array in the accelerator's pipeline. The filter improves accuracy by pre-processing the stream to remove many of the insignificant items, so that the main array processes a significantly smaller filtered stream.
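The abstract names the Probabilistic sampling algorithm but does not spell out its steps. As a point of reference only, the Python sketch below implements Sticky Sampling (Manku and Motwani, 2002), a different, well-known single-pass sampling scheme that shares the property highlighted above: it needs no advance knowledge of the stream length, because its sampling rate doubles on a fixed schedule as items arrive. This is a swapped-in technique, not the authors' algorithm or its hardware mapping; the names `sticky_sampling` and `top_k` and the parameters `support`, `epsilon` and `delta` are assumptions made for illustration.

```python
import math
import random


def sticky_sampling(stream, support, epsilon, delta, rng=None):
    """Sticky Sampling (Manku and Motwani, 2002) over a stream of unknown length.

    Returns (counts, n): estimated counts for the items currently held in the
    sample, and the number of items seen. Estimates never exceed true counts.
    """
    rng = rng or random.Random()
    t = math.log(1.0 / (support * delta)) / epsilon
    counts = {}
    rate = 1          # a new, untracked item is admitted with probability 1/rate
    boundary = 2 * t  # stream position after which the rate doubles next
    n = 0
    for item in stream:
        n += 1
        while n > boundary:
            rate *= 2
            boundary += rate * t
            # Re-scale each entry to the new rate: toss a fair coin until it
            # lands heads and decrement the count once for every tails.
            for key in list(counts):
                while counts[key] > 0 and rng.random() < 0.5:
                    counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
        if item in counts:
            counts[item] += 1  # once admitted, an item is counted exactly
        elif rng.random() < 1.0 / rate:
            counts[item] = 1
    return counts, n


def top_k(counts, k):
    """Return the k items with the largest estimated counts."""
    return sorted(counts, key=counts.get, reverse=True)[:k]


if __name__ == "__main__":
    stream = ["a"] * 6000 + ["b"] * 4000 + [f"x{i}" for i in range(10000)]
    random.Random(0).shuffle(stream)
    counts, n = sticky_sampling(stream, support=0.1, epsilon=0.01,
                                delta=0.01, rng=random.Random(1))
    print(n, top_k(counts, 2))
```

Because an item is counted exactly once it has been admitted, the only errors are the occurrences missed before admission and the coin-toss decrements at each rate change. This is why the table stays small while frequent items keep estimates close to their true counts.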
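The abstract says the main processing stages are mapped onto two custom systolic arrays, but it gives no cell-level detail. The cycle-level Python model below is only a generic illustration of the systolic idea, not the authors' design. It models a linear array of k processing elements (PEs) that retains the k `(count, item)` pairs with the largest counts. The function `systolic_topk`, the pair format and the `EMPTY` sentinel are all hypothetical.

```python
EMPTY = (-1, None)  # sentinel for an unused register; assumes counts >= 0


def systolic_topk(pairs, k):
    """Cycle-level model of a hypothetical linear systolic top-k sorter (k >= 1).

    PE i holds one (count, item) register. On every clock cycle each PE
    compares the pair on its input wire with the pair it holds, keeps the
    larger one and drives the smaller one to PE i + 1 for the next cycle.
    Pairs leaving the last PE are dropped, so only the k largest remain.
    Items with equal counts are ordered by comparing the items themselves.
    """
    stored = [EMPTY] * k
    wires = [EMPTY] * k  # wires[i]: pair presented to PE i in this cycle
    feed = list(pairs)
    # One new pair enters PE 0 per cycle; k extra cycles drain the pipeline.
    for cycle in range(len(feed) + k):
        wires[0] = feed[cycle] if cycle < len(feed) else EMPTY
        next_wires = [EMPTY] * k
        # Every PE reads only last cycle's wires and writes next_wires,
        # which models all PEs updating simultaneously on one clock edge.
        for i in range(k):
            passed_on = wires[i]
            if passed_on > stored[i]:
                stored[i], passed_on = passed_on, stored[i]
            if i + 1 < k:
                next_wires[i + 1] = passed_on
        wires = next_wires
    return [pair for pair in stored if pair != EMPTY]
```

For example, `systolic_topk([(3, "c"), (9, "a"), (7, "b")], 2)` keeps the two pairs with the highest counts, ordered from PE 0 outward. Because each PE needs only one comparator and talks only to its neighbour, the array accepts one new pair per cycle regardless of k, at the cost of k cycles of latency.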
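The second variant places a sketch-based filter ahead of the main systolic array and requires the stream size to be known. The abstract does not say which sketch is used or how its cutoff is chosen, so the Python sketch below is hypothetical throughout. It assumes a Count-Min sketch and a cutoff of `phi * n_total` occurrences. The class `CountMinFilter` and the parameters `width`, `depth`, `phi` and `n_total` are illustrative names. It shows one reason a known stream size helps: the cutoff can be fixed as a fraction of the stream length before the pass begins.

```python
import hashlib
import math
from collections import Counter


class CountMinFilter:
    """Hypothetical Count-Min pre-filter for a stream of known length n_total.

    Every arriving item updates the sketch, but it is forwarded downstream
    only when its estimated count has reached ceil(phi * n_total).
    """

    def __init__(self, width, depth, phi, n_total):
        self.width = width
        self.depth = depth
        self.threshold = max(1, math.ceil(phi * n_total))
        self.table = [[0] * width for _ in range(depth)]

    def _column(self, row, item):
        # A separate hash per row, derived by prefixing the row index.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def update(self, item):
        """Count one occurrence and return the item's Count-Min estimate."""
        estimate = math.inf
        for row in range(self.depth):
            col = self._column(row, item)
            self.table[row][col] += 1
            estimate = min(estimate, self.table[row][col])
        return estimate

    def filter(self, stream):
        """Yield only the occurrences whose estimate has reached the threshold."""
        for item in stream:
            if self.update(item) >= self.threshold:
                yield item


if __name__ == "__main__":
    stream = ["a"] * 500 + ["b"] * 300 + [f"x{i}" for i in range(5000)]
    f = CountMinFilter(width=1024, depth=4, phi=0.01, n_total=len(stream))
    passed = Counter(f.filter(stream))
    print(f.threshold, sum(passed.values()), passed.most_common(2))
```

A Count-Min estimate never undercounts, so an item is forwarded no later than the moment its true count reaches the threshold. At most `threshold - 1` of its occurrences are dropped. Hash collisions can only let extra rare items through; they cannot hold a frequent item back.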

Source journal: IET Computers and Digital Techniques (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 3.50

Self-citation rate: 0.00%

Review time: >12 weeks

Journal description: IET Computers & Digital Techniques publishes technical papers describing recent research and development work in all aspects of digital system-on-chip design and test of electronic and embedded systems, including the development of design automation tools (methodologies, algorithms and architectures). Papers based on the problems associated with the scaling down of CMOS technology are particularly welcome. It is aimed at researchers, engineers and educators in the fields of computer and digital systems design and test. The key subject areas of interest are: Design Methods and Tools: CAD/EDA tools, hardware description languages, high-level and architectural synthesis, hardware/software co-design, platform-based design, 3D stacking and circuit design, system on-chip architectures and IP cores, embedded systems, logic synthesis, low-power design and power optimisation. Simulation, Test and Validation: electrical and timing simulation, simulation based verification, hardware/software co-simulation and validation, mixed-domain technology modelling and simulation, post-silicon validation, power analysis and estimation, interconnect modelling and signal integrity analysis, hardware trust and security, design-for-testability, embedded core testing, system-on-chip testing, on-line testing, automatic test generation and delay testing, low-power testing, reliability, fault modelling and fault tolerance. Processor and System Architectures: many-core systems, general-purpose and application specific processors, computational arithmetic for DSP applications, arithmetic and logic units, cache memories, memory management, co-processors and accelerators, systems and networks on chip, embedded cores, platforms, multiprocessors, distributed systems, communication protocols and low-power issues.
Configurable Computing: embedded cores, FPGAs, rapid prototyping, adaptive computing, evolvable and statically and dynamically reconfigurable and reprogrammable systems, reconfigurable hardware. Design for variability, power and aging: design methods for variability, power and aging aware design, memories, FPGAs, IP components, 3D stacking, energy harvesting. Case Studies: emerging applications, applications in industrial designs, and design frameworks.