FPGA-based Deep Learning Inference Accelerators: Where Are We Standing?

IF 2.8 4区计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

ACM Transactions on Reconfigurable Technology and Systems Pub Date : 2023-09-04 DOI:10.1145/3613963

Anouar Nechi, Lukas Groth, Saleh Mulhem, Farhad Merchant, R. Buchty, Mladen Berekovic

{"title":"FPGA-based Deep Learning Inference Accelerators: Where Are We Standing?","authors":"Anouar Nechi, Lukas Groth, Saleh Mulhem, Farhad Merchant, R. Buchty, Mladen Berekovic","doi":"10.1145/3613963","DOIUrl":null,"url":null,"abstract":"Recently, artificial intelligence applications have become part of almost all emerging technologies around us. Neural networks, in particular, have shown significant advantages and have been widely adopted over other approaches in machine learning. In this context, high processing power is deemed a fundamental challenge and a persistent requirement. Recent solutions facing such a challenge deploy hardware platforms to provide high computing performance for neural networks and deep learning algorithms. This direction is also rapidly taking over the market. Here, FPGAs occupy the middle ground regarding flexibility, reconfigurability, and efficiency compared to general-purpose CPUs, GPUs, on one side, and manufactured ASICs on the other. FPGA-based accelerators exploit the features of FPGAs to increase the computing performance for specific algorithms and algorithm features. Filling a gap, we provide holistic benchmarking criteria and optimization techniques that work across several classes of deep learning implementations. This paper summarizes the current state of deep learning hardware acceleration: More than 120 FPGA-based neural network accelerator designs are presented and evaluated based on a matrix of performance and acceleration criteria, and corresponding optimization techniques are presented and discussed. In addition, the evaluation criteria and optimization techniques are demonstrated by benchmarking ResNet-2 and LSTM-based accelerators.","PeriodicalId":49248,"journal":{"name":"ACM Transactions on Reconfigurable Technology and Systems","volume":" ","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2023-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Reconfigurable Technology and Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3613963","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 0

Abstract

Recently, artificial intelligence applications have become part of almost all emerging technologies around us. Neural networks, in particular, have shown significant advantages and have been widely adopted over other approaches in machine learning. In this context, high processing power is deemed a fundamental challenge and a persistent requirement. Recent solutions facing such a challenge deploy hardware platforms to provide high computing performance for neural networks and deep learning algorithms. This direction is also rapidly taking over the market. Here, FPGAs occupy the middle ground regarding flexibility, reconfigurability, and efficiency compared to general-purpose CPUs, GPUs, on one side, and manufactured ASICs on the other. FPGA-based accelerators exploit the features of FPGAs to increase the computing performance for specific algorithms and algorithm features. Filling a gap, we provide holistic benchmarking criteria and optimization techniques that work across several classes of deep learning implementations. This paper summarizes the current state of deep learning hardware acceleration: More than 120 FPGA-based neural network accelerator designs are presented and evaluated based on a matrix of performance and acceleration criteria, and corresponding optimization techniques are presented and discussed. In addition, the evaluation criteria and optimization techniques are demonstrated by benchmarking ResNet-2 and LSTM-based accelerators.

查看原文本刊更多论文

基于FPGA的深度学习推理加速器：我们站在哪里？

最近，人工智能应用已经成为我们身边几乎所有新兴技术的一部分。特别是神经网络，已经显示出显著的优势，并且在机器学习中被广泛采用。在这种情况下，高处理能力被认为是一个基本的挑战和持久的需求。面对这样的挑战，最近的解决方案部署硬件平台，为神经网络和深度学习算法提供高计算性能。这个方向也正在迅速占领市场。在这里，与通用cpu、gpu和制造的asic相比，fpga在灵活性、可重构性和效率方面处于中间位置。基于fpga的加速器利用fpga的特性来提高特定算法和算法特征的计算性能。为了填补这一空白，我们提供了跨几类深度学习实现的整体基准测试标准和优化技术。本文总结了深度学习硬件加速的现状:提出了120多个基于fpga的神经网络加速器设计，并基于性能和加速标准矩阵进行了评估，并提出和讨论了相应的优化技术。此外，通过对基于ResNet-2和lstm的加速器进行基准测试，验证了评估标准和优化技术。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Reconfigurable Technology and Systems COMPUTER SCIENCE, HARDWARE & ARCHITECTURE-

CiteScore

4.90

自引率

8.70%

发文量

审稿时长

>12 weeks

期刊介绍： TRETS is the top journal focusing on research in, on, and with reconfigurable systems and on their underlying technology. The scope, rationale, and coverage by other journals are often limited to particular aspects of reconfigurable technology or reconfigurable systems. TRETS is a journal that covers reconfigurability in its own right. Topics that would be appropriate for TRETS would include all levels of reconfigurable system abstractions and all aspects of reconfigurable technology including platforms, programming environments and application successes that support these systems for computing or other applications. -The board and systems architectures of a reconfigurable platform. -Programming environments of reconfigurable systems, especially those designed for use with reconfigurable systems that will lead to increased programmer productivity. -Languages and compilers for reconfigurable systems. -Logic synthesis and related tools, as they relate to reconfigurable systems. -Applications on which success can be demonstrated. The underlying technology from which reconfigurable systems are developed. (Currently this technology is that of FPGAs, but research on the nature and use of follow-on technologies is appropriate for TRETS.) In considering whether a paper is suitable for TRETS, the foremost question should be whether reconfigurability has been essential to success. Topics such as architecture, programming languages, compilers, and environments, logic synthesis, and high performance applications are all suitable if the context is appropriate. For example, an architecture for an embedded application that happens to use FPGAs is not necessarily suitable for TRETS, but an architecture using FPGAs for which the reconfigurability of the FPGAs is an inherent part of the specifications (perhaps due to a need for re-use on multiple applications) would be appropriate for TRETS.