A review on hardware accelerators for convolutional neural network-based inference engines: Strategies for performance and energy-efficiency enhancement

Impact factor: 1.9; Zone 4 (Computer Science); JCR Q3 (Computer Science, Hardware & Architecture)
Deepika S., Arunachalam V., Alex Noel Joseph Raj
Microprocessors and Microsystems, volume 113, Article 105146. Published 2025-03-01. DOI: 10.1016/j.micpro.2025.105146. https://www.sciencedirect.com/science/article/pii/S0141933125000146
Citations: 0

Abstract

In time-critical and safety-critical image classification applications, Inference Engines (IEs) based on Convolutional Neural Networks (CNNs) are preferred, and they must be fast, accurate, and cost-effective to meet market demands. Their self-feature-extraction capability relies on millions of parameters and neurons across a stack of layers, all of which must be processed within a restricted time. This paper reviews the strategies applied in hardware-based CNN inference engines for image classification. Three acceleration strategies are considered: (1) Arithmetic Logic Unit (ALU)-based, (2) data flow-based, and (3) sparsity-based. Considering benchmark accuracy, a 16-bit mixed fixed/floating-point format could provide 99 % accuracy and 3.75 times more performance than half-precision floating point in an application-specific CNN model. Feeding 2-dimensional or 3-dimensional data frames to the CNN layers allows the data to be reused, which optimizes memory usage and improves the efficiency of the processor array. Pruning zero and near-zero valued Input Feature Maps (IFMs) and weights leads to sparsity in the data fed to the different layers. Therefore, data compression strategies and skipping trivial computations (the zero-skipping approach) would reduce the complexity of the controller. Compared with a dense architecture, the benchmark improvement is 1.17 times in performance and 6.2 times in power efficiency. Minimizing the complexity of the indexing and load-balancing controller would improve performance further.
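The zero-skipping idea mentioned in the abstract can be illustrated with a small software sketch. This is not the architecture reviewed in the paper; it is a minimal model, under the assumption that activations are compressed into (index, value) pairs and that a multiply-accumulate (MAC) is skipped whenever either operand is zero. The function names `compress`, `dense_dot`, and `zero_skipping_dot`, and the example data, are hypothetical.

```python
def compress(values):
    """Keep only nonzero entries as (index, value) pairs (illustrative format)."""
    return [(i, v) for i, v in enumerate(values) if v != 0]


def dense_dot(ifm, weights):
    """Dense baseline: one MAC per element, zeros included."""
    acc = 0
    macs = 0
    for x, w in zip(ifm, weights):
        acc += x * w
        macs += 1
    return acc, macs


def zero_skipping_dot(ifm, weights):
    """Skip every MAC whose activation or weight is zero."""
    acc = 0
    macs = 0
    for i, x in compress(ifm):      # zero activations never reach the loop
        w = weights[i]              # index lookup: the bookkeeping cost of sparsity
        if w == 0:
            continue                # zero weight: trivial product, skip it
        acc += x * w
        macs += 1
    return acc, macs


# Hypothetical input feature map and weights with pruned (zero) entries.
ifm = [0, 3, 0, 0, 2, 0, 1, 0]
weights = [4, 0, 0, 5, 2, 0, 3, 1]

dense_result, dense_macs = dense_dot(ifm, weights)
sparse_result, sparse_macs = zero_skipping_dot(ifm, weights)
print(dense_result, dense_macs)    # 7 8
print(sparse_result, sparse_macs)  # 7 2
```

Both paths produce the same sum, but the sparse path performs 2 MACs instead of 8. The index lookup in `zero_skipping_dot` stands in for the indexing overhead that the abstract identifies as a cost to be minimized in the controller.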
Source journal: Microprocessors and Microsystems
Category: Engineering Technology - Engineering: Electrical & Electronic
CiteScore: 6.90
Self-citation rate: 3.80%
Articles published: 204
Review time: 172 days
Journal description: Microprocessors and Microsystems: Embedded Hardware Design (MICPRO) is a journal covering all design and architectural aspects related to embedded systems hardware. This includes different embedded system hardware platforms, ranging from custom hardware via reconfigurable systems and application-specific processors to general-purpose embedded processors. Special emphasis is put on novel complex embedded architectures, such as systems on chip (SoC), systems on a programmable/reconfigurable chip (SoPC), and multi-processor systems on a chip (MPSoC), as well as their memory and communication methods and structures, such as network-on-chip (NoC). Design automation of such systems, including methodologies, techniques, flows, and tools for their design, as well as novel designs of hardware components, falls within the scope of this journal. Novel cyber-physical applications that use embedded systems are also central to this journal. While software is not the main focus of this journal, methods of hardware/software co-design, as well as application restructuring and mapping to embedded hardware platforms that consider the interplay between software and hardware components with emphasis on hardware, are also within the journal's scope.