A Multiplier-Free RNS-Based CNN Accelerator Exploiting Bit-Level Sparsity

IF 5.1 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS
Vasilis Sakellariou;Vassilis Paliouras;Ioannis Kouretas;Hani Saleh;Thanos Stouraitis
DOI: 10.1109/TETC.2023.3301590
Journal: IEEE Transactions on Emerging Topics in Computing (JCR Q1, COMPUTER SCIENCE, INFORMATION SYSTEMS)
Publication date: 2023-08-10
URL: https://ieeexplore.ieee.org/document/10214485/
Citations: 0

Abstract

In this work, a Residue Number System (RNS)-based Convolutional Neural Network (CNN) accelerator utilizing a multiplier-free, distributed-arithmetic Processing Element (PE) is proposed. A method for maximizing the utilization of the arithmetic hardware resources is presented; it increases the system's throughput by exploiting bit-level sparsity within the weight vectors. The proposed PE design takes advantage of the properties of RNS and of Canonical Signed Digit (CSD) encoding to achieve higher energy efficiency and a higher effective processing rate, without requiring any compression mechanism or introducing any approximation. An extensive design-space exploration over various parameters (RNS base, PE micro-architecture, encoding) is conducted using analytical models as well as experimental results from CNN benchmarks, and the various trade-offs are analyzed. A complete end-to-end RNS accelerator is developed based on the proposed PE, and is compared to traditional binary and RNS counterparts as well as to other state-of-the-art systems. Implementation results in a 22-nm process show that the proposed PE can lead to $1.85\times$ and $1.54\times$ more energy-efficient processing compared to binary and conventional RNS designs, respectively, with a $1.88\times$ maximum increase of effective throughput for the employed benchmarks. Compared to a state-of-the-art, all-digital, RNS-based system, the proposed accelerator is $8.87\times$ and $1.11\times$ more energy- and area-efficient, respectively.
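The two number representations the abstract builds on can be illustrated with a minimal sketch. This is not code from the paper: the helpers below are hypothetical and only demonstrate (a) how an RNS base of co-prime moduli turns one integer into small independent residues, and (b) how CSD encoding with digits in {-1, 0, 1} reduces the number of non-zero digits relative to plain binary, which is the bit-level sparsity a multiplier-free, shift-and-add datapath can exploit.

```python
def to_rns(x, moduli):
    """Represent an integer as its residues w.r.t. a set of co-prime moduli."""
    return tuple(x % m for m in moduli)

def to_csd(x):
    """Canonical Signed Digit encoding of a positive integer, LSB first.
    Digits are in {-1, 0, 1} and no two adjacent digits are non-zero,
    which minimizes the non-zero-digit count."""
    digits = []
    while x != 0:
        if x % 2 == 0:
            digits.append(0)
            x //= 2
        else:
            d = 2 - (x % 4)  # +1 if x ≡ 1 (mod 4), -1 if x ≡ 3 (mod 4)
            digits.append(d)
            x = (x - d) // 2
    return digits

# A classic {2^n - 1, 2^n, 2^n + 1} modulus set (n = 3), dynamic range 7*8*9 = 504.
moduli = (7, 8, 9)
print(to_rns(52, moduli))   # → (3, 4, 7)

# 7 in binary is 111 (three non-zero bits); in CSD it is 8 - 1,
# i.e. digits [-1, 0, 0, 1] with only two non-zero digits.
csd = to_csd(7)
print(csd, sum(d != 0 for d in csd))  # → [-1, 0, 0, 1] 2
```

Fewer non-zero digits means fewer additions per weight in a distributed-arithmetic PE; the paper's contribution is in exploiting this sparsity jointly with RNS properties, which this sketch does not attempt to reproduce.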
Source Journal

IEEE Transactions on Emerging Topics in Computing (Computer Science, miscellaneous)
CiteScore: 12.10
Self-citation rate: 5.10%
Articles per year: 113
Scope: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.