FPGA Implementation of processing element unit in CNN accelerator using Modified Booth Multiplier and Wallace Tree Adder on UniWiG Architecture

2022 IEEE International Power and Renewable Energy Conference (IPRECON). Pub Date: 2022-12-16. DOI: 10.1109/IPRECON55716.2022.10059525

Bless Thomas, Manju Manuel

Abstract

Deep Neural Networks (DNNs) are useful for solving many practical problems, such as traffic monitoring and vehicle detection. Among DNNs, Convolutional Neural Networks (CNNs) are generally used for image and video processing applications. In a CNN, most of the computation is spent on the convolution process. The Winograd minimal filtering algorithm is one of the effective methods for computing convolution with small filter sizes. A prominent component of a CNN accelerator design is the processing element (PE) unit, which mainly comprises bulky multiply-and-accumulate (MAC) units and an adder tree. It is the PE that performs the convolution operation. In this paper, a new processing element is designed using a Modified Booth Encoding (MBE) multiplier and Wallace tree adders to reduce hardware resource usage and power consumption. This modified PE unit is implemented on an architecture known as UniWiG (Unified Winograd GEMM architecture). The proposed design reduces hardware complexity and achieves better power efficiency than previous designs. The hardware is realized in the Verilog Hardware Description Language (HDL) and tested on an FPGA board.
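
As a rough illustration of the two arithmetic techniques named in the abstract, the following is a minimal behavioral sketch in Python. It is not the authors' Verilog design. The function names, the 8-bit operand width, and the word-level 3:2 compressor layering are all assumptions made for illustration. Radix-4 Booth recoding halves the number of partial products, and the Wallace-style reduction compresses the rows in carry-save form so that only one carry-propagate addition is needed at the end.

```python
def booth_radix4_digits(y, bits=8):
    """Recode a signed `bits`-wide multiplier into radix-4 Booth digits in {-2..2}."""
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    if not -(1 << (bits - 1)) <= y < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    u = y & ((1 << bits) - 1)      # two's-complement bit pattern of y
    digits, prev = [], 0           # prev models the implicit bit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)   # digit = y[i] + y[i-1] - 2*y[i+1]
        prev = b1
    return digits


def partial_products(x, y, bits=8):
    """One shifted multiple of x per Booth digit: bits/2 rows instead of bits."""
    if not -(1 << (bits - 1)) <= x < (1 << (bits - 1)):
        raise ValueError("multiplicand does not fit in the given width")
    return [(d * x) << (2 * i) for i, d in enumerate(booth_radix4_digits(y, bits))]


def wallace_reduce(rows, width):
    """Compress rows to two (sum, carry) using layers of 3:2 carry-save adders."""
    mask = (1 << width) - 1
    rows = [r & mask for r in rows]         # fixed-width two's complement
    while len(rows) > 2:
        full = len(rows) - len(rows) % 3    # rows that form complete triples
        nxt = []
        for i in range(0, full, 3):
            a, b, c = rows[i:i + 3]
            nxt.append(a ^ b ^ c)                                    # sum bits
            nxt.append((((a & b) | (a & c) | (b & c)) << 1) & mask)  # carry bits
        nxt.extend(rows[full:])             # leftover rows pass to the next layer
        rows = nxt
    rows += [0] * (2 - len(rows))
    return rows[0], rows[1]


def to_signed(value, width):
    """Interpret a `width`-bit pattern as a signed two's-complement integer."""
    return value - (1 << width) if value >> (width - 1) else value


def booth_wallace_multiply(x, y, bits=8):
    """Signed multiply: Booth partial products, Wallace reduction, one final add."""
    width = 2 * bits
    s, c = wallace_reduce(partial_products(x, y, bits), width)
    return to_signed((s + c) & ((1 << width) - 1), width)


def pe_dot(xs, ws, bits=8):
    """Hypothetical PE model: all products' rows share one reduction tree."""
    width = 2 * bits + len(xs).bit_length()  # guard bits for the accumulation
    rows = [r for x, w in zip(xs, ws) for r in partial_products(x, w, bits)]
    s, c = wallace_reduce(rows, width)
    return to_signed((s + c) & ((1 << width) - 1), width)
```

For example, booth_wallace_multiply(-7, 13) goes through four partial products rather than eight, and pe_dot([1, -2, 3], [4, 5, -6]) feeds the twelve rows of three products into a single reduction tree. In hardware the reduction is built column by column from full and half adders; the word-level version above only mirrors the carry-save idea.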
