卷积神经网络二维卷积的节能硬件实现

2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON) Pub Date : 2022-12-02 DOI:10.1109/UPCON56432.2022.9986483

S. K. Sharma, Anu Gupta, K. Raju

{"title":"卷积神经网络二维卷积的节能硬件实现","authors":"S. K. Sharma, Anu Gupta, K. Raju","doi":"10.1109/UPCON56432.2022.9986483","DOIUrl":null,"url":null,"abstract":"Over the last year, Deep neural networks (DNN) have been significantly accepted for computer vision applications because of high classification accuracy and versatility. Convolutional Neural Network (CNN) is one of the most popular architectures of DNN which is widely adopted for image, speech and video recognition. Extensive computation and large memory requirement of CNN s poses the bottleneck on its application. Field Programmable Gate Arrays (FPGAs) are considered to be suitable hardware platforms for deployment of CNNs with low power requirements. This paper focus on the design and implementation of hardware accelerator to perform the convolution product (matrix-matrix multiplication. We have used two optimization techniques to achieve energy efficiency. First, dataflow of the convolution phase is rescheduled to reduce the undesired on-chip memory accesses. Further, efficiency is enhanced by reducing the internal parallelism of structure as much as possible. Our architecture is implemented on the Xilinx ZCU104 evaluation board. The implemented design attains 98.1 GOPS/Joule and 32.77 GOPS/Joule for 8-bit and 16-bit data width respectively.","PeriodicalId":185782,"journal":{"name":"2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)","volume":"31 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Energy Efficient Hardware Implementation of 2-D Convolution for Convolutional Neural Network\",\"authors\":\"S. K. Sharma, Anu Gupta, K. Raju\",\"doi\":\"10.1109/UPCON56432.2022.9986483\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Over the last year, Deep neural networks (DNN) have been significantly accepted for computer vision applications because of high classification accuracy and versatility. Convolutional Neural Network (CNN) is one of the most popular architectures of DNN which is widely adopted for image, speech and video recognition. Extensive computation and large memory requirement of CNN s poses the bottleneck on its application. Field Programmable Gate Arrays (FPGAs) are considered to be suitable hardware platforms for deployment of CNNs with low power requirements. This paper focus on the design and implementation of hardware accelerator to perform the convolution product (matrix-matrix multiplication. We have used two optimization techniques to achieve energy efficiency. First, dataflow of the convolution phase is rescheduled to reduce the undesired on-chip memory accesses. Further, efficiency is enhanced by reducing the internal parallelism of structure as much as possible. Our architecture is implemented on the Xilinx ZCU104 evaluation board. The implemented design attains 98.1 GOPS/Joule and 32.77 GOPS/Joule for 8-bit and 16-bit data width respectively.\",\"PeriodicalId\":185782,\"journal\":{\"name\":\"2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)\",\"volume\":\"31 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/UPCON56432.2022.9986483\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/UPCON56432.2022.9986483","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 0

摘要

在过去的一年中，深度神经网络(DNN)因其高分类精度和多功能性而被广泛应用于计算机视觉应用。卷积神经网络(CNN)是深度神经网络中最流行的架构之一，广泛应用于图像、语音和视频识别。CNN的计算量大、内存需求大是其应用的瓶颈。现场可编程门阵列(fpga)被认为是部署低功耗cnn的合适硬件平台。本文重点研究了实现卷积积(矩阵-矩阵乘法)的硬件加速器的设计与实现。我们使用了两种优化技术来实现能源效率。首先，对卷积阶段的数据流进行重新调度，以减少不必要的片上存储器访问。此外，通过尽可能减少结构的内部平行度来提高效率。我们的架构是在赛灵思ZCU104评估板上实现的。实现的设计在8位和16位数据宽度下分别达到98.1 GOPS/焦耳和32.77 GOPS/焦耳。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Energy Efficient Hardware Implementation of 2-D Convolution for Convolutional Neural Network

Over the last year, Deep neural networks (DNN) have been significantly accepted for computer vision applications because of high classification accuracy and versatility. Convolutional Neural Network (CNN) is one of the most popular architectures of DNN which is widely adopted for image, speech and video recognition. Extensive computation and large memory requirement of CNN s poses the bottleneck on its application. Field Programmable Gate Arrays (FPGAs) are considered to be suitable hardware platforms for deployment of CNNs with low power requirements. This paper focus on the design and implementation of hardware accelerator to perform the convolution product (matrix-matrix multiplication. We have used two optimization techniques to achieve energy efficiency. First, dataflow of the convolution phase is rescheduled to reduce the undesired on-chip memory accesses. Further, efficiency is enhanced by reducing the internal parallelism of structure as much as possible. Our architecture is implemented on the Xilinx ZCU104 evaluation board. The implemented design attains 98.1 GOPS/Joule and 32.77 GOPS/Joule for 8-bit and 16-bit data width respectively.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2022 IEEE 9th Uttar Pradesh Section International Conference on Electrical, Electronics and Computer Engineering (UPCON)

自引率

0.00%

发文量