FPGA-based scalable and highly concurrent convolutional neural network acceleration

Hao Xiao, Kunhua Li, Mingcheng Zhu
{"title":"FPGA-based scalable and highly concurrent convolutional neural network acceleration","authors":"Hao Xiao, Kunhua Li, Mingcheng Zhu","doi":"10.1109/ICPECA51329.2021.9362549","DOIUrl":null,"url":null,"abstract":"This article proposes an efficient, low-latency, scalable, and low-error neural network acceleration architecture. Considering the performance requirements of high efficiency and low latency, the methods of multi-channel parallel computing between layers and pipeline design are adopted to accelerate the neural network. Then, based on Xilinx zynq-7000 FPGA, the acceleration strategy is realized, and the effect of calculating 28*28 handwritten images at 25.95us at a clock frequency of 200M is investigated. Further, the flexibility and scalability of the network is improved by adding a line buffer for variable image width and designing a mechanism for selectable convolution kernel size. Since the convolutional neural networks are based on floating-point operations, if the floating-point is converted to fixed-point when implemented on FPGA, there will not only be a loss of precision, but also introduce a tedious conversion work. Thus, our neural network uses 32-bit Floating point operations. Moreover, the task of handwritten digit recognition is performed on the MNIST data set, to experimentally evaluate our solution. Experiment results show that the neural network acceleration architecture proposed in this paper achieves better performance. Compare with the literature [4],[6], the calculation speed is significantly improved, and the calculation speed is increased by 101.6 times compared with the literature [4] Compared with the literature [6], there is a speed increase of 11.88 times.","PeriodicalId":119798,"journal":{"name":"2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPECA51329.2021.9362549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

This article proposes an efficient, low-latency, scalable, and low-error neural network acceleration architecture. To meet the performance requirements of high efficiency and low latency, multi-channel parallel computing between layers and pipeline design are adopted to accelerate the neural network. The acceleration strategy is then implemented on a Xilinx Zynq-7000 FPGA, which processes a 28×28 handwritten image in 25.95 µs at a clock frequency of 200 MHz. Further, the flexibility and scalability of the network are improved by adding a line buffer to support variable image widths and by designing a mechanism for selectable convolution kernel sizes. Since convolutional neural networks are based on floating-point operations, converting floating point to fixed point for the FPGA implementation would not only lose precision but also introduce tedious conversion work; our network therefore uses 32-bit floating-point operations. To evaluate the solution experimentally, handwritten digit recognition is performed on the MNIST data set. The results show that the proposed neural network acceleration architecture achieves better performance: the calculation speed is significantly improved, increasing by 101.6 times compared with the literature [4] and by 11.88 times compared with the literature [6].
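The line buffer and selectable kernel size described above are sliding-window techniques; below is a minimal C++ sketch of the idea, not the paper's actual RTL, with all names and the software framing being illustrative assumptions. A (K-1)-row line buffer plus a K×K window register lets each incoming pixel complete one convolution window, so the image is streamed exactly once regardless of kernel size.

```cpp
#include <cstdio>
#include <vector>

// Sliding-window 2D convolution driven by a line buffer.
// Hypothetical software model: W (image width) and K (kernel size)
// are runtime parameters, mirroring the paper's variable image
// width and selectable convolution kernel size.
std::vector<float> conv2d_linebuffer(const std::vector<float>& img,
                                     int W, int H,
                                     const std::vector<float>& kernel,
                                     int K) {
    const int outW = W - K + 1, outH = H - K + 1;
    std::vector<float> out(static_cast<size_t>(outW) * outH, 0.0f);
    std::vector<float> lines(static_cast<size_t>(K - 1) * W, 0.0f); // last K-1 rows
    std::vector<float> win(static_cast<size_t>(K) * K, 0.0f);       // KxK window register

    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            const float px = img[y * W + x];
            // Shift the window one column to the left.
            for (int r = 0; r < K; ++r)
                for (int c = 0; c + 1 < K; ++c)
                    win[r * K + c] = win[r * K + c + 1];
            // New rightmost column: K-1 buffered rows plus the new pixel.
            for (int r = 0; r + 1 < K; ++r)
                win[r * K + (K - 1)] = lines[r * W + x];
            win[(K - 1) * K + (K - 1)] = px;
            // Rotate the line buffer at this column and store the new pixel.
            for (int r = 0; r + 2 < K; ++r)
                lines[r * W + x] = lines[(r + 1) * W + x];
            if (K >= 2) lines[(K - 2) * W + x] = px;
            // Once the window is full, emit one multiply-accumulate result.
            if (y >= K - 1 && x >= K - 1) {
                float acc = 0.0f;
                for (int i = 0; i < K * K; ++i) acc += win[i] * kernel[i];
                out[(y - K + 1) * outW + (x - K + 1)] = acc;
            }
        }
    }
    return out;
}

int main() {
    const int W = 5, H = 5, K = 3;            // 5x5 ramp image, 3x3 mean filter
    std::vector<float> img(W * H);
    for (int i = 0; i < W * H; ++i) img[i] = static_cast<float>(i);
    const std::vector<float> ker(K * K, 1.0f / (K * K));
    for (float v : conv2d_linebuffer(img, W, H, ker, K))
        std::printf("%.2f ", v);              // prints the mean of each 3x3 window
    std::printf("\n");
    return 0;
}
```

In hardware, the line buffer would map to BRAM rows and the window to a shift-register array; keeping W and K as runtime parameters mirrors the paper's variable-width and selectable-kernel mechanisms.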
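The argument for staying in 32-bit floating point can be seen in a small round-trip experiment: quantizing a weight to a fixed-point format and back exposes the precision loss the authors avoid. The Q1.15 format below is a hypothetical illustration, not a format taken from the paper.

```cpp
#include <cstdint>
#include <cstdio>
#include <cmath>

// Quantize a value in [-1, 1) to Q1.15 fixed point and back.
// Q1.15 is a hypothetical example format: 1 sign bit, 15 fraction bits.
int16_t to_q15(float x)     { return static_cast<int16_t>(std::lround(x * 32768.0f)); }
float   from_q15(int16_t q) { return static_cast<float>(q) / 32768.0f; }

int main() {
    const float w  = 0.123456789f;           // an example trained weight
    const float rt = from_q15(to_q15(w));    // round trip through fixed point
    std::printf("float: %.9f  fixed->float: %.9f  error: %.3e\n",
                w, rt, std::fabs(w - rt));
    return 0;
}
```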