FPGA-based scalable and highly concurrent convolutional neural network acceleration

Hao Xiao, Kunhua Li, Mingcheng Zhu
{"title":"FPGA-based scalable and highly concurrent convolutional neural network acceleration","authors":"Hao Xiao, Kunhua Li, Mingcheng Zhu","doi":"10.1109/ICPECA51329.2021.9362549","DOIUrl":null,"url":null,"abstract":"This article proposes an efficient, low-latency, scalable, and low-error neural network acceleration architecture. Considering the performance requirements of high efficiency and low latency, the methods of multi-channel parallel computing between layers and pipeline design are adopted to accelerate the neural network. Then, based on Xilinx zynq-7000 FPGA, the acceleration strategy is realized, and the effect of calculating 28*28 handwritten images at 25.95us at a clock frequency of 200M is investigated. Further, the flexibility and scalability of the network is improved by adding a line buffer for variable image width and designing a mechanism for selectable convolution kernel size. Since the convolutional neural networks are based on floating-point operations, if the floating-point is converted to fixed-point when implemented on FPGA, there will not only be a loss of precision, but also introduce a tedious conversion work. Thus, our neural network uses 32-bit Floating point operations. Moreover, the task of handwritten digit recognition is performed on the MNIST data set, to experimentally evaluate our solution. Experiment results show that the neural network acceleration architecture proposed in this paper achieves better performance. Compare with the literature [4],[6], the calculation speed is significantly improved, and the calculation speed is increased by 101.6 times compared with the literature [4] Compared with the literature [6], there is a speed increase of 11.88 times.","PeriodicalId":119798,"journal":{"name":"2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA)","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-01-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPECA51329.2021.9362549","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 6

Abstract

This article proposes an efficient, low-latency, scalable, and low-error neural network acceleration architecture. To meet the performance requirements of high efficiency and low latency, multi-channel parallel computing between layers and pipeline design are adopted to accelerate the neural network. The acceleration strategy is then implemented on a Xilinx Zynq-7000 FPGA, which processes a 28×28 handwritten image in 25.95 µs at a clock frequency of 200 MHz. Further, the flexibility and scalability of the network are improved by adding a line buffer to support variable image widths and by designing a mechanism for selectable convolution kernel sizes. Since convolutional neural networks are based on floating-point operations, converting floating point to fixed point for the FPGA implementation would not only lose precision but also introduce tedious conversion work; our network therefore uses 32-bit floating-point operations. To evaluate the solution experimentally, handwritten digit recognition is performed on the MNIST data set. The results show that the proposed neural network acceleration architecture achieves better performance: the calculation speed is significantly improved, increasing by 101.6 times compared with the literature [4] and by 11.88 times compared with the literature [6].
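The line buffer and selectable kernel size described above are sliding-window techniques; below is a minimal C++ sketch of the idea, not the paper's actual RTL, with all names and the software framing being illustrative assumptions. A (K-1)-row line buffer plus a K×K window register lets each incoming pixel complete one convolution window, so the image is streamed exactly once regardless of kernel size.

```cpp
#include <cstdio>
#include <vector>

// Sliding-window 2D convolution driven by a line buffer.
// Hypothetical software model: W (image width) and K (kernel size)
// are runtime parameters, mirroring the paper's variable image
// width and selectable convolution kernel size.
std::vector<float> conv2d_linebuffer(const std::vector<float>& img,
                                     int W, int H,
                                     const std::vector<float>& kernel,
                                     int K) {
    const int outW = W - K + 1, outH = H - K + 1;
    std::vector<float> out(static_cast<size_t>(outW) * outH, 0.0f);
    std::vector<float> lines(static_cast<size_t>(K - 1) * W, 0.0f); // last K-1 rows
    std::vector<float> win(static_cast<size_t>(K) * K, 0.0f);       // KxK window register

    for (int y = 0; y < H; ++y) {
        for (int x = 0; x < W; ++x) {
            const float px = img[y * W + x];
            // Shift the window one column to the left.
            for (int r = 0; r < K; ++r)
                for (int c = 0; c + 1 < K; ++c)
                    win[r * K + c] = win[r * K + c + 1];
            // New rightmost column: K-1 buffered rows plus the new pixel.
            for (int r = 0; r + 1 < K; ++r)
                win[r * K + (K - 1)] = lines[r * W + x];
            win[(K - 1) * K + (K - 1)] = px;
            // Rotate the line buffer at this column and store the new pixel.
            for (int r = 0; r + 2 < K; ++r)
                lines[r * W + x] = lines[(r + 1) * W + x];
            if (K >= 2) lines[(K - 2) * W + x] = px;
            // Once the window is full, emit one multiply-accumulate result.
            if (y >= K - 1 && x >= K - 1) {
                float acc = 0.0f;
                for (int i = 0; i < K * K; ++i) acc += win[i] * kernel[i];
                out[(y - K + 1) * outW + (x - K + 1)] = acc;
            }
        }
    }
    return out;
}

int main() {
    const int W = 5, H = 5, K = 3;            // 5x5 ramp image, 3x3 mean filter
    std::vector<float> img(W * H);
    for (int i = 0; i < W * H; ++i) img[i] = static_cast<float>(i);
    const std::vector<float> ker(K * K, 1.0f / (K * K));
    for (float v : conv2d_linebuffer(img, W, H, ker, K))
        std::printf("%.2f ", v);              // prints the mean of each 3x3 window
    std::printf("\n");
    return 0;
}
```

In hardware, the line buffer would map to BRAM rows and the window to a shift-register array; keeping W and K as runtime parameters mirrors the paper's variable-width and selectable-kernel mechanisms.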
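The argument for staying in 32-bit floating point can be seen in a small round-trip experiment: quantizing a weight to a fixed-point format and back exposes the precision loss the authors avoid. The Q1.15 format below is a hypothetical illustration, not a format taken from the paper.

```cpp
#include <cstdint>
#include <cstdio>
#include <cmath>

// Quantize a value in [-1, 1) to Q1.15 fixed point and back.
// Q1.15 is a hypothetical example format: 1 sign bit, 15 fraction bits.
int16_t to_q15(float x)     { return static_cast<int16_t>(std::lround(x * 32768.0f)); }
float   from_q15(int16_t q) { return static_cast<float>(q) / 32768.0f; }

int main() {
    const float w  = 0.123456789f;           // an example trained weight
    const float rt = from_q15(to_q15(w));    // round trip through fixed point
    std::printf("float: %.9f  fixed->float: %.9f  error: %.3e\n",
                w, rt, std::fabs(w - rt));
    return 0;
}
```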