FPGA-based scalable and highly concurrent convolutional neural network acceleration
Hao Xiao, Kunhua Li, Mingcheng Zhu
2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA), 2021-01-22
https://doi.org/10.1109/ICPECA51329.2021.9362549
This paper proposes an efficient, low-latency, scalable, and low-error acceleration architecture for neural networks. To meet the requirements of high efficiency and low latency, the design combines multi-channel parallel computation between layers with pipelining. The acceleration strategy is implemented on a Xilinx Zynq-7000 FPGA, where a 28×28 handwritten-digit image is processed in 25.95 µs at a clock frequency of 200 MHz. Flexibility and scalability are improved by adding a line buffer that supports variable image widths and a mechanism for selecting the convolution kernel size. Because convolutional neural networks are based on floating-point operations, converting them to fixed point for an FPGA implementation both loses precision and requires tedious conversion work; the network therefore uses 32-bit floating-point operations. The solution is evaluated experimentally on handwritten digit recognition with the MNIST data set. The results show that the proposed architecture is significantly faster than the designs in [4] and [6]: computation is 101.6 times faster than in [4] and 11.88 times faster than in [6].
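The line buffer and selectable kernel size mentioned in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design; it is a minimal Python analogue under stated assumptions: pixels arrive one per step in row-major order, the kernel is square, no padding is applied, and the operation is the cross-correlation commonly called convolution in CNNs. The names `line_buffer_conv`, `width`, and `kernel` are hypothetical, and Python's 64-bit floats stand in for the 32-bit floating-point arithmetic described in the abstract.

```python
from collections import deque


def line_buffer_conv(stream, width, kernel):
    """Slide a k x k kernel over a row-major pixel stream.

    Only the most recent (k - 1) * width + k pixels are stored, which
    mirrors how a hardware line buffer avoids holding the whole image.
    Both the image width and the kernel size are runtime parameters.
    """
    k = len(kernel)
    depth = (k - 1) * width + k      # pixels needed to cover one window
    buf = deque(maxlen=depth)        # oldest pixel falls out automatically
    out = []
    for idx, px in enumerate(stream):
        buf.append(px)
        row, col = divmod(idx, width)
        # A full window exists once k rows and k columns have been seen.
        if row >= k - 1 and col >= k - 1:
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    # Distance of window element (i, j) behind the newest pixel.
                    back = (k - 1 - i) * width + (k - 1 - j)
                    acc += kernel[i][j] * buf[depth - 1 - back]
            out.append(acc)
    return out


# 4x4 image with a 3x3 kernel, then a 3x3 image with a 2x2 kernel.
sums = line_buffer_conv(range(16), 4, [[1, 1, 1]] * 3)
diffs = line_buffer_conv(range(9), 3, [[1, 0], [0, -1]])
```

Changing `width` or passing a kernel of a different size requires no other modification, which is the kind of flexibility the abstract attributes to the line buffer and the kernel-size mechanism. In hardware the `k * k` multiply-accumulate loop would typically be unrolled into parallel units rather than executed sequentially as it is here.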