Design and Implementation of Convolutional Neural Network Accelerator with Variable Layer-by-layer Debugging

Songpu Huang, Jingfei Jiang, Y. Dou, Liang Bai, Hao Wang, Buyue Qin
{"title":"Design and Implementation of Convolutional Neural Network Accelerator with Variable Layer-by-layer Debugging","authors":"Songpu Huang, Jingfei Jiang, Y. Dou, Liang Bai, Hao Wang, Buyue Qin","doi":"10.1145/3234804.3234806","DOIUrl":null,"url":null,"abstract":"Deep learning algorithms have complex network structures and numerous parameters, and are typical computation-intensive and data-intensive applications. Due to the large amount of data and the difference in accuracy, it is very difficult to realize the function and to adjust the accuracy of FPGA-based deep learning accelerator, which seriously affects the applicability of the accelerator. To this problem, this paper presents a convolutional neural network accelerator with variable layer-by-layer debugging. The accelerator framework consists of host computer, PCIE interface, DDR module, transmission control module, CNN module and variable layer-by-layer debugging module. The variable layer-by-layer debugging module consists of DRAM, FIFO, read DRAM counting module, write DRAM counting module and data alignment module. The debugging module can be assembled in any layer of the convolutional neural network, and effectively achieve layer-by-layer debugging in cooperation with host computer. Supported by this design framework, the paper implements VGG-S on a FPGA board based on the Xilinx XCKU115 chip, achieving an acceleration ratio of 24.78 compared to the CPU platform and a 14.6x performance-to-power ratio. The comprehensive results show that the hardware resource overhead with variable layer-by-layer debugging module is very small and does not affect the frequency of the convolutional network implementation. 
The experimental process verified the high efficiency of the implementation and debugging of the structure, which can be used as a debugging method for various pipelined convolutional networks in the future.","PeriodicalId":118446,"journal":{"name":"International Conference on Deep Learning Technologies","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Deep Learning Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3234804.3234806","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Deep learning algorithms have complex network structures and numerous parameters, making them typical computation- and data-intensive applications. Because of the large data volumes and differences in numerical accuracy, it is very difficult to verify the functionality and tune the accuracy of an FPGA-based deep learning accelerator, which seriously limits the accelerator's applicability. To address this problem, this paper presents a convolutional neural network accelerator with variable layer-by-layer debugging. The accelerator framework consists of a host computer, a PCIe interface, a DDR module, a transmission control module, a CNN module, and a variable layer-by-layer debugging module. The debugging module itself consists of DRAM, a FIFO, a read-DRAM counting module, a write-DRAM counting module, and a data alignment module. It can be attached to any layer of the convolutional neural network and, in cooperation with the host computer, effectively achieves layer-by-layer debugging. Supported by this design framework, the paper implements VGG-S on an FPGA board based on the Xilinx XCKU115 chip, achieving a 24.78× speedup over the CPU platform and a 14.6× improvement in the performance-to-power ratio. Synthesis results show that the hardware resource overhead of the variable layer-by-layer debugging module is very small and does not affect the operating frequency of the convolutional network implementation. The experiments verified the efficiency of the structure's implementation and debugging, which can serve as a debugging method for various pipelined convolutional networks in the future.
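The debugging flow described in the abstract — buffering one layer's output in the debugging module's DRAM and comparing it on the host against a software reference — can be sketched as a minimal host-side script. This is an illustrative assumption, not the paper's actual host software: `read_layer_from_accelerator` is a hypothetical stand-in for the PCIe/DRAM readback, here simulated by quantizing the reference result to mimic fixed-point hardware arithmetic.

```python
import numpy as np

def reference_conv_layer(x, w):
    """Minimal software reference: single-channel 'valid' 2-D convolution."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def read_layer_from_accelerator(x, w):
    """Hypothetical stand-in for the PCIe/DRAM readback of one layer's
    output; simulates a fixed-point accelerator by quantizing the
    reference result to an assumed 8 fractional bits."""
    exact = reference_conv_layer(x, w)
    scale = 256.0  # assumed fixed-point scale, not from the paper
    return np.round(exact * scale) / scale

def debug_layer(x, w, tol=1e-2):
    """One layer-by-layer debug step: read back the accelerator's output
    for this layer, compare against the reference, and report the
    worst-case absolute mismatch."""
    hw = read_layer_from_accelerator(x, w)
    ref = reference_conv_layer(x, w)
    err = float(np.max(np.abs(hw - ref)))
    return err <= tol, err

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8)).astype(np.float32)
w = rng.standard_normal((3, 3)).astype(np.float32)
ok, err = debug_layer(x, w)
print(ok, err)
```

In a real setup the comparison tolerance would be chosen per layer from the accelerator's fixed-point format, and the debug step would be repeated layer by layer, which is the workflow the paper's variable debugging module is designed to support.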