Design and Implementation of Convolutional Neural Network Accelerator with Variable Layer-by-layer Debugging
Songpu Huang, Jingfei Jiang, Y. Dou, Liang Bai, Hao Wang, Buyue Qin
International Conference on Deep Learning Technologies, 2018-06-27
DOI: 10.1145/3234804.3234806
Citations: 2
Abstract
Deep learning algorithms have complex network structures and numerous parameters, making them typical computation- and data-intensive applications. Due to the large volume of data and differences in numerical precision, it is very difficult to verify the functionality and tune the accuracy of an FPGA-based deep learning accelerator, which seriously limits the accelerator's applicability. To address this problem, this paper presents a convolutional neural network accelerator with variable layer-by-layer debugging. The accelerator framework consists of a host computer, a PCIe interface, a DDR module, a transmission control module, a CNN module, and a variable layer-by-layer debugging module. The variable layer-by-layer debugging module consists of DRAM, a FIFO, a read-DRAM counting module, a write-DRAM counting module, and a data alignment module. The debugging module can be attached to any layer of the convolutional neural network and, in cooperation with the host computer, effectively achieves layer-by-layer debugging. Supported by this design framework, the paper implements VGG-S on an FPGA board based on the Xilinx XCKU115 chip, achieving a 24.78x speedup over the CPU platform and a 14.6x improvement in performance-to-power ratio. Synthesis results show that the hardware resource overhead of the variable layer-by-layer debugging module is very small and does not affect the operating frequency of the convolutional network implementation. The experiments verified that the structure is efficient to implement and debug, and it can serve as a debugging method for various pipelined convolutional networks in the future.
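The host-side workflow the abstract describes — reading each layer's output back from the debugging module's DRAM and comparing it against a software golden model to localize the first faulty layer — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `read_accel_layer` callback stands in for a hypothetical PCIe readback of the debug buffer, and the naive convolution is only a host reference model.

```python
import numpy as np

def conv2d_reference(x, w, stride=1):
    """Naive host-side golden-model convolution.
    x: (c_in, h, w) input; w: (c_out, c_in, k, k) weights."""
    c_out, c_in, k, _ = w.shape
    _, h, wd = x.shape
    oh = (h - k) // stride + 1
    ow = (wd - k) // stride + 1
    out = np.zeros((c_out, oh, ow), dtype=x.dtype)
    for co in range(c_out):
        for i in range(oh):
            for j in range(ow):
                patch = x[:, i*stride:i*stride+k, j*stride:j*stride+k]
                out[co, i, j] = np.sum(patch * w[co])
    return out

def debug_layer_by_layer(x, weights, read_accel_layer, atol=1e-3):
    """Compare each accelerator layer output with the host reference.
    read_accel_layer(idx) models reading the debug module's DRAM buffer
    for layer idx over PCIe (hypothetical interface).
    Returns the index of the first mismatching layer, or -1 if all match."""
    ref = x
    for idx, w in enumerate(weights):
        ref = conv2d_reference(ref, w)   # golden result for this layer
        hw = read_accel_layer(idx)       # accelerator result for this layer
        if not np.allclose(hw, ref, atol=atol):
            return idx                   # first faulty layer localized
    return -1
```

In this scheme, the tolerance `atol` absorbs the fixed-point versus floating-point precision gap the abstract alludes to, and a mismatch pinpoints exactly which layer's hardware path needs inspection rather than only observing a wrong final classification.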