Design and Implementation of Convolutional Neural Network Accelerator with Variable Layer-by-layer Debugging

Songpu Huang, Jingfei Jiang, Y. Dou, Liang Bai, Hao Wang, Buyue Qin
{"title":"Design and Implementation of Convolutional Neural Network Accelerator with Variable Layer-by-layer Debugging","authors":"Songpu Huang, Jingfei Jiang, Y. Dou, Liang Bai, Hao Wang, Buyue Qin","doi":"10.1145/3234804.3234806","DOIUrl":null,"url":null,"abstract":"Deep learning algorithms have complex network structures and numerous parameters, and are typical computation-intensive and data-intensive applications. Due to the large amount of data and the difference in accuracy, it is very difficult to realize the function and to adjust the accuracy of FPGA-based deep learning accelerator, which seriously affects the applicability of the accelerator. To this problem, this paper presents a convolutional neural network accelerator with variable layer-by-layer debugging. The accelerator framework consists of host computer, PCIE interface, DDR module, transmission control module, CNN module and variable layer-by-layer debugging module. The variable layer-by-layer debugging module consists of DRAM, FIFO, read DRAM counting module, write DRAM counting module and data alignment module. The debugging module can be assembled in any layer of the convolutional neural network, and effectively achieve layer-by-layer debugging in cooperation with host computer. Supported by this design framework, the paper implements VGG-S on a FPGA board based on the Xilinx XCKU115 chip, achieving an acceleration ratio of 24.78 compared to the CPU platform and a 14.6x performance-to-power ratio. The comprehensive results show that the hardware resource overhead with variable layer-by-layer debugging module is very small and does not affect the frequency of the convolutional network implementation. 
The experimental process verified the high efficiency of the implementation and debugging of the structure, which can be used as a debugging method for various pipelined convolutional networks in the future.","PeriodicalId":118446,"journal":{"name":"International Conference on Deep Learning Technologies","volume":"24 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Deep Learning Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3234804.3234806","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Deep learning algorithms have complex network structures and numerous parameters, making them typical computation- and data-intensive applications. Because of the large data volumes and differences in numerical accuracy, it is very difficult to verify the functionality and tune the accuracy of an FPGA-based deep learning accelerator, which seriously limits the accelerator's applicability. To address this problem, this paper presents a convolutional neural network accelerator with variable layer-by-layer debugging. The accelerator framework consists of a host computer, a PCIe interface, a DDR module, a transmission control module, a CNN module, and a variable layer-by-layer debugging module. The debugging module itself consists of DRAM, a FIFO, a read-DRAM counting module, a write-DRAM counting module, and a data alignment module. It can be attached to any layer of the convolutional neural network and, in cooperation with the host computer, effectively achieves layer-by-layer debugging. Supported by this design framework, the paper implements VGG-S on an FPGA board based on the Xilinx XCKU115 chip, achieving a 24.78× speedup over the CPU platform and a 14.6× improvement in the performance-to-power ratio. Synthesis results show that the hardware resource overhead of the variable layer-by-layer debugging module is very small and does not affect the operating frequency of the convolutional network implementation. The experiments verified the efficiency of the structure's implementation and debugging, which can serve as a debugging method for various pipelined convolutional networks in the future.
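The debugging flow described in the abstract — buffering one layer's output in the debugging module's DRAM and comparing it on the host against a software reference — can be sketched as a minimal host-side script. This is an illustrative assumption, not the paper's actual host software: `read_layer_from_accelerator` is a hypothetical stand-in for the PCIe/DRAM readback, here simulated by quantizing the reference result to mimic fixed-point hardware arithmetic.

```python
import numpy as np

def reference_conv_layer(x, w):
    """Minimal software reference: single-channel 'valid' 2-D convolution."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow), dtype=np.float32)
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def read_layer_from_accelerator(x, w):
    """Hypothetical stand-in for the PCIe/DRAM readback of one layer's
    output; simulates a fixed-point accelerator by quantizing the
    reference result to an assumed 8 fractional bits."""
    exact = reference_conv_layer(x, w)
    scale = 256.0  # assumed fixed-point scale, not from the paper
    return np.round(exact * scale) / scale

def debug_layer(x, w, tol=1e-2):
    """One layer-by-layer debug step: read back the accelerator's output
    for this layer, compare against the reference, and report the
    worst-case absolute mismatch."""
    hw = read_layer_from_accelerator(x, w)
    ref = reference_conv_layer(x, w)
    err = float(np.max(np.abs(hw - ref)))
    return err <= tol, err

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8)).astype(np.float32)
w = rng.standard_normal((3, 3)).astype(np.float32)
ok, err = debug_layer(x, w)
print(ok, err)
```

In a real setup the comparison tolerance would be chosen per layer from the accelerator's fixed-point format, and the debug step would be repeated layer by layer, which is the workflow the paper's variable debugging module is designed to support.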