A deep learning convolution architecture for simple embedded applications
Chan Kim, Yong Cheol Peter Cho, Youngsu Kwon
2017 IEEE 7th International Conference on Consumer Electronics - Berlin (ICCE-Berlin), September 2017
DOI: 10.1109/ICCE-Berlin.2017.8210595
Citations: 0
Abstract
A simple AXI-based convolution architecture for deep learning is presented. Input feature maps and kernel weights are stored in P K×K memory blocks. Convolution proceeds from output feature map 0 to M-1, and within each output feature map the outputs are generated in raster-scan order. Data from P input feature maps are summed in parallel during convolution. By manipulating the read addresses and the alignment of the read data, the architecture can supply P K×K windows of input feature map data, P K×K weights, and the bias for the input and output feature maps currently being processed. Dual buffers allow convolution for the current output feature map to proceed while the DMA write of the previous final output feature map is still in progress. Correct operation was verified by comparing RTL simulation results with the results of a C program. The method provides a speed-up of over 2,000 times compared with a pure software implementation, and with flow control between DMA and convolution, much less memory is required. This architecture can be used to accelerate convolution in moderately sized deep learning applications on embedded systems.
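The processing order described above can be illustrated with a small C reference model, in the spirit of the C program used for comparison against RTL simulation. This is a sketch under stated assumptions, not the authors' code. The sizes N (number of input feature maps), M, P, K, H and W, the stride-1 and no-padding geometry, integer data, adding the bias on the first pass, and the function names conv_reference, conv_p_parallel and schedules_agree are all hypothetical. The sketch also assumes that each output feature map is accumulated over N/P passes, one per group of P input maps. That is one plausible reading of "final output feature map", and it is not confirmed by the abstract.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical sizes chosen for illustration only; they are not from the paper. */
#define N 4            /* number of input feature maps */
#define M 3            /* number of output feature maps */
#define P 2            /* input feature maps summed in parallel */
#define K 3            /* kernel is K x K */
#define H 6            /* input height */
#define W 5            /* input width */
#define OH (H - K + 1) /* output height, assuming stride 1 and no padding */
#define OW (W - K + 1) /* output width, same assumption */

static_assert(N % P == 0, "this sketch assumes P divides N");

/* Plain software convolution used as the reference result. */
void conv_reference(int in[N][H][W], int wt[M][N][K][K],
                    int bias[M], int out[M][OH][OW]) {
    for (int m = 0; m < M; m++)
        for (int y = 0; y < OH; y++)
            for (int x = 0; x < OW; x++) {
                int acc = bias[m];
                for (int n = 0; n < N; n++)
                    for (int ky = 0; ky < K; ky++)
                        for (int kx = 0; kx < K; kx++)
                            acc += in[n][y + ky][x + kx] * wt[m][n][ky][kx];
                out[m][y][x] = acc;
            }
}

/* Schedule modeled on the abstract: output maps 0..M-1 in order, one pass per
   group of P input maps, raster-scan order inside each output map. Partial sums
   stay in the output buffer until the last group makes the map final. */
void conv_p_parallel(int in[N][H][W], int wt[M][N][K][K],
                     int bias[M], int out[M][OH][OW]) {
    for (int m = 0; m < M; m++) {
        for (int n0 = 0; n0 < N; n0 += P) {
            for (int y = 0; y < OH; y++)          /* raster scan: rows ... */
                for (int x = 0; x < OW; x++) {    /* ... then columns */
                    /* Bias is added once, on the first pass (an assumption). */
                    int acc = (n0 == 0) ? bias[m] : out[m][y][x];
                    /* The hardware presumably forms these P*K*K products in
                       parallel; here they are simply a sequential loop. */
                    for (int p = 0; p < P; p++)
                        for (int ky = 0; ky < K; ky++)
                            for (int kx = 0; kx < K; kx++)
                                acc += in[n0 + p][y + ky][x + kx] *
                                       wt[m][n0 + p][ky][kx];
                    out[m][y][x] = acc;
                }
        }
        /* out[m] is now final; in the described design its DMA write could
           overlap with computing map m+1 in the other buffer. */
    }
}

/* Fills deterministic test data and returns 1 if both schedules agree. */
int schedules_agree(void) {
    static int in[N][H][W], wt[M][N][K][K], bias[M];
    static int ref[M][OH][OW], hw[M][OH][OW];
    for (int n = 0; n < N; n++)
        for (int y = 0; y < H; y++)
            for (int x = 0; x < W; x++)
                in[n][y][x] = (n * 7 + y * 3 + x) % 11 - 5;
    for (int m = 0; m < M; m++) {
        bias[m] = m - 1;
        for (int n = 0; n < N; n++)
            for (int ky = 0; ky < K; ky++)
                for (int kx = 0; kx < K; kx++)
                    wt[m][n][ky][kx] = (m + 2 * n + ky * kx) % 5 - 2;
    }
    conv_reference(in, wt, bias, ref);
    conv_p_parallel(in, wt, bias, hw);
    return memcmp(ref, hw, sizeof ref) == 0;
}
```

Calling schedules_agree() returns 1 when the P-parallel, raster-order schedule reproduces the plain convolution. Keeping partial sums in the output buffer between passes means that a map only becomes final after its last group of input maps. In this reading, a finished map can then be handed to DMA while the next map is computed in the second buffer.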