Hardware Implementation of Reconfigurable Separable Convolution
L. Rao, Bin Zhang, Jizhong Zhao
2018 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), July 2018
DOI: 10.1109/ISVLSI.2018.00051
Abstract
Convolution operations consume a large share of the computational resources in convolutional neural networks (CNNs). Separable convolution can greatly reduce computational complexity; unfortunately, most trained kernels in CNNs are not separable. In this paper, a least-squares approach is applied to decompose a non-separable 2D kernel into two 1D kernels. A reconfigurable convolution architecture is proposed that converts each 2D convolution in the convolutional layers into two 1D convolutions. A denoising CNN is then mapped onto the proposed architecture. Experimental results show that the hardware architecture can restore a 1280×720 image in 0.83 s, an 8.4× speed-up over a GPU implementation. Verification experiments demonstrate that our approach and hardware architecture can drastically reduce the computational complexity of convolution operations without sacrificing performance.
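One common way to state a least-squares kernel decomposition, assuming each K×K kernel W is replaced by a single pair of 1D kernels (a rank-1 fit in the Frobenius norm), is sketched below. The abstract does not give the paper's exact objective, so this is one plausible reading rather than the authors' formulation. By the Eckart–Young theorem the optimum comes from the leading singular triple of W (unique up to rescaling v and h by c and 1/c), and the two 1D passes together are exactly a 2D convolution with the rank-1 kernel v h^T:

```latex
\begin{aligned}
W &= \sum_{i=1}^{K} \sigma_i\,\mathbf{u}_i\mathbf{w}_i^{\top},
  \qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_K \ge 0,\\
(\mathbf{v}^{\star},\mathbf{h}^{\star})
  &= \arg\min_{\mathbf{v},\,\mathbf{h}\in\mathbb{R}^{K}}
     \bigl\lVert W - \mathbf{v}\mathbf{h}^{\top} \bigr\rVert_F^2
   \;\Longrightarrow\;
   \mathbf{v}^{\star} = \sqrt{\sigma_1}\,\mathbf{u}_1,\quad
   \mathbf{h}^{\star} = \sqrt{\sigma_1}\,\mathbf{w}_1,\\
\bigl\lVert W - \mathbf{v}^{\star}\mathbf{h}^{\star\top} \bigr\rVert_F^2
  &= \sum_{i=2}^{K} \sigma_i^2,\\
W * X &\approx \mathbf{v}^{\star} * \bigl(\mathbf{h}^{\star} * X\bigr)
  \qquad (K^2 \text{ vs. } 2K \text{ multiplications per output pixel}).
\end{aligned}
```

For a 3×3 kernel this is 9 versus 6 multiplications per output pixel; for a 5×5 kernel, 25 versus 10. The approximation error is zero exactly when W is separable (σ2 = … = σK = 0).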
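The decomposition and the 2D-to-1D conversion described in the abstract can be sketched numerically. This is a minimal illustration under stated assumptions: NumPy, single-channel inputs, "valid" cross-correlation as in typical CNN layers, and a rank-1 least-squares fit computed with an SVD. The helper names `rank1_decompose`, `correlate2d_valid` and `separable_correlate` are hypothetical and do not come from the paper.

```python
import numpy as np


def rank1_decompose(kernel):
    """Least-squares rank-1 fit of a 2D kernel (hypothetical helper).

    Returns a vertical 1D kernel v and a horizontal 1D kernel h such that
    np.outer(v, h) minimizes the Frobenius-norm error to `kernel`
    (Eckart-Young: keep only the largest singular value).
    """
    u, s, vt = np.linalg.svd(kernel)
    scale = np.sqrt(s[0])           # split sigma_1 evenly between factors
    v = u[:, 0] * scale             # column (vertical) kernel
    h = vt[0, :] * scale            # row (horizontal) kernel
    return v, h


def correlate2d_valid(image, kernel):
    """Direct 'valid' 2D cross-correlation, as used in CNN layers.

    Costs kernel.size multiplications per output pixel.
    """
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * image[i:i + oh, j:j + ow]
    return out


def separable_correlate(image, v, h):
    """Two 1D passes: rows with h, then columns with v.

    Costs about len(v) + len(h) multiplications per output pixel.
    """
    tmp = correlate2d_valid(image, h[np.newaxis, :])   # 1 x K_w pass
    return correlate2d_valid(tmp, v[:, np.newaxis])    # K_h x 1 pass


rng = np.random.default_rng(0)
image = rng.standard_normal((32, 32))

# The Sobel-x kernel is separable, so it is recovered exactly.
sobel = np.outer([1.0, 2.0, 1.0], [1.0, 0.0, -1.0])
v, h = rank1_decompose(sobel)
assert np.allclose(np.outer(v, h), sobel)
assert np.allclose(separable_correlate(image, v, h),
                   correlate2d_valid(image, sobel))

# A random (generally non-separable) kernel is only approximated.
k = rng.standard_normal((3, 3))
v, h = rank1_decompose(k)
rel_err = np.linalg.norm(k - np.outer(v, h)) / np.linalg.norm(k)
print(f"relative kernel error: {rel_err:.3f}")
```

For a separable kernel the two 1D passes reproduce the 2D result exactly; for a non-separable kernel the only error comes from the kernel approximation, because the two-pass output is exactly the 2D correlation with `np.outer(v, h)`. The SVD is used here simply because it yields the least-squares optimal rank-1 factors in closed form; an iterative least-squares solver would converge to the same product.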