{"title":"Efficient FPGA design for Convolutions in CNN based on FFT-pruning","authors":"Liulu He, Xiaoru Xie, Jun Lin, Zhongfeng Wang","doi":"10.1109/APCCAS50809.2020.9301653","DOIUrl":null,"url":null,"abstract":"Fast algorithms of convolution, such as Winograd and fast Fourier transformation (FFT), have been widely used in many FPGA-based CNN accelerators to reducing the complexity of multiplication. The core idea for those fast algorithms is reducing the number of multiplication at the cost of more additions. However, increased additions take up a significant portion in the whole LUT resources in many cases, which forms a new bottleneck in the corresponding hardware design. In this paper, we theoretically analyze the relationship between the reduced multiplications and the increased additions, and propose an reduced complexity fast FFT convolution algorithm by intelligently employing the FFT-pruning method to remove redundant additions. Compared with the state-of-the-art algorithm, our algorithm can reduce more than 50% of additions. Moreover, the proposed algorithm has better numerical accuracy and comparable multiplication complexity compared to the most efficient Winograd algorithm. Additionally, an efficient reconfigurable architecture of the proposed algorithm is also developed to accelerate convolutional layers with various kernel sizes. Implemented with Xilinx ZC706, the proposed architecture achieves 200.6 GOPS on convolutional layers of ResNet-50 with 61% higher resources efficiency with respect to LUT consumption compared to prior arts.","PeriodicalId":127075,"journal":{"name":"2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APCCAS50809.2020.9301653","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0
Abstract
Fast convolution algorithms, such as Winograd and the fast Fourier transform (FFT), have been widely used in many FPGA-based CNN accelerators to reduce the complexity of multiplication. The core idea of these fast algorithms is to reduce the number of multiplications at the cost of more additions. In many cases, however, the added additions occupy a significant portion of the total LUT resources, which becomes a new bottleneck in the corresponding hardware design. In this paper, we theoretically analyze the relationship between the reduced multiplications and the increased additions, and propose a reduced-complexity fast FFT convolution algorithm that intelligently employs the FFT-pruning method to remove redundant additions. Compared with the state-of-the-art algorithm, our algorithm reduces additions by more than 50%. Moreover, the proposed algorithm offers better numerical accuracy and comparable multiplication complexity relative to the most efficient Winograd algorithm. Additionally, an efficient reconfigurable architecture for the proposed algorithm is developed to accelerate convolutional layers with various kernel sizes. Implemented on a Xilinx ZC706, the proposed architecture achieves 200.6 GOPS on the convolutional layers of ResNet-50, with 61% higher resource efficiency in terms of LUT consumption compared to prior art.
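The trade-off the abstract refers to (fewer multiplications in exchange for extra additions and transform overhead) can be seen in plain FFT-based convolution: pointwise products in the frequency domain replace the inner products of direct convolution, while the forward and inverse transforms contribute the additional additions. The sketch below is a minimal NumPy illustration of this generic idea only; it is not the authors' FFT-pruning algorithm or their fixed-point hardware datapath, and all function names here are our own.

```python
import numpy as np

def direct_conv2d(x, k):
    """Direct 'valid' CNN-style convolution (cross-correlation):
    roughly H*W*Kh*Kw multiply-accumulates."""
    H, W = x.shape
    Kh, Kw = k.shape
    out = np.zeros((H - Kh + 1, W - Kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + Kh, j:j + Kw] * k)
    return out

def fft_conv2d(x, k):
    """FFT-based convolution: transform, one pointwise product per
    frequency bin, inverse transform. Multiplications shrink, but the
    FFT butterflies add extra additions, which is the cost the paper
    targets with pruning."""
    H, W = x.shape
    Kh, Kw = k.shape
    fh, fw = H + Kh - 1, W + Kw - 1          # linear-convolution size
    X = np.fft.rfft2(x, s=(fh, fw))
    K = np.fft.rfft2(k[::-1, ::-1], s=(fh, fw))  # flip kernel to match cross-correlation
    full = np.fft.irfft2(X * K, s=(fh, fw))      # pointwise products only
    return full[Kh - 1:H, Kw - 1:W]              # crop to the 'valid' region

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((3, 3))
assert np.allclose(direct_conv2d(x, k), fft_conv2d(x, k))
```

In a floating-point software setting like this, both routines produce the same result; the paper's contribution concerns how the transform-side additions map to LUTs on an FPGA and how pruning the FFT removes the redundant ones.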