A Grain-Adaptive Computing Structure for FPGA CNN Acceleration
Xinyuan Qu, Zhihong Huang, Ning Mao, Yu Xu, Gang Cai, Zhen Fang
2019 IEEE 13th International Conference on ASIC (ASICON), October 2019
DOI: 10.1109/ASICON47005.2019.8983480
Citations: 4
Abstract
In recent years, owing to their superior performance and accuracy, convolutional neural networks (CNNs) have been widely adopted in applications such as image classification and speech recognition. However, implementing CNNs on hardware platforms is becoming more difficult because network scale is growing rapidly. FPGAs attract more attention than other processors for their excellent balance of flexibility and efficiency, and many FPGA-based CNN accelerators have been proposed in previous work. However, in that work the computing resources (especially DSPs) are not fully utilized, either explicitly or covertly, which seriously limits the accelerator's overall performance. In this work, we propose a new formula that provides a more accurate and comprehensive evaluation of computing resource utilization and can guide CNN accelerator design optimization. We then propose a grain-adaptive computing structure for FPGA-based CNN acceleration, which adapts flexibly to make optimal use of the available DSP resources. Owing to the improved DSP utilization, we achieve better results in both overall throughput and power efficiency. The architecture is implemented on a Xilinx XCKU115 running AlexNet at 150 MHz with a peak power consumption of 30.05 W. The overall performance is 1292.40 GOP/s and 43.01 GOP/s/W, a 2.28X throughput and 1.94X power-efficiency improvement over previous work [6], and a 9.44X and 3.02X improvement over [9], respectively.
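The reported figures are internally consistent: dividing the overall throughput by the peak power yields the stated power efficiency. A minimal sketch checking this arithmetic (the variable names are illustrative, not from the paper):

```python
# Consistency check on the abstract's reported numbers:
# power efficiency (GOP/s/W) = throughput (GOP/s) / peak power (W).
throughput_gops = 1292.40   # reported overall throughput, GOP/s
peak_power_w = 30.05        # reported peak power consumption, W

efficiency = throughput_gops / peak_power_w
print(f"{efficiency:.2f} GOP/s/W")  # matches the reported 43.01 GOP/s/W
```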