{"title":"An Efficient FPGA Accelerator Optimized for High Throughput Sparse CNN Inference","authors":"Jiayu Wen, Yufei Ma, Zhongfeng Wang","doi":"10.1109/APCCAS50809.2020.9301696","DOIUrl":null,"url":null,"abstract":"Pruning techniques can compress the CNN models by making the insignificant weights to be zeros to release the tremendous workload in large-scale CNNs. However, for hardware architecture, to efficiently load and operate on the nonzero data with high parallelism is a great challenge due to the random location of pruned weights. To address this issue, a sparsity aware CNN accelerator is proposed in this work to process the irregularly pruned CNN models. A candidate pool architecture is designed to only pick the randomly needed activations chosen by nonzero weights. It is set as a three-dimensional structure to relieve the problem of workload imbalance caused by random nonzero weight locations and high parallelism. Besides, a dedicated indexing method is designed to cooperate with the candidate pool architecture to accomplish the whole sparse dataflow. The proposed sparsity aware CNN accelerator is demonstrated on Intel Arria 10 FPGA for multiple popular CNN models that achieves up to 89.7% throughput improvement compared to the baseline design.","PeriodicalId":127075,"journal":{"name":"2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","volume":"22 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/APCCAS50809.2020.9301696","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6
Abstract
Pruning techniques compress CNN models by setting insignificant weights to zero, greatly reducing the computational workload of large-scale CNNs. However, the random locations of the pruned weights make it challenging for hardware architectures to load and operate on the remaining nonzero data efficiently and with high parallelism. To address this issue, this work proposes a sparsity-aware CNN accelerator that processes irregularly pruned CNN models. A candidate pool architecture is designed to fetch only the activations selected by nonzero weights. It is organized as a three-dimensional structure to relieve the workload imbalance caused by random nonzero-weight locations under high parallelism. In addition, a dedicated indexing method cooperates with the candidate pool architecture to complete the entire sparse dataflow. The proposed sparsity-aware CNN accelerator is demonstrated on an Intel Arria 10 FPGA for multiple popular CNN models and achieves up to 89.7% throughput improvement over the baseline design.
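To make the index-and-gather idea behind the candidate pool dataflow concrete, here is a minimal software sketch of how nonzero weights can select only the activations they need. The function names (compress_weights, sparse_dot) and the flat-vector formulation are illustrative assumptions for this sketch, not the paper's actual hardware architecture or indexing scheme.

```python
# Illustrative sketch (not the paper's RTL): gather-style sparse dot product.
# Pruned weights are stored as (index, value) pairs so that only the
# activations addressed by nonzero weights are fetched and accumulated.

def compress_weights(weights):
    """Return (indices, values) for the nonzero entries of a pruned weight vector."""
    pairs = [(i, w) for i, w in enumerate(weights) if w != 0.0]
    indices = [i for i, _ in pairs]
    values = [w for _, w in pairs]
    return indices, values

def sparse_dot(activations, indices, values):
    """Accumulate only the activations selected by nonzero-weight indices."""
    return sum(activations[i] * w for i, w in zip(indices, values))

# Example: a pruned 8-element filter with three surviving weights.
weights = [0.0, 0.5, 0.0, 0.0, -1.2, 0.0, 0.3, 0.0]
activations = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
idx, val = compress_weights(weights)
print(sparse_dot(activations, idx, val))  # 0.5*2 - 1.2*5 + 0.3*7 = -2.9
```

In hardware, the same selection must happen for many filters in parallel, and the irregular index patterns lead to the workload imbalance that the paper's three-dimensional candidate pool is designed to mitigate.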