Jingbo Gao, Yu Qian, Yihan Hu, Xitian Fan, W. Luk, Wei Cao, Lingli Wang
{"title":"LETA: A lightweight exchangeable-track accelerator for efficientnet based on FPGA","authors":"Jingbo Gao, Yu Qian, Yihan Hu, Xitian Fan, W. Luk, Wei Cao, Lingli Wang","doi":"10.1109/ICFPT52863.2021.9609919","DOIUrl":null,"url":null,"abstract":"Lightweight convolutional neural networks (CNNs) have become increasingly popular due to their lower computational complexity and fewer memory accesses with equivalent accuracy compared to previous CNN models. However, the newly proposed networks bring new challenges to efficient hardware design, such as, in EfficientNet, depthwise convolution, squeeze-and-excitation (SE) module, and swish/sigmoid functions. Although individual engine architecture could achieve a high computing efficiency for the standard convolution or the depth-wise convolution, it is still not efficient for EfficientNet because the workload imbalance between two types of convolutional engines causes inevitable idling. To overcome this problem, we present a lightweight reconfigurable computational kernel based on FPGA with an exchangeable-track datapath scheme. In addition, a low-accuracy-loss function replacement strategy is proposed for swish/sigmoid functions. Furthermore, the low-cost hardware architecture to implement the replaced functions is designed. The proposed accelerator (LETA) can implement EfficientNet on Xilinx XCVU37P with a 300 MHz system clock and a 600 MHz kernel clock. The linear growth of resource usage in the 4-kernel implementation in 1 super logic region (SLR) with the same clock frequencies justifies the scalability of LETA. The experimental results show that LETA can achieve 2× throughput/DSP compared to the latest FPGA-based accelerator with 1.6% (0.7%) top-1 (top-5) accuracy loss on EfficientNet-B3.","PeriodicalId":376220,"journal":{"name":"2021 International Conference on Field-Programmable Technology (ICFPT)","volume":"603 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 International Conference on Field-Programmable Technology (ICFPT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICFPT52863.2021.9609919","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 7
Abstract
Lightweight convolutional neural networks (CNNs) have become increasingly popular due to their lower computational complexity and fewer memory accesses with equivalent accuracy compared to previous CNN models. However, the newly proposed networks bring new challenges to efficient hardware design, such as, in EfficientNet, depthwise convolution, squeeze-and-excitation (SE) module, and swish/sigmoid functions. Although individual engine architecture could achieve a high computing efficiency for the standard convolution or the depth-wise convolution, it is still not efficient for EfficientNet because the workload imbalance between two types of convolutional engines causes inevitable idling. To overcome this problem, we present a lightweight reconfigurable computational kernel based on FPGA with an exchangeable-track datapath scheme. In addition, a low-accuracy-loss function replacement strategy is proposed for swish/sigmoid functions. Furthermore, the low-cost hardware architecture to implement the replaced functions is designed. The proposed accelerator (LETA) can implement EfficientNet on Xilinx XCVU37P with a 300 MHz system clock and a 600 MHz kernel clock. The linear growth of resource usage in the 4-kernel implementation in 1 super logic region (SLR) with the same clock frequencies justifies the scalability of LETA. The experimental results show that LETA can achieve 2× throughput/DSP compared to the latest FPGA-based accelerator with 1.6% (0.7%) top-1 (top-5) accuracy loss on EfficientNet-B3.