Huafeng Ye, Huipeng Deng, Jian Wang, Mingyu Wang, Zhiyi Yu
{"title":"3D-NWA: A Nested-Winograd Accelerator for 3D CNNs","authors":"Huafeng Ye, Huipeng Deng, Jian Wang, Mingyu Wang, Zhiyi Yu","doi":"10.1109/ICTA56932.2022.9963033","DOIUrl":null,"url":null,"abstract":"3D Convolutional neural networks (3D CNNs) perform better in some scenarios, such as video understanding and 3D medical image diagnosis. With the increase in the dimension and size of the convolution kernel, CNN's computational complexity and implementation difficulty increase severely. Winograd transformation can significantly reduce the number of multiplications in convolution operations. However, large convolution filters will bring numerical instability. In this article, we presented a novel method called 3D nested Winograd algorithm to address the problem. Compared with the state-of-art OLA-Winograd algorithm, the proposed algorithm reduces the multiplications by 1.72 to 5.83× for computing 5 × 5 × 5 to 9 × 9 × 9 convolutions. Finally, we demonstrate the efficiency of 3D-NWA on the FPGA platform (Xilinx VCU118) and achieve highest DSP efficiency up to 4.67× compared with the state-of-art accelerators.","PeriodicalId":325602,"journal":{"name":"2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-10-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE International Conference on Integrated Circuits, Technologies and Applications (ICTA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTA56932.2022.9963033","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1
Abstract
3D Convolutional neural networks (3D CNNs) perform better in some scenarios, such as video understanding and 3D medical image diagnosis. With the increase in the dimension and size of the convolution kernel, CNN's computational complexity and implementation difficulty increase severely. Winograd transformation can significantly reduce the number of multiplications in convolution operations. However, large convolution filters will bring numerical instability. In this article, we presented a novel method called 3D nested Winograd algorithm to address the problem. Compared with the state-of-art OLA-Winograd algorithm, the proposed algorithm reduces the multiplications by 1.72 to 5.83× for computing 5 × 5 × 5 to 9 × 9 × 9 convolutions. Finally, we demonstrate the efficiency of 3D-NWA on the FPGA platform (Xilinx VCU118) and achieve highest DSP efficiency up to 4.67× compared with the state-of-art accelerators.