R. Han, Huarong Xu, Peng Jiang, Xiongwei Jiang, Jiaming Qian
Title: 3D CNN Acceleration using Block Circulant Matrix in Frequency Domain
Venue: 2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW)
Publication date: 2023-05-01
DOI: https://doi.org/10.1109/CCGridW59191.2023.00059
Citation count: 0
Abstract
Recently, 3D CNNs have proven to be outstanding in applications such as video analysis, 3D geometric data processing, and 3D medical image diagnosis. However, the prohibitive storage and computational overhead of 3D CNNs hinders their deployment on edge devices. To tackle this challenge, we propose 3D FCirCNN, an algorithm-hardware co-design approach to efficient 3D CNN acceleration. At the algorithm level, 3D FCirCNN applies block-circulant matrices to compress 3D CNNs for the first time and further accelerates the computation with the Fast Fourier Transform (FFT), significantly reducing the storage and computational overhead of 3D CNNs. In addition, to avoid the extra computation caused by frequent switching between the spatial and frequency domains, we introduce several 3D CNN operators in the frequency domain, so that the computation is carried out entirely in the frequency domain. At the hardware level, we design a dedicated FPGA-based hardware architecture to accelerate 3D FCirCNN. Experiments on the Xilinx ZCU102 demonstrate that, with only an acceptable accuracy loss, 3D FCirCNN achieves a performance of 2692.81 GOPS with an efficiency of 3.349 GOPS/DSP, significantly outperforming prior works.
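The abstract does not spell out the block-circulant formulation, so the following is only a minimal sketch of the general idea on a plain matrix-vector product, not the authors' 3D FCirCNN implementation. It assumes a weight matrix partitioned into p x q circulant blocks of size b, each stored as its first column only, and uses the standard fact that multiplying by a circulant matrix is a circular convolution, which the FFT turns into an element-wise product. The names `circulant`, `block_circulant_matvec`, and `dense_from_blocks`, and the sizes used, are hypothetical and chosen for illustration.

```python
import numpy as np


def circulant(c):
    """Dense circulant matrix whose first column is c (for checking only)."""
    n = len(c)
    return np.stack([np.roll(c, j) for j in range(n)], axis=1)


def dense_from_blocks(w):
    """Expand the compressed (p, q, b) weights into a dense (p*b, q*b) matrix."""
    p, q, b = w.shape
    return np.block([[circulant(w[i, j]) for j in range(q)] for i in range(p)])


def block_circulant_matvec(w, x):
    """Multiply a block-circulant matrix by x using FFTs.

    w: (p, q, b) array; w[i, j] is the first column of circulant block (i, j).
    x: (q * b,) input vector.
    """
    p, q, b = w.shape
    xf = np.fft.fft(x.reshape(q, b), axis=1)  # FFT of each input sub-vector
    wf = np.fft.fft(w, axis=2)                # FFT of each block's first column
    # Element-wise products, accumulated over input blocks in the frequency domain.
    yf = np.einsum("pqb,qb->pb", wf, xf)
    return np.fft.ifft(yf, axis=1).real.reshape(p * b)


# Illustrative sizes only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
p, q, b = 2, 3, 4
w = rng.standard_normal((p, q, b))
x = rng.standard_normal(q * b)

y_fft = block_circulant_matvec(w, x)
y_dense = dense_from_blocks(w) @ x

dense_params = p * q * b * b   # parameters stored by an unstructured matrix
compressed_params = w.size     # parameters stored by the block-circulant form
```

In this sketch the stored parameter count drops by a factor of b (the block size), and each block product costs O(b log b) instead of O(b^2). Note that the accumulation over input blocks happens before the inverse FFT, which is a small instance of the abstract's point about staying in the frequency domain rather than switching back and forth.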