3D CNN Acceleration using Block Circulant Matrix in Frequency Domain

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing Workshops (CCGridW). Pub Date: 2023-05-01. DOI: 10.1109/CCGridW59191.2023.00059

R. Han, Huarong Xu, Peng Jiang, Xiongwei Jiang, Jiaming Qian



Abstract

Recently, 3D CNNs have proven outstanding in applications such as video analysis, 3D geometric data processing, and 3D medical image diagnosis. However, the prohibitive storage and computational overhead of 3D CNNs hinders their deployment on edge devices. To tackle this challenge, we propose 3D FCirCNN, an algorithm-hardware co-design approach for efficient 3D CNN acceleration. At the algorithm level, 3D FCirCNN applies block circulant matrices to compress 3D CNNs for the first time and further accelerates the computation with the Fast Fourier Transform (FFT), significantly reducing the storage and computational overhead of 3D CNNs. In addition, to avoid the extra computation caused by frequent switching between the spatial and frequency domains, we introduce several 3D CNN operators in the frequency domain, so that the entire computation is carried out in the frequency domain. At the hardware level, we design a dedicated FPGA-based hardware architecture to accelerate 3D FCirCNN. Experiments on the Xilinx ZCU102 demonstrate that, with only an acceptable accuracy loss, 3D FCirCNN achieves a performance of 2692.81 GOPS at an efficiency of 3.349 GOPS/DSP, significantly outperforming prior works.
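
The abstract gives no implementation details, so the following is only a minimal sketch of the general block-circulant idea, not the authors' 3D FCirCNN algorithm or hardware design. It assumes a plain 2D weight matrix split into b x b circulant blocks, each stored as its first column (b values instead of b*b), and computes the matrix-vector product by multiplying FFTs element-wise. The function names (`fft`, `block_circulant_matvec`, `dense_matvec`), the block layout, and the power-of-two block size are all assumptions made for illustration.

```python
import cmath

def fft(a, inverse=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The inverse transform is left unscaled (caller divides by n)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def block_circulant_matvec(w, x, b):
    """Multiply a block-circulant matrix by a vector in the frequency domain.

    w[i][j] is the first column (length b) of circulant block (i, j);
    x is the flat input vector of length q * b. Hypothetical layout.
    """
    p, q = len(w), len(w[0])
    # One FFT per input block, reused across all block rows.
    xf = [fft(x[j * b:(j + 1) * b]) for j in range(q)]
    y = []
    for i in range(p):
        acc = [0j] * b
        for j in range(q):
            wf = fft(w[i][j])  # weights are fixed, so this could be precomputed
            for k in range(b):
                acc[k] += wf[k] * xf[j][k]  # circular convolution theorem
        # One inverse FFT per output block; divide by b to undo the scaling.
        y.extend(v.real / b for v in fft(acc, inverse=True))
    return y

def dense_matvec(w, x, b):
    """Reference result: expand each block as C[r][c] = w[(r - c) mod b]."""
    p, q = len(w), len(w[0])
    return [
        sum(w[i][j][(r - c) % b] * x[j * b + c]
            for j in range(q) for c in range(b))
        for i in range(p) for r in range(b)
    ]

# Illustrative values only: one block row, two block columns, block size 4.
w = [[[1.0, 2.0, 0.0, -1.0], [0.5, 0.0, 3.0, 1.0]]]
x = [1.0, 0.0, 2.0, -1.0, 3.0, 1.0, 0.0, 2.0]
fast = block_circulant_matvec(w, x, 4)
ref = dense_matvec(w, x, 4)
assert all(abs(f - r) < 1e-9 for f, r in zip(fast, ref))
```

In this sketch the accumulation across blocks happens before the inverse transform, so each output block needs only one inverse FFT. That is a small-scale analogue of staying in the frequency domain between operations, though the paper's frequency-domain 3D CNN operators are not reproduced here.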

