Yishuo Meng;Jianfei Wang;Siwei Xiang;Jia Hou;Zhijie Lin;Kuizhi Mei;Chen Yang
IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 72, no. 8, pp. 3983-3996. Journal Article, published 2025-04-08.
DOI: 10.1109/TCSI.2025.3554332
URL: https://ieeexplore.ieee.org/document/10950430/
Impact factor: 5.2. JCR: Q1 (Engineering, Electrical & Electronic).
Rethinking the Designing of Convolution Engine for Reconfigurable CNN Accelerator Using Sparse-Based Design Scheme
Convolutional neural networks (CNNs) continue to evolve as they are applied to more diverse environments and harder problems. This evolution has introduced a variety of convolution modes into current CNNs (e.g., $1\times 1$ convolution, stride-2 convolution, and rectangular convolution), which hardware accelerators struggle to support efficiently. This paper finds that an important difference among these convolution modes is their computation density. The modes are therefore regarded as structured sparsity, and the paper claims that a sparse-based design methodology can be applied to implement a reconfigurable CNN accelerator. Two critical architectural parameters, the input tile size and the convolution engine (CE) scale, are then evaluated using the standard deviation of calculations (SDC), unsupported convolution modes (UCM), unsuitable IFM size (UIS), DSP utilization ratio (DUR), and hardware resource overhead (HRO). Using the optimal parameters, a highly parallel, flexible CE array and a high-performance reconfigurable CNN architecture are designed. The accelerator was implemented on a Xilinx VC709 FPGA running at 300 MHz and achieves 921.60 to 1382.40 GOPS while supporting various convolution modes. Compared with previous dense and sparse designs, the proposed accelerator achieves $1.35\times$ to $10.77\times$ higher performance and $1.22\times$ to $2.84\times$ higher DSP efficiency when deploying VGG16.
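As a rough illustration of two quantitative ideas in the abstract, the Python sketch below compares the computation density of a few convolution modes and converts the reported GOPS range into operations per clock cycle. The density measure used here (multiply-accumulates per input pixel, relative to a dense 3x3 stride-1 convolution, ignoring borders) and the function names are assumptions made for illustration; they are not definitions taken from the paper.

```python
from fractions import Fraction


def macs_per_input_pixel(kernel_h: int, kernel_w: int, stride: int) -> Fraction:
    """MACs per input pixel for one (input channel, output channel) pair.

    Each output pixel costs kernel_h * kernel_w MACs, and a stride of s
    produces roughly one output per s * s input pixels (borders ignored).
    """
    return Fraction(kernel_h * kernel_w, stride * stride)


def relative_density(kernel_h: int, kernel_w: int, stride: int) -> Fraction:
    """Density relative to a dense 3x3, stride-1 convolution (assumed baseline)."""
    return macs_per_input_pixel(kernel_h, kernel_w, stride) / macs_per_input_pixel(3, 3, 1)


def ops_per_cycle(gops: str, clock_mhz: int) -> Fraction:
    """Convert throughput in GOPS to operations per clock cycle.

    GOPS is 1e9 ops/s and MHz is 1e6 cycles/s, so the ratio carries a factor of 1000.
    The GOPS value is passed as a string so the decimal is parsed exactly.
    """
    return Fraction(gops) * 1000 / clock_mhz


if __name__ == "__main__":
    modes = {
        "3x3, stride 1": (3, 3, 1),
        "1x1, stride 1": (1, 1, 1),
        "3x3, stride 2": (3, 3, 2),
        "1x3 rectangular, stride 1": (1, 3, 1),
    }
    for name, cfg in modes.items():
        print(f"{name}: relative density {relative_density(*cfg)}")

    # Throughput range and clock frequency as reported in the abstract.
    for gops in ("921.60", "1382.40"):
        print(f"{gops} GOPS at 300 MHz: {ops_per_cycle(gops, 300)} ops/cycle")
```

Under this assumed measure, the 1x1, stride-2, and 1x3 modes carry 1/9, 1/4, and 1/3 of the work of a dense 3x3 convolution, which is the sense in which they can be viewed as structured sparsity. At 300 MHz the reported range corresponds to 3072 to 4608 operations per cycle, or 1536 to 2304 multiply-accumulates per cycle if one multiply-accumulate is counted as two operations.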
About the journal:
TCAS I publishes regular papers in the field specified by the theory, analysis, design, and practical implementations of circuits, and the application of circuit techniques to systems and to signal processing. Included is the whole spectrum from basic scientific theory to industrial applications. The field of interest covered includes:
- Circuits: Analog, Digital and Mixed Signal Circuits and Systems
- Nonlinear Circuits and Systems, Integrated Sensors, MEMS and Systems on Chip, Nanoscale Circuits and Systems, Optoelectronic Circuits and Systems, Power Electronics and Systems
- Software for Analog-and-Logic Circuits and Systems
- Control aspects of Circuits and Systems