Efficient Convolutional Neural Network Accelerator Based on Systolic Array
Yeong-Kang Lai, Yu-Jen Tsai
2022 IEEE International Conference on Consumer Electronics (ICCE), published 2022-01-07
DOI: https://doi.org/10.1109/ICCE53296.2022.9730180
This paper presents a convolution accelerator built from 72 processing elements (PEs) that supports 3 × 3 and 1 × 1 filter sizes. Because the PEs are organized as a systolic array, the design achieves better data reuse than a general PE architecture: the systolic array needs to access the data only once. Convolution and max pooling are integrated in the hardware. The design is verified on a Xilinx ZCU102 FPGA board. It uses quantized weight parameters, and the arithmetic precision is UINT8. At an operating frequency of 100 MHz, the throughput reaches 14.4 GOPS. The efficiency is 98.90%, the bandwidth is 150.82 MB, and integrating max pooling with convolution saves 31.75% of DRAM accesses. In the future, the operating frequency could be raised above 200 MHz, and increasing the number of PEs would add parallelism, which would further improve the hardware's throughput.
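
The reported 14.4 GOPS figure is consistent with a simple peak-throughput estimate. The sketch below assumes that each PE completes one multiply-accumulate per clock cycle and that a multiply-accumulate counts as two operations; the abstract states neither assumption, and the function name `peak_gops` is purely illustrative. Under those assumptions it also shows how the peak would scale with the higher clock frequency or larger PE count mentioned as future work.

```python
def peak_gops(num_pes: int, freq_mhz: int, ops_per_mac: int = 2) -> float:
    """Peak throughput in GOPS for an array of MAC-based PEs.

    Assumption (not stated in the abstract): every PE finishes one
    multiply-accumulate per cycle, counted as `ops_per_mac` operations.
    """
    # PEs * ops per cycle * cycles per second (in MHz) gives mega-ops per second;
    # dividing by 1000 converts to giga-ops per second.
    return num_pes * ops_per_mac * freq_mhz / 1000


if __name__ == "__main__":
    # Configuration reported in the abstract: 72 PEs at 100 MHz.
    print(f"{peak_gops(72, 100):.1f} GOPS")   # 14.4 GOPS
    # Hypothetical scaling: same array at 200 MHz.
    print(f"{peak_gops(72, 200):.1f} GOPS")   # 28.8 GOPS
    # Hypothetical scaling: twice as many PEs at 100 MHz.
    print(f"{peak_gops(144, 100):.1f} GOPS")  # 28.8 GOPS
```

These are upper bounds only: the throughput actually sustained depends on how well the array is kept busy, which the abstract summarizes with its 98.90% efficiency figure.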