Efficient Convolutional Neural Network Accelerator Based on Systolic Array

2022 IEEE International Conference on Consumer Electronics (ICCE) Pub Date : 2022-01-07 DOI:10.1109/ICCE53296.2022.9730180

Yeong-Kang Lai, Yu-Jen Tsai

Citations: 0

Abstract

This paper uses 72 processing elements (PEs) as the basis for convolution operations and supports both 3 × 3 and 1 × 1 filter sizes. The design adopts a systolic array architecture, whose data reuse is better than that of a general PE architecture: in a systolic array, each datum needs to be fetched from memory only once. The paper also integrates convolution with max pooling. The hardware is verified on a Xilinx ZCU102 FPGA board. It uses quantized weight parameters, and its arithmetic precision is UINT8. At an operating frequency of 100 MHz, the throughput reaches 14.4 GOPS. The efficiency is 98.90%, the bandwidth is 150.82 MB, and integrating max pooling into the convolution saves 31.75% of DRAM accesses. In future work, the operating frequency can be raised above 200 MHz, and increasing the number of PEs would allow more operations to run in parallel, which can effectively improve the throughput of the hardware.
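The abstract's throughput figure is consistent with simple peak-rate arithmetic (72 PEs × 2 operations per multiply-accumulate × 100 MHz = 14.4 GOPS), and the convolution + max-pooling integration can be illustrated in software. The sketch below is hypothetical and models only the arithmetic, not the systolic dataflow. It assumes each PE completes one multiply-accumulate per cycle (counted as 2 operations), a single input/output channel, stride-1 convolution without padding, and 2 × 2 max pooling with stride 2. None of these details are stated in the abstract, and the function names (`peak_gops`, `fused_conv_max_pool`, `intermediate_dram_traffic`, and so on) are illustrative only.

```python
# Hypothetical sketch, not the authors' design. Assumptions (not from the paper):
# one MAC per PE per cycle counted as 2 ops, single channel, stride-1 "valid"
# convolution, 2x2 max pooling with stride 2, DRAM traffic counted in elements.
from typing import List

Matrix = List[List[int]]


def peak_gops(num_pes: int, freq_hz: float, ops_per_mac: int = 2) -> float:
    """Peak throughput when every PE finishes one multiply-accumulate per cycle."""
    return num_pes * ops_per_mac * freq_hz / 1e9


def check_uint8(m: Matrix) -> None:
    """UINT8 operands: every value must lie in [0, 255]."""
    assert all(0 <= v <= 255 for row in m for v in row), "value outside UINT8 range"


def conv2d(x: Matrix, w: Matrix) -> Matrix:
    """Valid cross-correlation with a k x k filter (k = 3 or k = 1).

    Each UINT8 product is at most 255 * 255, so a 32-bit accumulator
    would comfortably hold the sum of nine of them.
    """
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1
    return [[sum(x[i + di][j + dj] * w[di][dj] for di in range(k) for dj in range(k))
             for j in range(out_w)] for i in range(out_h)]


def max_pool_2x2(y: Matrix) -> Matrix:
    """2x2 max pooling, stride 2 (an odd trailing row/column is dropped)."""
    return [[max(y[i][j], y[i][j + 1], y[i + 1][j], y[i + 1][j + 1])
             for j in range(0, len(y[0]) - 1, 2)]
            for i in range(0, len(y) - 1, 2)]


def fused_conv_max_pool(x: Matrix, w: Matrix) -> Matrix:
    """Reduce each 2x2 group of convolution results to its maximum as it is
    computed, so the full-resolution feature map is never stored."""
    k = len(w)
    out_h, out_w = len(x) - k + 1, len(x[0]) - k + 1

    def conv_at(i: int, j: int) -> int:
        return sum(x[i + di][j + dj] * w[di][dj] for di in range(k) for dj in range(k))

    return [[max(conv_at(i + a, j + b) for a in (0, 1) for b in (0, 1))
             for j in range(0, out_w - 1, 2)]
            for i in range(0, out_h - 1, 2)]


def intermediate_dram_traffic(out_h: int, out_w: int, fused: bool) -> int:
    """Element transfers for the conv output only (inputs and weights ignored)."""
    pooled = (out_h // 2) * (out_w // 2)
    if fused:
        return pooled                      # only pooled results are written
    return 2 * out_h * out_w + pooled      # write conv map, read it back, write pooled map


if __name__ == "__main__":
    print(peak_gops(72, 100e6))            # 14.4
    x = [[i * 6 + j for j in range(6)] for i in range(6)]
    w = [[1] * 3 for _ in range(3)]
    check_uint8(x)
    check_uint8(w)
    print(fused_conv_max_pool(x, w))       # [[126, 144], [234, 252]]
    print(intermediate_dram_traffic(4, 4, fused=False),
          intermediate_dram_traffic(4, 4, fused=True))  # 36 4
```

Because the 2 × 2 pooling windows do not overlap, the fused version still computes each convolution result exactly once; it simply never writes the unpooled map out. The toy traffic count covers only that intermediate map, so it is not comparable to the 31.75% DRAM saving reported in the abstract, which is not broken down there.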
