Efficient Convolutional Neural Network Accelerator Based on Systolic Array

Yeong-Kang Lai, Yu-Jen Tsai
2022 IEEE International Conference on Consumer Electronics (ICCE), published 2022-01-07. DOI: 10.1109/ICCE53296.2022.9730180

Abstract

This paper uses 72 processing elements (PEs) as the basis for convolution operations and supports 3 × 3 and 1 × 1 filter sizes. The design adopts a systolic array architecture, whose data reuse is better than that of a general PE architecture: in a systolic array, the data need to be accessed only once. The paper also integrates convolution with max pooling. The hardware is verified on a Xilinx ZCU102 FPGA board; it uses quantized weight parameters, and its arithmetic precision is UINT8. At an operating frequency of 100 MHz, the throughput reaches 14.4 GOPS. The efficiency is 98.90%, the bandwidth is 150.82 MB, and integrating max pooling into the convolution saves 31.75% of DRAM accesses. In the future, the operating frequency can be raised above 200 MHz, and increasing the number of PEs can improve the efficiency of parallel operation, effectively raising the throughput of the hardware.
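The 14.4 GOPS figure is consistent with simple peak-throughput arithmetic if each of the 72 PEs is assumed to complete one multiply-accumulate (two operations) per clock cycle: 72 × 2 × 100 MHz = 14.4 GOPS. The Python sketch below works under that assumption; it computes peak throughput and shows how the future-work directions mentioned above (a clock above 200 MHz, more PEs) would scale it. The function name and the 144-PE configuration are hypothetical, chosen only for illustration.

```python
# Peak-throughput arithmetic for a PE array.
# ASSUMPTION: each PE completes one multiply-accumulate (MAC) per clock cycle,
# counted as 2 operations (one multiply + one add). Stalls are ignored.
OPS_PER_MAC = 2


def peak_gops(num_pes: int, freq_mhz: float, ops_per_mac: int = OPS_PER_MAC) -> float:
    """Peak throughput in GOPS = PEs x ops per MAC x clock frequency."""
    ops_per_second = num_pes * ops_per_mac * freq_mhz * 1e6
    return ops_per_second / 1e9


if __name__ == "__main__":
    # Configuration reported in the abstract: 72 PEs at 100 MHz.
    print(f"72 PEs  @ 100 MHz: {peak_gops(72, 100):.1f} GOPS")   # 14.4 GOPS
    # Future-work directions; the 144-PE count is hypothetical.
    print(f"72 PEs  @ 200 MHz: {peak_gops(72, 200):.1f} GOPS")   # 28.8 GOPS
    print(f"144 PEs @ 200 MHz: {peak_gops(144, 200):.1f} GOPS")  # 57.6 GOPS
```

Because the model ignores pipeline stalls and memory-bandwidth limits, it gives an upper bound on throughput rather than a measured value.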
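One common way that fusing convolution with max pooling reduces DRAM traffic is by never writing the full-resolution convolution output off-chip: each pooling window is computed on-chip and only its maximum is stored. The following Python sketch illustrates that idea under assumptions the abstract does not state: a single channel, a 3 × 3 "valid" convolution with stride 1, 2 × 2 max pooling with stride 2, and a simple counter standing in for DRAM accesses to outputs and intermediates. The function names are hypothetical, and the counts it produces are not meant to reproduce the paper's 31.75% figure.

```python
from typing import List, Tuple

Matrix = List[List[int]]
# ASSUMPTIONS: single channel, 3x3 valid conv (stride 1), 2x2 max pool (stride 2).
# The traffic counter models output/intermediate accesses only; input-feature
# and weight reads are not modeled.


def conv3x3_at(x: Matrix, w: Matrix, r: int, c: int) -> int:
    """One 3x3 'valid' convolution output (CNN-style cross-correlation) at (r, c)."""
    return sum(x[r + i][c + j] * w[i][j] for i in range(3) for j in range(3))


def conv_then_pool(x: Matrix, w: Matrix) -> Tuple[Matrix, int]:
    """Unfused: store the whole conv map off-chip, read it back, then pool."""
    h, wd = len(x) - 2, len(x[0]) - 2
    conv = [[conv3x3_at(x, w, r, c) for c in range(wd)] for r in range(h)]
    traffic = 2 * h * wd  # intermediate map: written once, read back once
    pooled = [[max(conv[r][c], conv[r][c + 1], conv[r + 1][c], conv[r + 1][c + 1])
               for c in range(0, wd - 1, 2)]
              for r in range(0, h - 1, 2)]
    traffic += sum(len(row) for row in pooled)  # final pooled map written
    return pooled, traffic


def fused_conv_pool(x: Matrix, w: Matrix) -> Tuple[Matrix, int]:
    """Fused: each 2x2 window of conv outputs is reduced on-chip and only its
    maximum is written, so the conv map never goes off-chip."""
    h, wd = len(x) - 2, len(x[0]) - 2
    pooled = [[max(conv3x3_at(x, w, r + dr, c + dc) for dr in (0, 1) for dc in (0, 1))
               for c in range(0, wd - 1, 2)]
              for r in range(0, h - 1, 2)]
    traffic = sum(len(row) for row in pooled)  # only pooled results written
    return pooled, traffic


if __name__ == "__main__":
    x = [[(r * 7 + c * 3) % 11 for c in range(6)] for r in range(6)]  # toy 6x6 input
    w = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]                           # toy 3x3 filter
    ref, t_unfused = conv_then_pool(x, w)
    out, t_fused = fused_conv_pool(x, w)
    assert out == ref  # fusion changes data movement, not results
    print("modeled accesses, unfused vs fused:", t_unfused, t_fused)
```

Both functions return identical pooled maps; only the modeled off-chip traffic differs, which is the property that makes this kind of fusion attractive in a memory-bound accelerator.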