
2020 International SoC Design Conference (ISOCC) | Pub Date: 2020-10-21 | DOI: 10.1109/ISOCC50952.2020.9333025

Chung-Bin Wu, Y. Hwang, Yu-Cheng Hsueh, Yu-Kuan Hsiao

Citations: 1


High Efficient Bandwidth Utilization Hardware Design and Implement for AI Deep Learning Accelerator

This paper proposes a neural network accelerator for Tiny-Yolo V2. The input feature maps, output feature maps, and weight kernels are converted to uint8 through a quantization strategy, which reduces the data size and makes hardware utilization more efficient. Moreover, we propose an input feature map placement method that reduces bandwidth utilization and improves processing element (PE) utilization. The hardware architecture is verified on the Xilinx ZCU102 platform. Synthesis results show that the proposed architecture, implemented in 90 nm, achieves 14.4 GOPS at 100 MHz with an area efficiency of 99 GOPS/M-gates.
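The abstract does not spell out the quantization strategy, so the following is only a minimal sketch of one common choice, per-tensor affine (asymmetric) quantization to uint8. The function names `quantize_uint8` and `dequantize_uint8`, and the scheme itself, are assumptions for illustration and are not taken from the paper.

```python
def quantize_uint8(values):
    """Map a list of floats to integers in [0, 255] (assumed affine scheme)."""
    # Extend the range to include 0.0 so that zero is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    # Fall back to a scale of 1.0 when every value is zero.
    scale = (hi - lo) / 255.0 or 1.0
    zero_point = int(round(-lo / scale))
    # Round to the nearest step, shift by the zero point, clamp to uint8 range.
    q = [max(0, min(255, int(round(v / scale)) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_uint8(q, scale, zero_point):
    """Recover approximate float values from the uint8 codes."""
    return [(x - zero_point) * scale for x in q]


if __name__ == "__main__":
    weights = [-1.0, 0.0, 0.5, 1.0]
    codes, scale, zp = quantize_uint8(weights)
    print(codes, scale, zp)
    print(dequantize_uint8(codes, scale, zp))
```

Storing each value in one byte instead of a 32-bit float cuts the data size by a factor of four, which is the kind of saving the abstract attributes to uint8 conversion.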
