Design of High Performance RNN Accelerator Based on Network Compression

Wentao Zhu, Yuhao Sun, Zeyu Shen, Haichuan Yang, Yu Gong, Bo Liu
{"title":"基于网络压缩的高性能RNN加速器设计","authors":"Wentao Zhu, Yuhao Sun, Zeyu Shen, Haichuan Yang, Yu Gong, Bo Liu","doi":"10.1109/ICCS51219.2020.9336599","DOIUrl":null,"url":null,"abstract":"As the size of the neural networks expands, the computation and storage consumption of recurrent neural networks is increasing. To solve this problem, this paper proposes a recurrent neural network accelerator which can reduce computation redundancy, memory overhead and energy consumption. A novel network compression method based on pruning and hybgrid quantization is also proposed to reduce computation and memory overhead. Based on the designs above, a precision adaptive approximate calculation based accelerator is designed to achieve high energy efficiency. The experimental results show that under the TSMC 28nm process, when the data bit width is 4bit and the working voltage is 0.8V, the peak performance of proposed accelerator is not reduced, the power consumption is 38.4mW, and the energy efficiency is 2.7TOPs/W. The energy efficiency of proposed accelerator is 2.5 times more than that of the state-of-the-art recurrent neural network accelerators.","PeriodicalId":193552,"journal":{"name":"2020 IEEE 2nd International Conference on Circuits and Systems (ICCS)","volume":"189 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-12-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Design of High Performance RNN Accelerator Based on Network Compression\",\"authors\":\"Wentao Zhu, Yuhao Sun, Zeyu Shen, Haichuan Yang, Yu Gong, Bo Liu\",\"doi\":\"10.1109/ICCS51219.2020.9336599\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"As the size of the neural networks expands, the computation and storage consumption of recurrent neural networks is increasing. To solve this problem, this paper proposes a recurrent neural network accelerator which can reduce computation redundancy, memory overhead and energy consumption. A novel network compression method based on pruning and hybgrid quantization is also proposed to reduce computation and memory overhead. Based on the designs above, a precision adaptive approximate calculation based accelerator is designed to achieve high energy efficiency. The experimental results show that under the TSMC 28nm process, when the data bit width is 4bit and the working voltage is 0.8V, the peak performance of proposed accelerator is not reduced, the power consumption is 38.4mW, and the energy efficiency is 2.7TOPs/W. 
The energy efficiency of proposed accelerator is 2.5 times more than that of the state-of-the-art recurrent neural network accelerators.\",\"PeriodicalId\":193552,\"journal\":{\"name\":\"2020 IEEE 2nd International Conference on Circuits and Systems (ICCS)\",\"volume\":\"189 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-12-10\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE 2nd International Conference on Circuits and Systems (ICCS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICCS51219.2020.9336599\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE 2nd International Conference on Circuits and Systems (ICCS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCS51219.2020.9336599","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 0

Abstract

As neural networks grow in size, the computation and storage demands of recurrent neural networks keep increasing. To address this problem, this paper proposes a recurrent neural network accelerator that reduces computation redundancy, memory overhead, and energy consumption. A novel network compression method based on pruning and hybrid quantization is also proposed to cut computation and memory overhead. On top of these designs, an accelerator based on precision-adaptive approximate calculation is designed to achieve high energy efficiency. Experimental results show that, in a TSMC 28 nm process with a 4-bit data width and a 0.8 V supply voltage, the proposed accelerator delivers its peak performance without degradation, consumes 38.4 mW, and achieves an energy efficiency of 2.7 TOPS/W, 2.5x that of state-of-the-art recurrent neural network accelerators.
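The abstract names two compression steps, pruning and low-bit quantization, but gives no implementation details. Below is a minimal NumPy sketch of what those two steps look like in general; the magnitude-based pruning criterion, the per-tensor symmetric scaling, and the toy 128x128 recurrent weight matrix are all illustrative assumptions, not the authors' method (their "hybrid quantization" and precision-adaptive approximate calculation are not specified on this page).

```python
# Illustrative sketch: magnitude pruning followed by signed 4-bit
# uniform quantization of one RNN weight matrix. Assumptions, not
# the paper's actual compression pipeline.
import numpy as np

def prune_by_magnitude(w: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` is reached."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) >= threshold, w, 0.0)

def quantize_uniform(w: np.ndarray, bits: int = 4):
    """Symmetric uniform quantization onto a signed `bits`-bit integer grid."""
    qmax = 2 ** (bits - 1) - 1                       # 7 for 4-bit signed
    scale = max(float(np.max(np.abs(w))) / qmax, 1e-12)
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w_hh = rng.normal(scale=0.1, size=(128, 128))        # toy recurrent weight matrix
w_sparse = prune_by_magnitude(w_hh, sparsity=0.8)    # keep ~20% of the weights
w_q, scale = quantize_uniform(w_sparse, bits=4)      # 4-bit grid, as in the abstract
w_hat = w_q.astype(np.float32) * scale               # dequantized view of the weights
print(f"nonzero weights kept: {np.count_nonzero(w_q)}/{w_q.size}")
print(f"max abs quantization error: {np.max(np.abs(w_sparse - w_hat)):.4f}")
```

In a sketch like this, pruning shrinks the number of multiply-accumulates the datapath must perform, while the 4-bit grid shrinks weight storage and MAC width, which is consistent with the 4-bit data width and energy-efficiency figures the abstract reports.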