A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS

2019 Symposium on VLSI Circuits. Pub Date: 2019-06-09. DOI: 10.23919/VLSIC.2019.8778028

Ruiqi Guo, Yonggang Liu, Shixuan Zheng, Ssu-Yen Wu, P. Ouyang, W. Khwa, Xi Chen, Jia-Jing Chen, Xiudong Li, Leibo Liu, Meng-Fan Chang, Shaojun Wei, S. Yin

Citations: 41

Abstract

This work presents a 65nm CMOS speech recognition processor, named Thinker-IM, which employs 16 SRAM computing-in-memory (SRAM-CIM) macros for binarized recurrent neural network (RNN) computation. Its major contributions are: 1) a novel digital-CIM mixed architecture that runs an output-weight dual stationary (OWDS) dataflow, reducing memory accesses by 85.7%; 2) multi-bit XNOR SRAM-CIM macros and a corresponding CIM-aware weight adaptation that reduces energy consumption by 9.9% on average; 3) predictive early batch-normalization (BN) and binarization units (PBUs) that reduce RNN computations by up to 28.3%. Measured results show a processing speed of 127.3us/Inference and over 90.2% accuracy, while achieving a neural energy efficiency of 5.1pJ/Neuron, which is 2.8× better than the state of the art.
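The abstract does not describe the circuits in detail, so the following is only a generic software sketch of the two ideas it names: an XNOR-based binarized dot product, and deciding the binarized output before the full sum is accumulated. It assumes the standard binarized-network identities: for ±1 vectors of length n, the dot product equals 2 × popcount(XNOR) − n, and with a positive BN scale γ the sign of γ(s − μ)/σ + β is the same as the comparison s ≥ μ − βσ/γ. The early-exit rule shown here is a simple exact bound check, not necessarily how the PBUs work, and all function names and the chunk size are hypothetical.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into an int (bit i set means +1)."""
    word = 0
    for i, v in enumerate(values):
        if v > 0:
            word |= 1 << i
    return word


def xnor_popcount_dot(x_word, w_word, n):
    """Dot product of two packed +1/-1 vectors of length n via XNOR + popcount."""
    mask = (1 << n) - 1
    matches = bin(~(x_word ^ w_word) & mask).count("1")
    return 2 * matches - n


def bn_threshold(gamma, beta, mean, std):
    """Fold BN followed by sign() into one threshold on the raw sum.

    Assumes gamma > 0; a negative gamma would flip the comparison.
    """
    if gamma <= 0:
        raise ValueError("this sketch assumes gamma > 0")
    return mean - beta * std / gamma


def binarized_neuron_early_exit(x_word, w_word, n, tau, chunk=16):
    """Return (output, elements_processed) for one binarized neuron.

    The sum is accumulated chunk by chunk. Processing stops as soon as the
    remaining elements (each contributing at most +1 or -1) can no longer
    move the partial sum across the threshold tau.
    """
    partial, done = 0, 0
    while done < n:
        width = min(chunk, n - done)
        mask = (1 << width) - 1
        partial += xnor_popcount_dot((x_word >> done) & mask,
                                     (w_word >> done) & mask, width)
        done += width
        remaining = n - done
        if partial - remaining >= tau:   # guaranteed to end at or above tau
            return 1, done
        if partial + remaining < tau:    # guaranteed to end below tau
            return -1, done
    return (1 if partial >= tau else -1), done


if __name__ == "__main__":
    x = [1 if (i * 7) % 3 else -1 for i in range(64)]
    w = [1 if (i * 5) % 4 else -1 for i in range(64)]
    tau = bn_threshold(gamma=2.0, beta=1.0, mean=3.0, std=4.0)
    print(binarized_neuron_early_exit(pack_bits(x), pack_bits(w), 64, tau))
```

Because the bound check is exact, the early-exit result always matches the full computation; a predictive scheme that tolerates some error could stop earlier at the cost of accuracy.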


