一种基于5.1pJ/神经元127.3us/推理rnn的语音识别处理器,使用16个内存计算SRAM宏

Ruiqi Guo, Yonggang Liu, Shixuan Zheng, Ssu-Yen Wu, P. Ouyang, W. Khwa, Xi Chen, Jia-Jing Chen, Xiudong Li, Leibo Liu, Meng-Fan Chang, Shaojun Wei, S. Yin
{"title":"一种基于5.1pJ/神经元127.3us/推理rnn的语音识别处理器,使用16个内存计算SRAM宏","authors":"Ruiqi Guo, Yonggang Liu, Shixuan Zheng, Ssu-Yen Wu, P. Ouyang, W. Khwa, Xi Chen, Jia-Jing Chen, Xiudong Li, Leibo Liu, Meng-Fan Chang, Shaojun Wei, S. Yin","doi":"10.23919/VLSIC.2019.8778028","DOIUrl":null,"url":null,"abstract":"This work presents a 65nm CMOS speech recognition processor, named Thinker-IM, which employs 16 computing-in-memory (SRAM-CIM) macros for binarized recurrent neural network (RNN) computation. Its major contributions are: 1) A novel digital-CIM mixed architecture that runs an output-weight dual stationary (OWDS) dataflow, reducing 85.7% memory accessing; 2) Multi-bit XNOR SRAM-CIM macros and corresponding CIM-aware weight adaptation that reduces 9.9% energy consumption in average; 3) Predictive early batch-normalization (BN) and binarization units (PBUs) that reduce at most 28.3% computations in RNN. Measured results show the processing speed of 127.3us/Inference and over 90.2% accuracy, while achieving neural energy efficiency of 5.1pJ/Neuron, which is 2.8 × better than state-of-the-art.","PeriodicalId":6707,"journal":{"name":"2019 Symposium on VLSI Circuits","volume":"72 1","pages":"C120-C121"},"PeriodicalIF":0.0000,"publicationDate":"2019-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"41","resultStr":"{\"title\":\"A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS\",\"authors\":\"Ruiqi Guo, Yonggang Liu, Shixuan Zheng, Ssu-Yen Wu, P. Ouyang, W. Khwa, Xi Chen, Jia-Jing Chen, Xiudong Li, Leibo Liu, Meng-Fan Chang, Shaojun Wei, S. Yin\",\"doi\":\"10.23919/VLSIC.2019.8778028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"This work presents a 65nm CMOS speech recognition processor, named Thinker-IM, which employs 16 computing-in-memory (SRAM-CIM) macros for binarized recurrent neural network (RNN) computation. Its major contributions are: 1) A novel digital-CIM mixed architecture that runs an output-weight dual stationary (OWDS) dataflow, reducing 85.7% memory accessing; 2) Multi-bit XNOR SRAM-CIM macros and corresponding CIM-aware weight adaptation that reduces 9.9% energy consumption in average; 3) Predictive early batch-normalization (BN) and binarization units (PBUs) that reduce at most 28.3% computations in RNN. Measured results show the processing speed of 127.3us/Inference and over 90.2% accuracy, while achieving neural energy efficiency of 5.1pJ/Neuron, which is 2.8 × better than state-of-the-art.\",\"PeriodicalId\":6707,\"journal\":{\"name\":\"2019 Symposium on VLSI Circuits\",\"volume\":\"72 1\",\"pages\":\"C120-C121\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-06-09\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"41\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 Symposium on VLSI Circuits\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/VLSIC.2019.8778028\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 Symposium on VLSI Circuits","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/VLSIC.2019.8778028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 41

摘要

本研究提出了一种65nm CMOS语音识别处理器,名为Thinker-IM,它采用16个内存中计算(SRAM-CIM)宏进行二值化递归神经网络(RNN)计算。它的主要贡献是:1)一种新颖的数字- cim混合架构,运行输出-权重双平稳(OWDS)数据流,减少了85.7%的内存访问;2)多比特XNOR SRAM-CIM宏和相应的cim感知权重适应,平均降低9.9%的能耗;3)预测早期批归一化(BN)和二值化单元(PBUs)在RNN中最多减少28.3%的计算量。测量结果表明,该算法的处理速度达到127.3us/Inference,准确率超过90.2%,神经能量效率达到5.1pJ/Neuron,比目前的技术水平提高2.8倍。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS
This work presents a 65nm CMOS speech recognition processor, named Thinker-IM, which employs 16 computing-in-memory (SRAM-CIM) macros for binarized recurrent neural network (RNN) computation. Its major contributions are: 1) A novel digital-CIM mixed architecture that runs an output-weight dual stationary (OWDS) dataflow, reducing 85.7% memory accessing; 2) Multi-bit XNOR SRAM-CIM macros and corresponding CIM-aware weight adaptation that reduces 9.9% energy consumption in average; 3) Predictive early batch-normalization (BN) and binarization units (PBUs) that reduce at most 28.3% computations in RNN. Measured results show the processing speed of 127.3us/Inference and over 90.2% accuracy, while achieving neural energy efficiency of 5.1pJ/Neuron, which is 2.8 × better than state-of-the-art.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信