Quantization and sparsity-aware processing for energy-efficient NVM-based convolutional neural networks

IF 1.9 Q3 ENGINEERING, ELECTRICAL & ELECTRONIC
Han Bao, Yi-Fan Qin, Jia Chen, Ling Yang, Jiancong Li, Houji Zhou, Yi Li, Xiangshui Miao
{"title":"Quantization and sparsity-aware processing for energy-efficient NVM-based convolutional neural networks","authors":"Han Bao, Yi-Fan Qin, Jia Chen, Ling Yang, Jiancong Li, Houji Zhou, Yi Li, Xiangshui Miao","doi":"10.3389/felec.2022.954661","DOIUrl":null,"url":null,"abstract":"Nonvolatile memory (NVM)-based convolutional neural networks (NvCNNs) have received widespread attention as a promising solution for hardware edge intelligence. However, there still exist many challenges in the resource-constrained conditions, such as the limitations of the hardware precision and cost and, especially, the large overhead of the analog-to-digital converters (ADCs). In this study, we systematically analyze the performance of NvCNNs and the hardware restrictions with quantization in both weight and activation and propose the corresponding requirements of NVM devices and peripheral circuits for multiply–accumulate (MAC) units. In addition, we put forward an in situ sparsity-aware processing method that exploits the sparsity of the network and the device array characteristics to further improve the energy efficiency of quantized NvCNNs. Our results suggest that the 4-bit-weight and 3-bit-activation (W4A3) design demonstrates the optimal compromise between the network performance and hardware overhead, achieving 98.82% accuracy for the Modified National Institute of Standards and Technology database (MNIST) classification task. Moreover, higher-precision designs will claim more restrictive requirements for hardware nonidealities including the variations of NVM devices and the nonlinearities of the converters. Moreover, the sparsity-aware processing method can obtain 79%/53% ADC energy reduction and 2.98×/1.15× energy efficiency improvement based on the W8A8/W4A3 quantization design with an array size of 128 × 128.","PeriodicalId":73081,"journal":{"name":"Frontiers in electronics","volume":" ","pages":""},"PeriodicalIF":1.9000,"publicationDate":"2022-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in electronics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/felec.2022.954661","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
引用次数: 1

Abstract

Nonvolatile memory (NVM)-based convolutional neural networks (NvCNNs) have received widespread attention as a promising solution for hardware edge intelligence. However, there still exist many challenges in the resource-constrained conditions, such as the limitations of the hardware precision and cost and, especially, the large overhead of the analog-to-digital converters (ADCs). In this study, we systematically analyze the performance of NvCNNs and the hardware restrictions with quantization in both weight and activation and propose the corresponding requirements of NVM devices and peripheral circuits for multiply–accumulate (MAC) units. In addition, we put forward an in situ sparsity-aware processing method that exploits the sparsity of the network and the device array characteristics to further improve the energy efficiency of quantized NvCNNs. Our results suggest that the 4-bit-weight and 3-bit-activation (W4A3) design demonstrates the optimal compromise between the network performance and hardware overhead, achieving 98.82% accuracy for the Modified National Institute of Standards and Technology database (MNIST) classification task. Moreover, higher-precision designs will claim more restrictive requirements for hardware nonidealities including the variations of NVM devices and the nonlinearities of the converters. Moreover, the sparsity-aware processing method can obtain 79%/53% ADC energy reduction and 2.98×/1.15× energy efficiency improvement based on the W8A8/W4A3 quantization design with an array size of 128 × 128.
基于nvm的高能效卷积神经网络的量化和稀疏感知处理
基于非易失性存储器(NVM)的卷积神经网络(nvcnn)作为一种有前途的硬件边缘智能解决方案受到了广泛关注。然而,在资源有限的条件下,仍然存在许多挑战,例如硬件精度和成本的限制,特别是模数转换器(adc)的巨大开销。在本研究中,我们系统地分析了nvcnn的性能以及量化权重和激活的硬件限制,并提出了相应的NVM设备和外围电路对乘法累加(MAC)单元的要求。此外,我们提出了一种原位稀疏感知处理方法,利用网络的稀疏性和设备阵列特性,进一步提高量化nvcnn的能量效率。我们的结果表明,4位权重和3位激活(W4A3)设计展示了网络性能和硬件开销之间的最佳折衷,在修改的国家标准与技术研究所数据库(MNIST)分类任务中实现了98.82%的准确率。此外,更高精度的设计将对硬件非理想性提出更严格的要求,包括NVM器件的变化和转换器的非线性。此外,稀疏感知处理方法在阵列尺寸为128 × 128的W8A8/W4A3量化设计基础上,可获得79%/53%的ADC能量降低和2.98×/1.15×的能效提升。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信