A Digital SRAM Computing-in-Memory Design Utilizing Activation Unstructured Sparsity for High-Efficient DNN Inference

Baiqing Zhong, Mingyu Wang, Chuanghao Zhang, Yangzhan Mai, Xiaojie Li, Zhiyi Yu
Published in: 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI)
Publication date: 2023-06-20
DOI: 10.1109/ISVLSI59464.2023.10238597 (https://doi.org/10.1109/ISVLSI59464.2023.10238597)
Citations: 0

Abstract

The Computing-in-Memory (CIM) architecture has emerged as a promising approach for designing energy-efficient DNN processors. Previous CIM designs have exploited DNN weight sparsity, but these approaches often require pruning the weight matrix in a specific pattern, which may add computational complexity and degrade DNN accuracy. By contrast, hardly any digital CIM circuits leverage sparsity in the activations, which are naturally sparse in many scenarios because of ReLU activation functions. To fully exploit unstructured activation sparsity, we propose a digital SRAM CIM circuit. The circuit uses the Booth encoding scheme and an accumulator-based multiply-accumulate (MAC) structure. It uses SRAM bit-line (BL) computing to obtain the sparsity information of the matrix and employs an allocator to assign computations to the SRAM-CIM. The proposed design is implemented and evaluated in a 40 nm CMOS process. Evaluation results show that the circuit achieves a clock frequency of 1 GHz at 1.1 V with a peak performance of 819.2 GOPS. At 50% to 90% sparsity, the SRAM-CIM achieves a $1.12\times$ to $3.32\times$ speedup and energy savings of 48.2% to 90.57% over dense mode. When performing an 8-bit matrix multiplication at 90% sparsity, the energy efficiency is 10.57 TOPS/W.
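The abstract names three ingredients: Booth encoding, an accumulator-based MAC, and an allocator that uses sparsity information to decide which computations to run. The following is a minimal software sketch of how these could fit together, not the authors' circuit. Several points are assumptions that the abstract does not confirm: radix-4 Booth recoding, recoding the activation rather than the weight, counting one step per nonzero activation, and all function names (`booth_radix4_digits`, `nonzero_indices`, `sparse_mac`), which are hypothetical.

```python
from typing import List, Sequence, Tuple


def booth_radix4_digits(x: int, bits: int = 8) -> List[int]:
    """Radix-4 Booth digits of a signed `bits`-bit integer, least significant first.

    Each digit lies in {-2, -1, 0, 1, 2} and x == sum(d * 4**i).
    Radix-4 is an assumption; the abstract only says "booth encoding".
    """
    assert bits % 2 == 0
    assert -(1 << (bits - 1)) <= x < (1 << (bits - 1))
    pattern = x & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        low = (pattern >> i) & 1
        high = (pattern >> (i + 1)) & 1
        digits.append(low + prev - 2 * high)
        prev = high
    return digits


def nonzero_indices(activations: Sequence[int]) -> List[int]:
    """Stand-in for the sparsity detection: positions of nonzero activations."""
    return [i for i, a in enumerate(activations) if a != 0]


def sparse_mac(activations: Sequence[int], weights: Sequence[int]) -> Tuple[int, int]:
    """Dot product that skips zero activations.

    Returns (result, steps), where `steps` counts the nonzero activations
    actually processed (an idealized cost model, assumed for illustration).
    """
    acc = 0
    steps = 0
    # Hypothetical "allocator": only nonzero activations are scheduled.
    for i in nonzero_indices(activations):
        w = weights[i]
        # Accumulator-based multiply: add shifted partial products d * w * 4**k.
        for k, d in enumerate(booth_radix4_digits(activations[i])):
            acc += (d * w) << (2 * k)
        steps += 1
    return acc, steps


if __name__ == "__main__":
    acts = [0, 3, 0, 0, -5, 0, 7, 0, 0, 0]   # 70% zeros
    wts = [4, -2, 9, 1, 6, 8, -3, 5, 2, 7]
    result, steps = sparse_mac(acts, wts)
    print(result, steps, len(acts) / steps)
```

The ratio printed at the end is only an idealized upper bound on the benefit of skipping zeros in this toy cost model; it should not be compared directly with the speedups reported in the abstract, which are measured on the actual hardware.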