DIMC: 2219TOPS/W 2569F²/b Digital In-Memory Computing Macro in 28nm Based on Approximate Arithmetic Hardware

2022 IEEE International Solid-State Circuits Conference (ISSCC). Pub Date: 2022-02-20. DOI: 10.1109/ISSCC42614.2022.9731659

Dewei Wang, Chuan Lin, Gregory K. Chen, Phil V. Knag, R. Krishnamurthy, Mingoo Seok


Citations: 24

Abstract


In-memory-computing (IMC) SRAM architecture has gained significant attention as it achieves high energy efficiency for computing a convolutional neural network (CNN) model [1]. Recent works investigated the use of analog-mixed-signal (AMS) hardware for high area and energy efficiency [2], [3]. However, AMS hardware output is well known to be susceptible to process, voltage, and temperature (PVT) variations, limiting the computing precision and ultimately the inference accuracy of a CNN. We reconfirmed, through the simulation of a capacitor-based IMC SRAM macro that computes a 256D binary dot product, that the AMS computing hardware has a significant root-mean-square error (RMSE) of 22.5% across the worst-case voltage, temperature (Fig. 16.1.1 top left) and 3-sigma process variations (Fig. 16.1.1 top right). On the other hand, we can implement an IMC SRAM macro using robust digital logic [4], which can virtually eliminate the variability issue (Fig. 16.1.1 top). However, digital circuits require more devices than AMS counterparts (e.g., 28 transistors for a mirror full adder [FA]). As a result, a recent digital IMC SRAM shows a lower area efficiency of 6368F²/b (22nm, 4b/4b weight/activation) [5] than the AMS counterpart (1170F²/b, 65nm, 1b/1b) [3]. In light of this, we aim to adopt approximate arithmetic hardware to improve area and power efficiency and present two digital IMC macros (DIMC) with different levels of approximation (Fig. 16.1.1 bottom left). Also, we propose an approximation-aware training algorithm and a number format to minimize inference accuracy degradation induced by approximate hardware (Fig. 16.1.1 bottom right). We prototyped a 28nm test chip: for a 1b/1b CNN model for CIFAR-10 and across 0.5-to-1.1V supply, the DIMC with double-approximate hardware (DIMC-D) achieves 2569F²/b, 932-2219TOPS/W, 475-20032GOPS, and 86.96% accuracy, while for a 4b/1b CNN model, the DIMC with the single-approximate hardware (DIMC-S) achieves 3814F²/b, 458-990TOPS/W (normalized to 1b/1b), 405-19215GOPS (normalized to 1b/1b), and 90.41% accuracy.
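The area-efficiency figures in the abstract are given in F² per bit, where F is the technology feature size, so that designs in different process nodes can be compared. As a rough illustration only, the sketch below converts the quoted figures into absolute area per bit. It assumes that F equals the nominal node name in nanometers, which is a simplification not stated in the abstract; the function name `f2_per_bit_to_um2` and the dictionary labels are hypothetical.

```python
# Convert area efficiency from F^2 per bit to um^2 per bit.
# Assumption (not from the abstract): F equals the nominal node name in nm.


def f2_per_bit_to_um2(f2_per_bit: float, node_nm: float) -> float:
    """Return the area per bit in um^2 for a figure quoted in F^2/b."""
    area_nm2 = f2_per_bit * node_nm ** 2
    return area_nm2 / 1e6  # 1 um^2 = 1e6 nm^2


# (F^2/b, node in nm) pairs taken from the abstract.
designs = {
    "DIMC-D (28nm, 1b/1b)": (2569, 28),
    "DIMC-S (28nm, 4b/1b)": (3814, 28),
    "digital IMC [5] (22nm, 4b/4b)": (6368, 22),
    "AMS IMC [3] (65nm, 1b/1b)": (1170, 65),
}

for name, (f2, node) in designs.items():
    print(f"{name}: {f2} F^2/b is about {f2_per_bit_to_um2(f2, node):.2f} um^2/b")
```

Under this assumption, DIMC-D works out to about 2.01 µm² per bit and DIMC-S to about 2.99 µm² per bit. The F²/b figures remain the more meaningful comparison, since they factor out the process node, and the designs also differ in weight and activation precision.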
