A 133.6TOPS/W Compute-In-Memory SRAM Macro with Fully Parallel One-Step Multi-Bit Computation

2022 IEEE Custom Integrated Circuits Conference (CICC). Pub Date: 2022-04-01. DOI: 10.1109/CICC53496.2022.9772821

E. Choi, Injun Choi, Chanhee Jeon, Gichan Yun, Donghyeon Yi, S. Ha, I. Chang, M. Je


Citations: 6

Abstract

SRAM-based compute-in-memory (CIM) structures can perform deep neural network (DNN) computations in the mixed-signal domain with high energy efficiency, but their accuracy is limited by analog nonidealities. Recent circuit techniques support multi-bit analog computation in SRAM-based CIM macros [1], [2] by computing multiplication and accumulation with transistor currents. However, transistor current is nonlinear with respect to gate voltage, which significantly degrades DNN accuracy. Some works address this problem with charge-based computation [3], [4], in which the products of a 1b weight and multi-bit inputs are first stored on capacitors. Multi-bit-weight computation is then achieved by shifting and adding these products, either in the digital domain [3] or in the analog domain using a charge-sharing method [1]. The digital method typically requires higher ADC precision and one ADC for every accumulation, which makes it power hungry. The analog charge-sharing method requires control switches, which expose the computation to charge-injection noise and dissipate considerable power each time they are turned on and off. To address these issues, this work proposes an 8T1C SRAM-based CIM macro structure that supports (1) multi-bit-weight charge-based computation without additional switches for charge sharing; (2) a simple and fast computation in which the multi-bit-weight multiply-accumulate-averaging (MAV) voltage is formed immediately when the input is given, namely “one-step” computation; (3) a compact 8T1C bit cell using a metal-oxide-metal (MOM) capacitor, which occupies only 1.5× the cell area of a conventional 6T SRAM under logic rules; and (4) no additional power consumption for bit shifting, enabling energy-efficient computing. We fabricated the proposed 4kb SRAM CIM macro in a 65nm process; its structure is shown in Fig. 1. The macro supports fully parallel computation of 1024 MAV operations with 64 4b inputs and 16 4b weights.
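
The arithmetic behind the multi-bit-weight MAV can be illustrated with a small behavioral sketch. It is not a model of the 8T1C circuit, and the abstract does not give these details: the array organization (64 input rows by 16 weight columns), unsigned 4b values, averaging as division by the number of inputs, and the function names `mav_direct` and `mav_shift_add` are all assumptions made for illustration. The sketch only shows that shifting and adding the 1b-weight results, as in the prior approaches described above, gives the same value as a multi-bit-weight MAV formed in one step.

```python
import random

# Sizes quoted in the abstract; the row/column organization is an assumption.
N_INPUTS = 64   # number of 4b inputs
N_WEIGHTS = 16  # number of 4b weights per input (assumed: one per column)
BITS = 4        # weight and input precision

def mav_direct(x, w):
    """Ideal multi-bit MAV: one averaged dot product per weight column."""
    n = len(x)
    cols = len(w[0])
    return [sum(x[i] * w[i][j] for i in range(n)) / n for j in range(cols)]

def mav_shift_add(x, w, bits=BITS):
    """Same result built from 1b weight planes, then shift-and-add."""
    n = len(x)
    cols = len(w[0])
    out = []
    for j in range(cols):
        total = 0.0
        for b in range(bits):
            # MAV of the inputs against bit b of every weight in column j
            plane = sum(x[i] * ((w[i][j] >> b) & 1) for i in range(n)) / n
            total += plane * (1 << b)  # binary weighting of the bit plane
        out.append(total)
    return out

# Usage: random unsigned 4b inputs and weights (illustrative data only).
rng = random.Random(0)
x = [rng.randrange(1 << BITS) for _ in range(N_INPUTS)]
w = [[rng.randrange(1 << BITS) for _ in range(N_WEIGHTS)] for _ in range(N_INPUTS)]
direct = mav_direct(x, w)
shifted = mav_shift_add(x, w)
```

Under these assumptions the sizes are consistent with the figures in the abstract: 64 × 16 = 1024 multiply operations feed 16 averaged outputs, and 64 × 16 × 4 bits = 4096 bits matches the 4kb macro capacity.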


