Double MAC on a Cell: A 22-nm 8T-SRAM-Based Analog In-Memory Accelerator for Binary/Ternary Neural Networks Featuring Split Wordline

Impact factor: 2.4 (JCR Q2, Engineering, Electrical & Electronic)
Hiroto Tagata, Takashi Sato, Hiromitsu Awano
IEEE Open Journal of Circuits and Systems, vol. 5, pp. 328-340, published 2024-10-17 (journal article).
DOI: 10.1109/OJCAS.2024.3482469
URL: https://ieeexplore.ieee.org/document/10721281/
Citations: 0

Abstract

This paper proposes a novel 8T-SRAM-based computing-in-memory (CIM) accelerator for binary/ternary neural networks. The proposed split dual-port 8T-SRAM cell has two input ports and simultaneously performs two binary multiply-and-accumulate (MAC) operations on its left and right bitlines. This doubles throughput without significantly increasing area or power consumption, because the only area overhead relative to the conventional 8T-SRAM cell is two additional wordline (WL) wires. In addition, the proposed circuit supports both binary and ternary activation inputs, so the balance between high energy efficiency and high inference accuracy can be adjusted to suit the application. The proposed SRAM macro consists of a 128 × 128 SRAM array that computes the MAC results of 96 binary/ternary inputs and 96 × 128 binary weights and outputs them as 1- to 5-bit digital values. Circuit performance was evaluated by post-layout simulation of the complete CIM macro laid out in a 22-nm process. The proposed circuit operates at 1 GHz. It achieves a maximum area efficiency of 3320 TOPS/mm², 3.4× higher than prior work, with a reasonable energy efficiency of 1471 TOPS/W. The simulated inference accuracies of the proposed circuit are 96.45%/97.67% on the MNIST dataset with binary/ternary MLP models, and 86.32%/88.56% on the CIFAR-10 dataset with binary/ternary VGG-like CNN models.
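The dual-MAC idea can be illustrated with a purely behavioral Python sketch. It is not the authors' circuit model: the ±1 weight encoding, the treatment of a ternary 0 as an unasserted wordline, the assumption that the left and right ports receive two independent activation vectors against the same stored weights, and the uniform 1- to 5-bit readout are all assumptions made for illustration, and the helper names `column_mac`, `quantize`, and `dual_mac` are hypothetical. Only the 96-input, 128-column dimensions and the 1- to 5-bit output range come from the abstract.

```python
import random
from typing import List, Sequence, Tuple

N_INPUTS = 96   # activation rows per MAC, per the abstract
N_COLS = 128    # binary weight columns, per the abstract


def column_mac(x: Sequence[int], w_col: Sequence[int]) -> int:
    """Signed dot product of one activation vector with one weight column.

    Assumed encoding (hypothetical): weights are +1/-1; activations are
    +1/-1 (binary) or +1/0/-1 (ternary). A 0 activation models a wordline
    that is not asserted, so that row contributes nothing to the bitline.
    """
    return sum(xi * wi for xi, wi in zip(x, w_col) if xi != 0)


def quantize(value: int, full_scale: int, bits: int) -> int:
    """Hypothetical uniform readout of a MAC value in [-full_scale, full_scale].

    Maps the range onto integer codes 0 .. 2**bits - 1; with bits=1 this
    reduces to a sign test (code 1 for value >= 0).
    """
    if not 1 <= bits <= 5:
        raise ValueError("the abstract reports 1- to 5-bit outputs")
    if full_scale <= 0:
        raise ValueError("full_scale must be positive")
    levels = 2 ** bits
    frac = (value + full_scale) / (2 * full_scale)  # normalize to [0, 1]
    return min(levels - 1, int(frac * levels))


def dual_mac(
    x_left: Sequence[int],
    x_right: Sequence[int],
    weights: Sequence[Sequence[int]],
    bits: int,
) -> Tuple[List[int], List[int]]:
    """Model one cycle of the split-wordline array.

    Assumption: the left and right read ports receive two independent
    activation vectors, both multiplied against the same stored weights,
    giving two quantized outputs per column. weights[r][c] is row r, column c.
    """
    n_rows = len(weights)
    if len(x_left) != n_rows or len(x_right) != n_rows:
        raise ValueError("activation length must match the number of weight rows")
    columns = list(zip(*weights))  # transpose rows -> columns
    left = [quantize(column_mac(x_left, col), n_rows, bits) for col in columns]
    right = [quantize(column_mac(x_right, col), n_rows, bits) for col in columns]
    return left, right


if __name__ == "__main__":
    rng = random.Random(0)
    w = [[rng.choice((-1, 1)) for _ in range(N_COLS)] for _ in range(N_INPUTS)]
    xa = [rng.choice((-1, 1)) for _ in range(N_INPUTS)]      # binary input
    xb = [rng.choice((-1, 0, 1)) for _ in range(N_INPUTS)]   # ternary input
    out_a, out_b = dual_mac(xa, xb, w, bits=5)
    print(len(out_a), len(out_b), out_a[:8], out_b[:8])
```

Because both outputs reuse the same stored weight column, one `dual_mac` call yields 2 × 128 column results per modeled cycle, which is the intuition behind the twofold throughput claim; analog effects such as bitline nonlinearity, noise, and ADC offset are ignored in this sketch.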
Source journal metrics: self-citation rate 0.00%, articles published 0, review time 19 weeks.