DCT-RAM: A Driver-Free Process-In-Memory 8T SRAM Macro with Multi-Bit Charge-Domain Computation and Time-Domain Quantization

Zhiyu Chen, Qing Jin, Zhanghao Yu, Yanzhi Wang, Kaiyuan Yang
{"title":"DCT-RAM: A Driver-Free Process-In-Memory 8T SRAM Macro with Multi-Bit Charge-Domain Computation and Time-Domain Quantization","authors":"Zhiyu Chen, Qing Jin, Zhanghao Yu, Yanzhi Wang, Kaiyuan Yang","doi":"10.1109/CICC53496.2022.9772826","DOIUrl":null,"url":null,"abstract":"Process-In-Memory (PIM) is a promising solution to alleviating the memory-wall bottleneck in memory-intensive applications like CNNs. Recent demonstrations of SRAM-based PIM designs, particularly those computing in the charge domain [1]–[5], have greatly improved the linearity of analog multiply-and-add computations (MAC) and quantization, and their robustness to process variations, making their inference accuracy approach that of digital hardware in practical computer vision benchmarks such as CIFAR-10. However, there remain several limitations towards large scale integration of PIM macros, especially the assumptions on the availability of powerful external reference voltage drivers and the lack of scaling friendly designs. More specifically, high-bandwidth analog buffers driving large output load are necessary to distribute the massive number of analog signals (e.g. DAC outputs) across the macro, without sacrificing signal fidelity and computing speed. [10] is one work that reports its DAC drivers occupying 11.4% of the macro area and incurring 94-pJ energy overhead in 28 nm, accounting for 68.5% of the total energy in a macro supporting $5\\mathrm{b}$ activations and $8\\mathrm{b}$ weight. Second, SAR ADCs are popular for the common 5–9 bit resolution range. High-speed power-hungry analog buffers are required in conventional SAR ADCs to drive the capacitive DACs (CDACs) to reference voltages, with short settling time and high accuracy. Given the hundreds of ADCs in each macro, the design complexity and overheads incurred by these drivers are dominant. Our simulated reference driver takes 2.9-pJ energy in 65 nm, which is comparable to an ADC (e.g. 3.56 $\\text{pJ}$ in [12]). Third, it is challenging to fit any conventional $\\geq 7\\mathrm{b}$ SAR ADC into the narrow width of SRAM cells due to the bulky CDACs and layout matching requirements, ultimately limiting the computing parallelism and energy amortization.","PeriodicalId":415990,"journal":{"name":"2022 IEEE Custom Integrated Circuits Conference (CICC)","volume":"56 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE Custom Integrated Circuits Conference (CICC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CICC53496.2022.9772826","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

Abstract

Process-In-Memory (PIM) is a promising solution to alleviating the memory-wall bottleneck in memory-intensive applications like CNNs. Recent demonstrations of SRAM-based PIM designs, particularly those computing in the charge domain [1]–[5], have greatly improved the linearity of analog multiply-and-add computations (MAC) and quantization, and their robustness to process variations, making their inference accuracy approach that of digital hardware in practical computer vision benchmarks such as CIFAR-10. However, there remain several limitations towards large scale integration of PIM macros, especially the assumptions on the availability of powerful external reference voltage drivers and the lack of scaling friendly designs. More specifically, high-bandwidth analog buffers driving large output load are necessary to distribute the massive number of analog signals (e.g. DAC outputs) across the macro, without sacrificing signal fidelity and computing speed. [10] is one work that reports its DAC drivers occupying 11.4% of the macro area and incurring 94-pJ energy overhead in 28 nm, accounting for 68.5% of the total energy in a macro supporting $5\mathrm{b}$ activations and $8\mathrm{b}$ weight. Second, SAR ADCs are popular for the common 5–9 bit resolution range. High-speed power-hungry analog buffers are required in conventional SAR ADCs to drive the capacitive DACs (CDACs) to reference voltages, with short settling time and high accuracy. Given the hundreds of ADCs in each macro, the design complexity and overheads incurred by these drivers are dominant. Our simulated reference driver takes 2.9-pJ energy in 65 nm, which is comparable to an ADC (e.g. 3.56 $\text{pJ}$ in [12]). Third, it is challenging to fit any conventional $\geq 7\mathrm{b}$ SAR ADC into the narrow width of SRAM cells due to the bulky CDACs and layout matching requirements, ultimately limiting the computing parallelism and energy amortization.
DCT-RAM:具有多比特电荷域计算和时域量化的无驱动进程内存8T SRAM宏
内存中进程(PIM)是一种很有前途的解决方案,可以缓解像cnn这样的内存密集型应用程序中的内存墙瓶颈。最近基于sram的PIM设计的演示,特别是在电荷域[1]-[5]的计算,极大地提高了模拟乘法和加法计算(MAC)和量化的线性度,以及它们对过程变化的鲁棒性,使其推理精度接近实际计算机视觉基准中的数字硬件,如CIFAR-10。然而,PIM宏的大规模集成仍然存在一些限制,特别是对强大的外部参考电压驱动器的可用性的假设以及缺乏缩放友好的设计。更具体地说,在不牺牲信号保真度和计算速度的情况下,需要高带宽模拟缓冲驱动大输出负载,以便在宏中分配大量模拟信号(例如DAC输出)。[10]是一个报告其DAC驱动程序占用11.4的工作% of the macro area and incurring 94-pJ energy overhead in 28 nm, accounting for 68.5% of the total energy in a macro supporting $5\mathrm{b}$ activations and $8\mathrm{b}$ weight. Second, SAR ADCs are popular for the common 5–9 bit resolution range. High-speed power-hungry analog buffers are required in conventional SAR ADCs to drive the capacitive DACs (CDACs) to reference voltages, with short settling time and high accuracy. Given the hundreds of ADCs in each macro, the design complexity and overheads incurred by these drivers are dominant. Our simulated reference driver takes 2.9-pJ energy in 65 nm, which is comparable to an ADC (e.g. 3.56 $\text{pJ}$ in [12]). Third, it is challenging to fit any conventional $\geq 7\mathrm{b}$ SAR ADC into the narrow width of SRAM cells due to the bulky CDACs and layout matching requirements, ultimately limiting the computing parallelism and energy amortization.
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信