Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks

Rui Liu, Xiaochen Peng, Xiaoyu Sun, W. Khwa, Xin Si, Jia-Jing Chen, Jia-Fang Li, Meng-Fan Chang, Shimeng Yu
{"title":"Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks","authors":"Rui Liu, Xiaochen Peng, Xiaoyu Sun, W. Khwa, Xin Si, Jia-Jing Chen, Jia-Fang Li, Meng-Fan Chang, Shimeng Yu","doi":"10.1145/3195970.3196089","DOIUrl":null,"url":null,"abstract":"Recent advances in deep neural networks (DNNs) have shown Binary Neural Networks (BNNs) are able to provide a reasonable accuracy on various image datasets with a significant reduction in computation and memory cost. In this paper, we explore two BNNs: hybrid BNN (HBNN) and XNOR-BNN, where the weights are binarized to +1/−1 while the neuron activations are binarized to 1/0 and +1/−1, respectively. Two SRAM bit cell designs are proposed, namely, 6T SRAM for HBNN and customized 8T SRAM for XNOR-BNN. In our design, the high-precision multiply-and-accumulate (MAC) is replaced by bitwise multiplication for HBNN or XNOR for XNOR-BNN plus bit-counting operations. To parallelize the weighted sum operation, we activate multiple word lines in the SRAM array simultaneously and digitize the analog voltage developed along the bit line by a multi-level sense amplifier (MLSA). In order to partition the large matrices in DNNs, we investigate the impact of sensing bit-levels of MLSA on the accuracy degradation for different sub-array sizes and propose using the nonlinear quantization technique to mitigate the accuracy degradation. With 64 × 64 sub-array size and 3-bit MLSA, HBNN and XNOR-BNN architectures can minimize the accuracy degradation to 2.37% and 0.88%, respectively, for an inspired VGG-16 network on the CIFAR-10 dataset. Design space exploration of SRAM based synaptic architectures with the conventional row-by-row access scheme and our proposed parallel access scheme are also performed, showing significant benefits in the area, latency and energy-efficiency. Finally, we have successfully taped-out and validated the proposed HBNN and XNOR-BNN designs in TSMC 65 nm process with measured silicon data, achieving energy-efficiency >100 TOPS/W for HBNN and >50 TOPS/W for XNOR-BNN.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"96 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"60","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3195970.3196089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 60

Abstract

Recent advances in deep neural networks (DNNs) have shown that Binary Neural Networks (BNNs) can provide reasonable accuracy on various image datasets with a significant reduction in computation and memory cost. In this paper, we explore two BNNs: the hybrid BNN (HBNN) and the XNOR-BNN, where the weights are binarized to +1/−1 while the neuron activations are binarized to 1/0 and +1/−1, respectively. Two SRAM bit-cell designs are proposed: a 6T SRAM cell for HBNN and a customized 8T SRAM cell for XNOR-BNN. In our design, the high-precision multiply-and-accumulate (MAC) operation is replaced by a bitwise multiplication (for HBNN) or an XNOR (for XNOR-BNN) followed by bit-counting. To parallelize the weighted-sum operation, we activate multiple word lines in the SRAM array simultaneously and digitize the analog voltage developed along the bit line with a multi-level sense amplifier (MLSA). Because the large weight matrices in DNNs must be partitioned across sub-arrays, we investigate the impact of the MLSA's sensing bit-level on accuracy degradation for different sub-array sizes and propose a nonlinear quantization technique to mitigate this degradation. With a 64 × 64 sub-array size and a 3-bit MLSA, the HBNN and XNOR-BNN architectures limit the accuracy degradation to 2.37% and 0.88%, respectively, for a VGG-16-inspired network on the CIFAR-10 dataset. A design-space exploration of SRAM-based synaptic architectures under the conventional row-by-row access scheme and our proposed parallel access scheme is also performed, showing significant benefits in area, latency, and energy efficiency. Finally, we have taped out and validated the proposed HBNN and XNOR-BNN designs in a TSMC 65 nm process with measured silicon data, achieving energy efficiency >100 TOPS/W for HBNN and >50 TOPS/W for XNOR-BNN.
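
The core arithmetic substitution and the effect of digitizing per-sub-array partial sums can be illustrated with a small NumPy model. This is a behavioral sketch, not the paper's circuit: the function names (xnor_popcount_dot, mlsa_quantize, partitioned_weighted_sum) are hypothetical, and a uniform-step quantizer stands in for the MLSA, whereas the paper places the sense levels nonlinearly.

```python
import numpy as np

def xnor_popcount_dot(w_bits, x_bits):
    """Binarized dot product for XNOR-BNN.

    w_bits, x_bits: 0/1 arrays encoding weights/activations in {-1, +1}
    (bit b encodes the value 2*b - 1). For +/-1 vectors of length N,
    dot = 2 * popcount(XNOR(w, x)) - N, since XNOR counts agreements.
    """
    n = w_bits.size
    agree = np.logical_not(np.logical_xor(w_bits, x_bits))
    return 2 * int(np.count_nonzero(agree)) - n

def mlsa_quantize(partial_sum, n_rows, bits=3):
    """Uniform-step stand-in for the multi-level sense amplifier: the
    analog bit-line sum of one sub-array is digitized to 2**bits levels.
    (The paper uses nonlinear level placement to reduce accuracy loss;
    uniform levels are shown here only for simplicity.)"""
    levels = 2 ** bits
    step = (2 * n_rows) / (levels - 1)   # sum lies in [-n_rows, +n_rows]
    q = round((partial_sum + n_rows) / step)
    return q * step - n_rows

def partitioned_weighted_sum(w_bits, x_bits, sub_rows=64, bits=3):
    """Split a long dot product into 64-row sub-array partial sums
    (all rows read in parallel), quantize each partial sum with the
    MLSA model, and accumulate the digitized results."""
    total = 0.0
    for start in range(0, w_bits.size, sub_rows):
        w_blk = w_bits[start:start + sub_rows]
        x_blk = x_bits[start:start + sub_rows]
        ps = xnor_popcount_dot(w_blk, x_blk)
        total += mlsa_quantize(ps, w_blk.size, bits)
    return total

rng = np.random.default_rng(0)
w = rng.integers(0, 2, 256)
x = rng.integers(0, 2, 256)
exact = xnor_popcount_dot(w, x)
approx = partitioned_weighted_sum(w, x)
print(f"exact weighted sum: {exact}, MLSA-quantized: {approx:.1f}")
```

Under this model, shrinking the MLSA to a few bits while keeping 64-row sub-arrays introduces exactly the partial-sum quantization error that the paper's nonlinear level placement is designed to reduce; for HBNN, the XNOR in the sketch would be replaced by a bitwise AND, since activations are 1/0.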