Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks

Rui Liu, Xiaochen Peng, Xiaoyu Sun, W. Khwa, Xin Si, Jia-Jing Chen, Jia-Fang Li, Meng-Fan Chang, Shimeng Yu
{"title":"Parallelizing SRAM Arrays with Customized Bit-Cell for Binary Neural Networks","authors":"Rui Liu, Xiaochen Peng, Xiaoyu Sun, W. Khwa, Xin Si, Jia-Jing Chen, Jia-Fang Li, Meng-Fan Chang, Shimeng Yu","doi":"10.1145/3195970.3196089","DOIUrl":null,"url":null,"abstract":"Recent advances in deep neural networks (DNNs) have shown Binary Neural Networks (BNNs) are able to provide a reasonable accuracy on various image datasets with a significant reduction in computation and memory cost. In this paper, we explore two BNNs: hybrid BNN (HBNN) and XNOR-BNN, where the weights are binarized to +1/−1 while the neuron activations are binarized to 1/0 and +1/−1, respectively. Two SRAM bit cell designs are proposed, namely, 6T SRAM for HBNN and customized 8T SRAM for XNOR-BNN. In our design, the high-precision multiply-and-accumulate (MAC) is replaced by bitwise multiplication for HBNN or XNOR for XNOR-BNN plus bit-counting operations. To parallelize the weighted sum operation, we activate multiple word lines in the SRAM array simultaneously and digitize the analog voltage developed along the bit line by a multi-level sense amplifier (MLSA). In order to partition the large matrices in DNNs, we investigate the impact of sensing bit-levels of MLSA on the accuracy degradation for different sub-array sizes and propose using the nonlinear quantization technique to mitigate the accuracy degradation. With 64 × 64 sub-array size and 3-bit MLSA, HBNN and XNOR-BNN architectures can minimize the accuracy degradation to 2.37% and 0.88%, respectively, for an inspired VGG-16 network on the CIFAR-10 dataset. Design space exploration of SRAM based synaptic architectures with the conventional row-by-row access scheme and our proposed parallel access scheme are also performed, showing significant benefits in the area, latency and energy-efficiency. Finally, we have successfully taped-out and validated the proposed HBNN and XNOR-BNN designs in TSMC 65 nm process with measured silicon data, achieving energy-efficiency >100 TOPS/W for HBNN and >50 TOPS/W for XNOR-BNN.","PeriodicalId":6491,"journal":{"name":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","volume":"96 1","pages":"1-6"},"PeriodicalIF":0.0000,"publicationDate":"2018-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"60","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3195970.3196089","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 60

Abstract

Recent advances in deep neural networks (DNNs) have shown that Binary Neural Networks (BNNs) can provide reasonable accuracy on various image datasets with a significant reduction in computation and memory cost. In this paper, we explore two BNNs: the hybrid BNN (HBNN) and the XNOR-BNN, where the weights are binarized to +1/−1 while the neuron activations are binarized to 1/0 and +1/−1, respectively. Two SRAM bit-cell designs are proposed: a 6T SRAM cell for HBNN and a customized 8T SRAM cell for XNOR-BNN. In our design, the high-precision multiply-and-accumulate (MAC) operation is replaced by a bitwise multiplication (for HBNN) or an XNOR (for XNOR-BNN) followed by bit-counting. To parallelize the weighted-sum operation, we activate multiple word lines in the SRAM array simultaneously and digitize the analog voltage developed along the bit line with a multi-level sense amplifier (MLSA). Because the large weight matrices in DNNs must be partitioned across sub-arrays, we investigate the impact of the MLSA's sensing bit-level on accuracy degradation for different sub-array sizes and propose a nonlinear quantization technique to mitigate this degradation. With a 64 × 64 sub-array size and a 3-bit MLSA, the HBNN and XNOR-BNN architectures limit the accuracy degradation to 2.37% and 0.88%, respectively, for a VGG-16-inspired network on the CIFAR-10 dataset. A design-space exploration of SRAM-based synaptic architectures under the conventional row-by-row access scheme and our proposed parallel access scheme is also performed, showing significant benefits in area, latency, and energy efficiency. Finally, we have taped out and validated the proposed HBNN and XNOR-BNN designs in a TSMC 65 nm process with measured silicon data, achieving energy efficiency >100 TOPS/W for HBNN and >50 TOPS/W for XNOR-BNN.
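
The core arithmetic substitution and the effect of digitizing per-sub-array partial sums can be illustrated with a small NumPy model. This is a behavioral sketch, not the paper's circuit: the function names (xnor_popcount_dot, mlsa_quantize, partitioned_weighted_sum) are hypothetical, and a uniform-step quantizer stands in for the MLSA, whereas the paper places the sense levels nonlinearly.

```python
import numpy as np

def xnor_popcount_dot(w_bits, x_bits):
    """Binarized dot product for XNOR-BNN.

    w_bits, x_bits: 0/1 arrays encoding weights/activations in {-1, +1}
    (bit b encodes the value 2*b - 1). For +/-1 vectors of length N,
    dot = 2 * popcount(XNOR(w, x)) - N, since XNOR counts agreements.
    """
    n = w_bits.size
    agree = np.logical_not(np.logical_xor(w_bits, x_bits))
    return 2 * int(np.count_nonzero(agree)) - n

def mlsa_quantize(partial_sum, n_rows, bits=3):
    """Uniform-step stand-in for the multi-level sense amplifier: the
    analog bit-line sum of one sub-array is digitized to 2**bits levels.
    (The paper uses nonlinear level placement to reduce accuracy loss;
    uniform levels are shown here only for simplicity.)"""
    levels = 2 ** bits
    step = (2 * n_rows) / (levels - 1)   # sum lies in [-n_rows, +n_rows]
    q = round((partial_sum + n_rows) / step)
    return q * step - n_rows

def partitioned_weighted_sum(w_bits, x_bits, sub_rows=64, bits=3):
    """Split a long dot product into 64-row sub-array partial sums
    (all rows read in parallel), quantize each partial sum with the
    MLSA model, and accumulate the digitized results."""
    total = 0.0
    for start in range(0, w_bits.size, sub_rows):
        w_blk = w_bits[start:start + sub_rows]
        x_blk = x_bits[start:start + sub_rows]
        ps = xnor_popcount_dot(w_blk, x_blk)
        total += mlsa_quantize(ps, w_blk.size, bits)
    return total

rng = np.random.default_rng(0)
w = rng.integers(0, 2, 256)
x = rng.integers(0, 2, 256)
exact = xnor_popcount_dot(w, x)
approx = partitioned_weighted_sum(w, x)
print(f"exact weighted sum: {exact}, MLSA-quantized: {approx:.1f}")
```

Under this model, shrinking the MLSA to a few bits while keeping 64-row sub-arrays introduces exactly the partial-sum quantization error that the paper's nonlinear level placement is designed to reduce; for HBNN, the XNOR in the sketch would be replaced by a bitwise AND, since activations are 1/0.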