A Two-way SRAM Array based Accelerator for Deep Neural Network On-chip Training

2020 57th ACM/IEEE Design Automation Conference (DAC). Pub Date: 2020-07-01. DOI: 10.1109/DAC18072.2020.9218524

Hongwu Jiang, Shanshi Huang, Xiaochen Peng, Jian-Wei Su, Yen-Chi Chou, Wei-Hsing Huang, Ta-Wei Liu, Ruhui Liu, Meng-Fan Chang, Shimeng Yu

Citations: 16

Abstract

On-chip training of large-scale deep neural networks (DNNs) is challenging due to computational complexity and limited hardware resources. Compute-in-memory (CIM) architectures exploit analog computation inside the memory array to speed up vector-matrix multiplication (VMM) and alleviate the memory bottleneck. However, existing CIM prototype chips, in particular SRAM-based accelerators, target only low-precision inference engines. In this work, we propose a two-way SRAM array design that can perform bi-directional in-memory VMM with minimal hardware overhead. A novel signed-number multiplication scheme is also proposed to handle the negative inputs that arise in backpropagation. We taped out and validated the proposed two-way SRAM array design in a TSMC 28nm process. Based on silicon measurement data from the CIM macro, we explore the hardware performance of the entire architecture for DNN on-chip training. The experimental results show that the proposed accelerator can achieve an energy efficiency of ~3.2 TOPS/W and a throughput of >1000 FPS and >300 FPS for ResNet and DenseNet training on ImageNet, respectively.
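To make the two ideas in the abstract concrete (one stored weight array used in both directions, and signed inputs during backpropagation), here is a minimal behavioral sketch in Python with NumPy. It assumes integer weights stored once in an array W (rows = input dimension, columns = output dimension), inputs applied bit-serially one bit plane per cycle, and signed inputs encoded in two's complement so that the most significant bit plane is weighted by -2^(n-1). The names `bit_serial_vmm`, `forward`, and `backward` are hypothetical and chosen for illustration only; the sketch does not model the analog circuits, the ADCs, or the signed-number multiplication scheme actually used in the paper, which may differ.

```python
import numpy as np

# Hypothetical behavioral model only: W is an integer weight matrix "stored"
# once in the array, shape (n_in, n_out). No analog effects, ADC quantization,
# or the paper's actual signed-multiplication circuit are modeled.


def bit_serial_vmm(matrix, x, n_bits):
    """Return x @ matrix, feeding x one bit plane at a time.

    x holds signed integers in n_bits two's complement. Each bit plane is a
    0/1 vector (what a wordline driver could apply in one cycle). The partial
    sum of bit k is scaled by 2**k, except the MSB, which carries -2**(n_bits-1).
    """
    x = np.asarray(x, dtype=np.int64)
    lo, hi = -(1 << (n_bits - 1)), (1 << (n_bits - 1)) - 1
    if x.min() < lo or x.max() > hi:
        raise ValueError("input does not fit in n_bits two's complement")
    u = x & ((1 << n_bits) - 1)  # unsigned two's complement bit pattern
    acc = np.zeros(matrix.shape[1], dtype=np.int64)
    for k in range(n_bits):
        plane = (u >> k) & 1  # 0/1 input vector for this cycle
        partial = plane @ matrix  # sum along the array's input dimension
        weight = -(1 << k) if k == n_bits - 1 else (1 << k)
        acc += weight * partial  # shift-and-add of the partial sums
    return acc


def forward(W, x, n_bits=8):
    # Forward pass: activations drive the rows; column sums give x @ W.
    return bit_serial_vmm(W, x, n_bits)


def backward(W, delta, n_bits=8):
    # Backpropagation: errors (possibly negative) drive the columns; row sums
    # give W @ delta. W.T is a view of the same stored array, not a copy.
    return bit_serial_vmm(W.T, delta, n_bits)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.integers(-8, 8, size=(4, 3))
    x = rng.integers(0, 16, size=4)  # non-negative activations
    delta = rng.integers(-8, 8, size=3)  # signed errors
    assert np.array_equal(forward(W, x), x @ W)
    assert np.array_equal(backward(W, delta), W @ delta)
    print("forward:", forward(W, x), "backward:", backward(W, delta))
```

Because `W.T` is only a view in NumPy, the sketch mirrors the motivation for a two-way array: the same stored weights serve both the forward VMM and the transposed VMM of backpropagation without keeping a second, transposed copy. The two's complement decomposition shows one generic way to let negative inputs reuse a datapath that only applies 0/1 input bits.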
