Achieving High In Situ Training Accuracy and Energy Efficiency with Analog Non-Volatile Synaptic Devices

Shanshi Huang, Xiaoyu Sun, Xiaochen Peng, Hongwu Jiang, Shimeng Yu
{"title":"Achieving High In Situ Training Accuracy and Energy Efficiency with Analog Non-Volatile Synaptic Devices","authors":"Shanshi Huang, Xiaoyu Sun, Xiaochen Peng, Hongwu Jiang, Shimeng Yu","doi":"10.1145/3500929","DOIUrl":null,"url":null,"abstract":"On-device embedded artificial intelligence prefers the adaptive learning capability when deployed in the field, and thus in situ training is required. The compute-in-memory approach, which exploits the analog computation within the memory array, is a promising solution for deep neural network (DNN) on-chip acceleration. Emerging non-volatile memories are of great interest, serving as analog synapses due to their multilevel programmability. However, the asymmetry and nonlinearity in the conductance tuning remain grand challenges for achieving high in situ training accuracy. In addition, analog-to-digital converters at the edge of the memory array introduce quantization errors. In this work, we present an algorithm-hardware co-optimization to overcome these challenges. We incorporate the device/circuit non-ideal effects into the DNN propagation and weight update steps. By introducing the adaptive “momentum” in the weight update rule, in situ training accuracy on CIFAR-10 could approach its software baseline even under severe asymmetry/nonlinearity and analog-to-digital converter quantization error. The hardware performance of the on-chip training architecture and the overhead for adding “momentum” are also evaluated. By optimizing the backpropagation dataflow, 23.59 TOPS/W training energy efficiency (12× improvement compared to naïve dataflow) is achieved. The circuits that handle “momentum” introduce only 4.2% energy overhead. Our results show great potential and more relaxed requirements that enable emerging non-volatile memories for DNN acceleration on the embedded artificial intelligence platforms.","PeriodicalId":7063,"journal":{"name":"ACM Trans. Design Autom. Electr. Syst.","volume":"121 1","pages":"37:1-37:19"},"PeriodicalIF":0.0000,"publicationDate":"2022-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Trans. Design Autom. Electr. Syst.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3500929","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

On-device embedded artificial intelligence benefits from adaptive learning capability when deployed in the field, and thus requires in situ training. The compute-in-memory approach, which exploits analog computation within the memory array, is a promising solution for on-chip deep neural network (DNN) acceleration. Emerging non-volatile memories are of great interest as analog synapses due to their multilevel programmability. However, asymmetry and nonlinearity in conductance tuning remain grand challenges for achieving high in situ training accuracy, and the analog-to-digital converters at the edge of the memory array introduce additional quantization errors. In this work, we present an algorithm-hardware co-optimization that overcomes these challenges. We incorporate the device/circuit non-ideal effects into the DNN propagation and weight-update steps. By introducing an adaptive “momentum” term into the weight update rule, in situ training accuracy on CIFAR-10 approaches the software baseline even under severe asymmetry/nonlinearity and analog-to-digital converter quantization error. We also evaluate the hardware performance of the on-chip training architecture and the overhead of adding “momentum”. By optimizing the backpropagation dataflow, we achieve 23.59 TOPS/W training energy efficiency (a 12× improvement over the naïve dataflow), while the circuits that handle “momentum” introduce only 4.2% energy overhead. Our results demonstrate relaxed device requirements that make emerging non-volatile memories viable for DNN training acceleration on embedded artificial intelligence platforms.