Analog memory-based techniques for accelerating the training of fully-connected deep neural networks (Conference Presentation)

H. Tsai, S. Ambrogio, P. Narayanan, R. Shelby, C. Mackin, G. Burr
{"title":"Analog memory-based techniques for accelerating the training of fully-connected deep neural networks (Conference Presentation)","authors":"H. Tsai, S. Ambrogio, P. Narayanan, R. Shelby, C. Mackin, G. Burr","doi":"10.1117/12.2515630","DOIUrl":null,"url":null,"abstract":"Crossbar arrays of resistive non-volatile memories (NVMs) offer a novel and innovative solution for deep learning tasks which are typically implemented on GPUs [1]. The highly parallel structure employed in these architectures enables fast and energy-efficient multiply-accumulate computations, which is the workhorse of most deep learning algorithms. More specifically, we are developing analog hardware platforms for acceleration of large Fully Connected (FC) Deep Neural Networks (DNNs) [1,2], where training is performed using the backpropagation algorithm. This algorithm is a supervised form of learning based on three steps: forward propagation of input data through the network (a.k.a. forward inference), comparison of the inference results with ground truth labels and backpropagation of the errors from the output to the input layer, and then in-situ weight updates. This type of supervised training has been shown to succeed even in the presence of a substantial number of faulty NVMs, relaxing yield requirements vis-a-vis conventional memory, where near 100% yield may be required [2]. \nWe recently surveyed the use of analog memory devices for DNN hardware accelerators based on crossbar array structures and discussed design choices, device and circuit readiness, and the most promising opportunities compared digital accelerators [3]. In this presentation, we will focus on our implementation of an analog memory cell based on Phase-Change Memory (PCM) and 3-Transistor 1-Capacitor (3T1C) [4]. Software-equivalent accuracy on various datasets (MNIST, MNIST with noise, CIFAR-10, CIFAR-100) was achieved in a mixed software-hardware demonstration with DNN weights stored in real PCM device arrays as analog conductances. We will discuss how limitations from real-world non-volatile memory (NVM), such as conductance linearity and variability affects DNN training and how using two pairs of analog weights with varying significance relaxes device requirements [5, 6, 7]. Finally, we summarize all pieces needed to build an analog accelerator chip [8] and how lithography plays a role in future development of novel NVM devices.\n\nReferences:\n[1] G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element” IEDM Tech. Digest, 29.5 (2014). \n[2] G. W. Burr et al., “Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses) using phase-change memory as the synaptic weight element”, IEEE Trans. Elec. Dev, 62(11), pp. 3498 (2015).\n[3] H. Tsai et al., “Recent progress in analog memory-based accelerators for deep learning”, Journal of Physics D: Applied Physics, 51 (28), 283001 (2018)\n[4] S. Ambrogio et al., “Equivalent-Accuracy Accelerated Neural Network Training using Analog Memory”, Nature, 558 (7708), 60 (2018).\n[5] T. Gokmen et al., “Acceleration of deep neural network training with resistive cross-point devices: design considerations”, Frontiers in neuroscience, 10, 333 (2016).\n[6] S. Sidler et al., “Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: impact of conductance response”, ESSDERC Proc., 440 (2016).\n[7] G. 
Cristiano et al., “Perspective on Training Fully Connected Networks with Resistive Memories: Device Requirements for Multiple Conductances of Varying Significance”, accepted in Journal of Applied Physics (2018).\n[8] P. Narayanan et al., “Toward on-chip acceleration of the backpropagation algorithm using nonvolatile memory”, IBM J. Res. Dev., 61 (4), 1-11 (2017).","PeriodicalId":360316,"journal":{"name":"Novel Patterning Technologies for Semiconductors, MEMS/NEMS, and MOEMS 2019","volume":"40 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-08-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Novel Patterning Technologies for Semiconductors, MEMS/NEMS, and MOEMS 2019","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1117/12.2515630","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Crossbar arrays of resistive non-volatile memories (NVMs) offer a novel solution for deep learning tasks that are typically implemented on GPUs [1]. The highly parallel structure of these architectures enables fast and energy-efficient multiply-accumulate computations, which are the workhorse of most deep learning algorithms. More specifically, we are developing analog hardware platforms for the acceleration of large Fully Connected (FC) Deep Neural Networks (DNNs) [1, 2], where training is performed using the backpropagation algorithm. This algorithm is a supervised form of learning based on three steps: forward propagation of input data through the network (a.k.a. forward inference); comparison of the inference results with ground-truth labels and backpropagation of the errors from the output layer to the input layer; and in-situ weight updates. This type of supervised training has been shown to succeed even in the presence of a substantial number of faulty NVM devices, relaxing yield requirements relative to conventional memory, where near-100% yield may be required [2].

We recently surveyed the use of analog memory devices for DNN hardware accelerators based on crossbar array structures and discussed design choices, device and circuit readiness, and the most promising opportunities compared with digital accelerators [3]. In this presentation, we will focus on our implementation of an analog memory cell based on Phase-Change Memory (PCM) and a 3-Transistor 1-Capacitor (3T1C) circuit [4]. Software-equivalent accuracy on several datasets (MNIST, MNIST with noise, CIFAR-10, CIFAR-100) was achieved in a mixed software-hardware demonstration with DNN weights stored as analog conductances in real PCM device arrays. We will discuss how limitations of real-world NVM, such as limited conductance linearity and device variability, affect DNN training, and how using two pairs of analog weights of varying significance relaxes device requirements [5, 6, 7]. Finally, we summarize the pieces needed to build an analog accelerator chip [8] and how lithography plays a role in the future development of novel NVM devices.
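As a rough illustration of the multiply-accumulate operation that a crossbar array parallelizes, the sketch below (plain NumPy, with hypothetical array sizes and arbitrary conductance units, not the authors' code) models each weight as a differential pair of conductances. Input activations become row voltages, each device contributes a current by Ohm's law, and Kirchhoff's current law sums the products along each column in a single parallel step.

```python
import numpy as np

rng = np.random.default_rng(0)

n_inputs, n_outputs = 4, 3                               # hypothetical layer size
G_plus = rng.uniform(0.0, 1.0, (n_inputs, n_outputs))    # conductances, arbitrary units
G_minus = rng.uniform(0.0, 1.0, (n_inputs, n_outputs))

x = rng.uniform(-1.0, 1.0, n_inputs)                     # input activations -> row voltages

# Ohm's law per device (I = V * G) plus Kirchhoff's current law per column
# yields the full vector-matrix product of the layer in one parallel step.
I_columns = x @ (G_plus - G_minus)

# Equivalent digital computation, for comparison.
W = G_plus - G_minus
assert np.allclose(I_columns, x @ W)
print(I_columns)
```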
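The three training steps listed in the abstract map naturally onto the same array: a forward pass through the stored conductances, a backward pass for the errors (reusing the array with rows and columns exchanged), and an in-situ weight update computed as an outer product of activations and errors. The following is a minimal single-layer sketch under assumed sizes, a sigmoid activation, and a plain stochastic-gradient update; it is an illustration, not the hardware implementation.

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_out, eta = 8, 4, 0.1                 # hypothetical sizes and learning rate
W = rng.normal(0.0, 0.1, (n_in, n_out))      # weights, stored in hardware as conductances

x = rng.uniform(0.0, 1.0, n_in)              # one training example
t = np.zeros(n_out); t[2] = 1.0              # one-hot ground-truth label

# 1) Forward propagation (forward inference) through the array.
y = sigmoid(x @ W)

# 2) Compare with the label and backpropagate the error; for deeper networks
#    the error passed to the previous layer would be W @ delta.
delta = (t - y) * y * (1.0 - y)

# 3) In-situ weight update: a parallel outer product of the upstream
#    activations and the backpropagated errors.
W += eta * np.outer(x, delta)
```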
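The "two pairs of analog weights of varying significance" [4, 7] can be summarized by a mapping of the form W = F·(G+ − G−) + (g+ − g−): a more-significant conductance pair carries a gain factor F, while a less-significant pair absorbs the frequent small training updates and is only occasionally transferred into the more-significant pair. The snippet below is a schematic of that bookkeeping for a single synapse, with made-up values for F, the transfer threshold, and the initial conductances.

```python
class TwoPairSynapse:
    """Schematic of one synapse built from two conductance pairs of different
    significance (after refs. [4] and [7]); all numbers are illustrative."""

    F = 3.0                # assumed significance factor between the two pairs
    TRANSFER_LIMIT = 1.0   # assumed usable range of the less-significant pair

    def __init__(self):
        self.G_plus, self.G_minus = 0.6, 0.2   # more-significant (PCM-like) pair
        self.g_plus, self.g_minus = 0.0, 0.0   # less-significant (3T1C-like) pair

    def weight(self):
        # Effective weight contributed by both conductance pairs.
        return self.F * (self.G_plus - self.G_minus) + (self.g_plus - self.g_minus)

    def update(self, dw):
        # Frequent small updates land on the fast, more linear lower-significance pair.
        if dw >= 0:
            self.g_plus += dw
        else:
            self.g_minus -= dw
        # An occasional transfer into the more-significant pair resets the small pair,
        # relaxing linearity and endurance requirements on the NVM devices.
        net = self.g_plus - self.g_minus
        if abs(net) > self.TRANSFER_LIMIT:
            if net >= 0:
                self.G_plus += net / self.F
            else:
                self.G_minus += -net / self.F
            self.g_plus = self.g_minus = 0.0


syn = TwoPairSynapse()
for dw in (0.3, 0.4, 0.5, -0.2):
    syn.update(dw)
print(syn.weight())   # equals the initial weight plus the sum of the updates
```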
References:

[1] G. W. Burr et al., "Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses), using phase-change memory as the synaptic weight element," IEDM Tech. Digest, 29.5 (2014).
[2] G. W. Burr et al., "Experimental demonstration and tolerancing of a large-scale neural network (165,000 synapses) using phase-change memory as the synaptic weight element," IEEE Trans. Electron Devices, 62(11), 3498 (2015).
[3] H. Tsai et al., "Recent progress in analog memory-based accelerators for deep learning," Journal of Physics D: Applied Physics, 51(28), 283001 (2018).
[4] S. Ambrogio et al., "Equivalent-accuracy accelerated neural network training using analog memory," Nature, 558(7708), 60 (2018).
[5] T. Gokmen et al., "Acceleration of deep neural network training with resistive cross-point devices: design considerations," Frontiers in Neuroscience, 10, 333 (2016).
[6] S. Sidler et al., "Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: impact of conductance response," ESSDERC Proc., 440 (2016).
[7] G. Cristiano et al., "Perspective on training fully connected networks with resistive memories: device requirements for multiple conductances of varying significance," accepted in Journal of Applied Physics (2018).
[8] P. Narayanan et al., "Toward on-chip acceleration of the backpropagation algorithm using nonvolatile memory," IBM J. Res. Dev., 61(4), 1-11 (2017).