Enabling NVM-based deep learning acceleration using nonuniform data quantization: work-in-progress

Proceedings of the 2017 International Conference on Compilers, Architectures and Synthesis for Embedded Systems Companion. Pub Date: 2017-10-15. DOI: 10.1145/3125501.3125516

Hao Yan, Ethan C. Ahn, Lide Duan


Citations: 2

Abstract

Beyond employing a co-processor (e.g., a GPU) for neural network (NN) computation, researchers have extensively studied how to accelerate NN algorithms using the unique characteristics of nonvolatile memories (NVM), including RRAM, phase change memory (PCM), and STT-MRAM. In such approaches, input data and synaptic weights are represented by word line voltages and cell resistances, respectively, and the resulting bit line current gives the calculation result. However, the limited number of resistance levels in an NVM cell substantially reduces the data precision available to the algorithm, which significantly lowers model inference accuracy. Motivated by the observation that conventional, uniformly generated data quantization points are not equally important to the model, we propose a nonuniform data quantization scheme that better represents the model in NVM cells and minimizes the loss in inference accuracy. Our experimental results show that the proposed scheme achieves highly accurate deep learning model inference using as few as 4 bits for synaptic weight representation. This effectively enables an NVM with few cell resistance levels (e.g., STT-MRAM) to perform NN calculation, and also yields additional benefits in performance, energy, and memory storage.
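To make the contrast between uniform and nonuniform quantization points concrete, the following is a minimal sketch, not the authors' scheme: the abstract does not say how the nonuniform points are chosen, so Lloyd's algorithm (1-D k-means) is swapped in here as one well-known way to place them. The synthetic Gaussian weights and all function names (`uniform_levels`, `lloyd_levels`, `weighted_sum`) are assumptions made for illustration. With 4 bits, each weight must map to one of 16 levels, standing in for the 16 resistance states a cell would need.

```python
import random

def uniform_levels(weights, bits):
    """Evenly spaced quantization points spanning [min(w), max(w)]."""
    n = 2 ** bits
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (n - 1)
    return [lo + k * step for k in range(n)]

def nearest_index(levels, w):
    """Index of the quantization point closest to weight w."""
    return min(range(len(levels)), key=lambda k: abs(levels[k] - w))

def quantize(weights, levels):
    """Replace every weight by its nearest quantization point."""
    return [levels[nearest_index(levels, w)] for w in weights]

def lloyd_levels(weights, bits, iters=20):
    """Nonuniform points via Lloyd's algorithm (an assumed stand-in,
    not the paper's method). Starts from the uniform points and moves
    each point to the mean of the weights assigned to it."""
    levels = uniform_levels(weights, bits)
    for _ in range(iters):
        buckets = [[] for _ in levels]
        for w in weights:
            buckets[nearest_index(levels, w)].append(w)
        # A point with no assigned weights keeps its previous position.
        levels = [sum(b) / len(b) if b else levels[i]
                  for i, b in enumerate(buckets)]
    return levels

def mse(weights, levels):
    """Mean squared quantization error."""
    q = quantize(weights, levels)
    return sum((w - x) ** 2 for w, x in zip(weights, q)) / len(weights)

def weighted_sum(inputs, weights):
    """Idealized analogue of one bit line: sum of input * weight."""
    return sum(v * w for v, w in zip(inputs, weights))

rng = random.Random(0)
weights = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # synthetic weights
inputs = [rng.random() for _ in range(2000)]          # synthetic inputs

u_levels = uniform_levels(weights, bits=4)
nu_levels = lloyd_levels(weights, bits=4)

print("uniform MSE:   ", mse(weights, u_levels))
print("nonuniform MSE:", mse(weights, nu_levels))
print("exact sum:     ", weighted_sum(inputs, weights))
print("uniform sum:   ", weighted_sum(inputs, quantize(weights, u_levels)))
print("nonuniform sum:", weighted_sum(inputs, quantize(weights, nu_levels)))
```

Because the weights cluster near zero, the nonuniform points concentrate where most weights lie, so the same 16 levels give a lower mean squared error than evenly spaced points. The sketch treats signed weights directly and ignores device details such as how negative weights or resistance variation are handled in a real NVM array.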

