{"title":"Approximation-Aware Training for Efficient Neural Network Inference on MRAM Based CiM Architecture","authors":"Hemkant Nehete;Sandeep Soni;Tharun Kumar Reddy Bollu;Balasubramanian Raman;Brajesh Kumar Kaushik","doi":"10.1109/OJNANO.2024.3524265","DOIUrl":null,"url":null,"abstract":"Convolutional neural networks (CNNs), despite their broad applications, are constrained by high computational and memory requirements. Existing compression techniques often neglect approximation errors incurred during training. This work proposes approximation-aware-training, in which group of weights are approximated using a differential approximation function, resulting in a new weight matrix composed of approximation function's coefficients (AFC). The network is trained using backpropagation to minimize the loss function with respect to AFC matrix with linear and quadratic approximation functions preserving accuracy at high compression rates. This work extends to implement an compute-in-memory architecture for inference operations of approximate neural networks. This architecture includes a mapping algorithm that modulates inputs and map AFC to crossbar arrays directly, eliminating the need to predict approximated weights for evaluating output. This reduces the number of crossbars, lowering area and energy consumption. Integrating magnetic random-access memory-based devices further enhances performance by reducing latency and energy consumption. Simulation results on approximated LeNet-5, VGG8, AlexNet, and ResNet18 models trained on the CIFAR-100 dataset showed reductions of 54%, 30%, 67%, and 20% in the total number of crossbars, respectively, resulting in improved area efficiency. In the ResNet18 architecture, latency and energy consumption decreased by 95% and 93.3% with spin-orbit torque (SOT) based crossbars compared to RRAM-based architectures.","PeriodicalId":446,"journal":{"name":"IEEE Open Journal of Nanotechnology","volume":"6 ","pages":"16-26"},"PeriodicalIF":1.8000,"publicationDate":"2024-12-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10819260","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Open Journal of Nanotechnology","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10819260/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATERIALS SCIENCE, MULTIDISCIPLINARY","Score":null,"Total":0}
Abstract
Convolutional neural networks (CNNs), despite their broad applications, are constrained by high computational and memory requirements. Existing compression techniques often neglect the approximation errors incurred during training. This work proposes approximation-aware training, in which groups of weights are approximated using a differentiable approximation function, yielding a new weight matrix composed of the approximation function's coefficients (AFC). The network is trained with backpropagation to minimize the loss function with respect to the AFC matrix, and linear and quadratic approximation functions preserve accuracy at high compression rates. The work is extended to a compute-in-memory architecture for inference with approximate neural networks. This architecture includes a mapping algorithm that modulates the inputs and maps the AFC directly onto crossbar arrays, eliminating the need to reconstruct the approximated weights when evaluating the output. This reduces the number of crossbars, lowering area and energy consumption. Integrating magnetic random-access memory (MRAM) based devices further improves performance by reducing latency and energy consumption. Simulation results for approximated LeNet-5, VGG8, AlexNet, and ResNet18 models trained on the CIFAR-100 dataset show reductions of 54%, 30%, 67%, and 20% in the total number of crossbars, respectively, improving area efficiency. For the ResNet18 architecture, latency and energy consumption decrease by 95% and 93.3% with spin-orbit torque (SOT) based crossbars compared to RRAM-based architectures.
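To make the training idea concrete, below is a minimal PyTorch sketch of approximation-aware training as described in the abstract: each group of consecutive weights is replaced by the coefficients of a linear or quadratic approximation function, and backpropagation updates those coefficients (the AFC) directly. The group size, layer shapes, and fixed polynomial basis used here are illustrative assumptions, not the authors' exact formulation or mapping scheme.

```python
# Minimal sketch of approximation-aware training (AAT): weights within each
# group of size G are generated on the fly from a small set of coefficients
# (the AFC), so the loss gradient flows into the AFC rather than into the
# full weight matrix. Group size and basis choice are assumptions.

import torch
import torch.nn as nn


class AFCLinear(nn.Module):
    """Fully connected layer whose weights are reconstructed from AFC."""

    def __init__(self, in_features: int, out_features: int,
                 group_size: int = 8, degree: int = 1):
        super().__init__()
        assert in_features % group_size == 0, "in_features must be divisible by group_size"
        self.out_features = out_features
        self.groups = in_features // group_size
        # Fixed polynomial basis over positions 0..G-1: columns [1, i, i^2, ...]
        idx = torch.arange(group_size, dtype=torch.float32)
        basis = torch.stack([idx ** p for p in range(degree + 1)], dim=1)  # (G, degree+1)
        self.register_buffer("basis", basis)
        # Trainable AFC: one coefficient vector per (output neuron, weight group)
        self.afc = nn.Parameter(torch.randn(out_features, self.groups, degree + 1) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Reconstruct approximated weights from coefficients: w[g, i] = sum_p c_p * i^p
        w = torch.einsum("ogp,ip->ogi", self.afc, self.basis)  # (out, groups, G)
        w = w.reshape(self.out_features, -1)                    # (out, in_features)
        return nn.functional.linear(x, w, self.bias)


# Usage: the task loss is differentiated with respect to the AFC, so only
# (degree + 1) / G as many weight parameters need to be stored and mapped.
layer = AFCLinear(in_features=64, out_features=10, group_size=8, degree=2)
x = torch.randn(4, 64)
loss = layer(x).sum()
loss.backward()
print(layer.afc.grad.shape)  # torch.Size([10, 8, 3])
```

In the paper's inference architecture the reconstruction step in `forward` is avoided altogether: the AFC are mapped onto the crossbar arrays and the inputs are modulated so that the analog dot product directly yields the approximated layer output, which is what reduces the crossbar count reported in the abstract.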