Approximation-Aware Training for Efficient Neural Network Inference on MRAM Based CiM Architecture

Hemkant Nehete; Sandeep Soni; Tharun Kumar Reddy Bollu; Balasubramanian Raman; Brajesh Kumar Kaushik
IEEE Open Journal of Nanotechnology, vol. 6, pp. 16-26, published 2024-12-31
DOI: 10.1109/OJNANO.2024.3524265 (https://ieeexplore.ieee.org/document/10819260/)
IF 1.8, Q3 (Materials Science, Multidisciplinary)
Citations: 0

Abstract

Convolutional neural networks (CNNs), despite their broad applications, are constrained by high computational and memory requirements. Existing compression techniques often neglect the approximation errors incurred during training. This work proposes approximation-aware training, in which groups of weights are approximated using a differential approximation function, resulting in a new weight matrix composed of the approximation function's coefficients (AFC). The network is trained using backpropagation to minimize the loss function with respect to the AFC matrix, with linear and quadratic approximation functions preserving accuracy at high compression rates. This work further implements a compute-in-memory (CiM) architecture for inference with approximated neural networks. The architecture includes a mapping algorithm that modulates the inputs and maps the AFC directly onto the crossbar arrays, eliminating the need to reconstruct the approximated weights when evaluating outputs. This reduces the number of crossbars, lowering area and energy consumption. Integrating magnetic random-access memory (MRAM) based devices further enhances performance by reducing latency and energy consumption. Simulation results on approximated LeNet-5, VGG8, AlexNet, and ResNet18 models trained on the CIFAR-100 dataset show reductions of 54%, 30%, 67%, and 20% in the total number of crossbars, respectively, resulting in improved area efficiency. For the ResNet18 architecture, latency and energy consumption decrease by 95% and 93.3% with spin-orbit torque (SOT) based crossbars compared to RRAM-based architectures.
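
The abstract packs two technical ideas: grouped-weight approximation trained through the coefficients of the approximation function, and the direct AFC-to-crossbar mapping enabled by input modulation. The PyTorch sketch below illustrates one plausible reading of both; the quadratic basis, the group size of 8, the initialization, and the names AFCLinear and infer_via_input_modulation are illustrative assumptions rather than details taken from the paper.

```python
# Minimal sketch, assuming a quadratic approximation over each group of weights:
# (i) training updates the approximation-function coefficients (AFC) via backprop,
# (ii) inference modulates the inputs with the same basis so the AFC can serve as
#     crossbar conductances directly, without reconstructing the approximated weights.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AFCLinear(nn.Module):
    """Fully connected layer parameterized by approximation-function coefficients."""

    def __init__(self, in_features, out_features, group_size=8):
        super().__init__()
        assert in_features % group_size == 0
        self.in_features, self.out_features, self.group_size = in_features, out_features, group_size
        n_groups = in_features // group_size
        # One (a, b, c) triple per (output neuron, input group): w_k ~ a + b*k + c*k^2
        self.afc = nn.Parameter(0.01 * torch.randn(out_features, n_groups, 3))
        self.bias = nn.Parameter(torch.zeros(out_features))
        # Fixed basis [1, k, k^2] over the normalized within-group index
        k = torch.arange(group_size, dtype=torch.float32) / group_size
        self.register_buffer("basis", torch.stack([torch.ones_like(k), k, k * k]))  # (3, group_size)

    def forward(self, x):
        # Training path: regenerate the approximated weights from the AFC matrix,
        # so gradients of the loss flow into self.afc rather than into raw weights.
        w = torch.einsum("ogc,ck->ogk", self.afc, self.basis)   # (out, n_groups, group_size)
        return F.linear(x, w.reshape(self.out_features, self.in_features), self.bias)

    def infer_via_input_modulation(self, x):
        # Inference path mirroring the crossbar mapping: modulate the inputs with
        # the basis first, then multiply by the AFC directly (the AFC playing the
        # role of crossbar conductances), never materializing the approximated weights.
        xg = x.view(-1, self.in_features // self.group_size, self.group_size)
        xmod = torch.einsum("bgk,ck->bgc", xg, self.basis)       # input modulation
        return torch.einsum("bgc,ogc->bo", xmod, self.afc) + self.bias


# Both paths give the same result; training uses the standard forward path.
layer = AFCLinear(in_features=64, out_features=10, group_size=8)
x = torch.randn(4, 64)
assert torch.allclose(layer(x), layer.infer_via_input_modulation(x), atol=1e-5)
layer(x).sum().backward()   # gradients land in layer.afc (the AFC matrix)
```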
Source journal: IEEE Open Journal of Nanotechnology
CiteScore: 3.90
Self-citation rate: 17.60%
Annual publication volume: 10
Review time: 12 weeks