Machine Learning Algorithm Co-Design for a 40 nm RRAM Analog Compute-in-Memory Accelerator

Ethan G. Weinstock, Yiming Tan, Wantong Li, Shimeng Yu

2023 IEEE International Opportunity Research Scholars Symposium (ORSS), April 23, 2023. DOI: 10.1109/ORSS58323.2023.10161812
Resistive Random-Access Memory (RRAM)-based analog compute-in-memory (CIM) chips can be used to accelerate deep neural network (DNN) inference in low-power, resource-constrained edge devices. In this work, we present software algorithm co-design techniques to optimize DNN inference on RRAM-based CIM accelerators for a handwritten digit recognition workload. Our approach involves three key steps: 1) an integer-based training framework that optimizes raw DNN output for low bit-width networks; 2) a row rename step that leverages the low weight variance of low-precision networks to reduce network storage requirements; and 3) a row reorder step that leverages the spatial nature of sparsity-aware input control to reduce the effect of analog noise on error rate. We apply our techniques to train a LeNet-5 model with 1-bit weights and 2-bit activations to 96.66% accuracy on the MNIST dataset, reduce the area of the final fully connected layer by 45.24%, and reduce the impact of analog noise in the final layer on inference accuracy by 5.6%.
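The first two steps of the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the sign-based 1-bit weight mapping, the uniform 2-bit activation levels, and the row-deduplication scheme are all assumptions chosen to match the description (1-bit weights, 2-bit activations, and a "row rename" that exploits low weight variance, since a 1-bit weight matrix has many identical rows that need to be stored only once).

```python
import numpy as np

def binarize_weights(w):
    # 1-bit weights: map each real-valued weight to {-1, +1} by sign
    # (assumption: zero maps to +1)
    return np.where(w >= 0, 1.0, -1.0)

def quantize_activations(x, bits=2):
    # 2-bit unsigned activations: clip to [0, 1], then round to
    # 2^bits - 1 uniform levels (assumption: uniform quantization)
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def rename_rows(w_bin):
    # "Row rename" sketch: a binarized layer has few distinct row
    # patterns, so store each unique row once plus an index map.
    # uniq[idx] reconstructs the original matrix.
    uniq, idx = np.unique(w_bin, axis=0, return_inverse=True)
    return uniq, idx

# Example: three rows, two of which collapse to the same 1-bit pattern
W = np.array([[0.30, -0.20],
              [0.10, -0.50],
              [-0.40, 0.25]])
B = binarize_weights(W)          # rows 0 and 1 become identical
uniq, idx = rename_rows(B)       # only 2 unique rows are stored
```

In a trained 1-bit network the fraction of duplicate rows is what drives the reported storage savings; the reorder step (not sketched here) additionally permutes stored rows so that sparsity-aware input control activates spatially adjacent rows, limiting the analog noise seen at the output.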