Machine Learning Algorithm Co-Design for a 40 nm RRAM Analog Compute-in-Memory Accelerator

Ethan G. Weinstock, Yiming Tan, Wantong Li, Shimeng Yu

2023 IEEE International Opportunity Research Scholars Symposium (ORSS), April 23, 2023. DOI: 10.1109/ORSS58323.2023.10161812
Resistive Random-Access Memory (RRAM)-based analog compute-in-memory (CIM) chips can be used to accelerate deep neural network (DNN) inference in low-power, resource-constrained edge devices. In this work, we present software algorithm co-design techniques to optimize DNN inference on RRAM-based CIM accelerators for a handwritten digit recognition workload. Our approach involves three key steps: 1) an integer-based training framework that optimizes raw DNN output for low bit-width networks; 2) a row rename step that leverages the low weight variance of low-precision networks to reduce network storage requirements; and 3) a row reorder step that leverages the spatial nature of sparsity-aware input control to reduce the effect of analog noise on error rate. We apply our techniques to train a LeNet-5 model with 1-bit weights and 2-bit activations to 96.66% accuracy on the MNIST dataset, reduce the area of the final fully connected layer by 45.24%, and reduce the impact of analog noise in the final layer on inference accuracy by 5.6%.
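The first two steps of the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the sign-based 1-bit weight mapping, the uniform 2-bit activation levels, and the row-deduplication scheme are all assumptions chosen to match the description (1-bit weights, 2-bit activations, and a "row rename" that exploits low weight variance, since a 1-bit weight matrix has many identical rows that need to be stored only once).

```python
import numpy as np

def binarize_weights(w):
    # 1-bit weights: map each real-valued weight to {-1, +1} by sign
    # (assumption: zero maps to +1)
    return np.where(w >= 0, 1.0, -1.0)

def quantize_activations(x, bits=2):
    # 2-bit unsigned activations: clip to [0, 1], then round to
    # 2^bits - 1 uniform levels (assumption: uniform quantization)
    levels = 2 ** bits - 1
    return np.round(np.clip(x, 0.0, 1.0) * levels) / levels

def rename_rows(w_bin):
    # "Row rename" sketch: a binarized layer has few distinct row
    # patterns, so store each unique row once plus an index map.
    # uniq[idx] reconstructs the original matrix.
    uniq, idx = np.unique(w_bin, axis=0, return_inverse=True)
    return uniq, idx

# Example: three rows, two of which collapse to the same 1-bit pattern
W = np.array([[0.30, -0.20],
              [0.10, -0.50],
              [-0.40, 0.25]])
B = binarize_weights(W)          # rows 0 and 1 become identical
uniq, idx = rename_rows(B)       # only 2 unique rows are stored
```

In a trained 1-bit network the fraction of duplicate rows is what drives the reported storage savings; the reorder step (not sketched here) additionally permutes stored rows so that sparsity-aware input control activates spatially adjacent rows, limiting the analog noise seen at the output.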