Sathwika Bavikadi, Purab Ranjan Sutradhar, Mark A. Indovina, A. Ganguly, Sai Manoj Pudukotai Dinakarrao
POLAR: Performance-aware On-device Learning Capable Programmable Processing-in-Memory Architecture for Low-Power ML Applications
2022 25th Euromicro Conference on Digital System Design (DSD), published 2022-08-01
DOI: https://doi.org/10.1109/DSD57027.2022.00125
Citations: 3
Abstract
Improving the performance of real-time Traffic Sign Recognition (TSR) applications that use Deep Learning (DL) algorithms such as Convolutional Neural Networks (CNNs) on software platforms is challenging because of the sheer computational complexity of these algorithms. In this work, we adopt a combined hardware-software approach to address this issue. We introduce a data-centric Processing-in-Memory (PIM) architecture that leverages Look-up-Table (LUT)-based processing to minimize data movement and improve performance and efficiency. Despite this performance advantage, the limited memory available in PIM makes deploying deep CNNs difficult. We therefore propose merging CNN layers to meet the resource constraints. A challenge specific to TSR is that the deployment environment changes continuously, so a CNN model trained on static data degrades in performance over time. To address these challenges, we introduce lightweight, performance-aware, Generative Adversarial Network (GAN)-based on-device learning on the PIM architecture. The resulting compact CNN on the PIM architecture attains data-level parallelism, reduces pipelining delays, and simplifies on-device training and inference. We evaluate multiple state-of-the-art DL networks, including LeNet, AlexNet, and ResNet, on the German Traffic Sign Recognition Benchmark (GTSRB) dataset and the Belgium Traffic Sign Dataset (BTSD). The proposed learning technique achieves maximum accuracies of 92.8% and 89.27% on the GTSRB and BTSD datasets, respectively. The proposed mechanism also maintains an average accuracy above 85% despite changes in the environment for all the CNNs deployed on the PIM accelerator.
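To make the idea of LUT-based processing concrete, the following is a minimal sketch of how a multiplication can be replaced by table lookups, shifts, and additions. It is an illustration under stated assumptions, not the POLAR hardware design: the 4-bit table granularity, the unsigned 8-bit operand width, and the names `NIBBLE_PRODUCTS`, `lut_multiply_8bit`, and `lut_dot` are all hypothetical and do not come from the paper.

```python
# Hypothetical sketch of LUT-based multiplication; the table size and
# operand width are assumptions, not details taken from the paper.

# Precompute a 16x16 table holding every 4-bit x 4-bit product.
NIBBLE_PRODUCTS = [[a * b for b in range(16)] for a in range(16)]


def lut_multiply_8bit(x: int, y: int) -> int:
    """Multiply two unsigned 8-bit values using table lookups, shifts, and adds."""
    if not (0 <= x <= 0xFF and 0 <= y <= 0xFF):
        raise ValueError("operands must be unsigned 8-bit integers")
    # Split each operand into a high and a low 4-bit nibble.
    x_hi, x_lo = x >> 4, x & 0xF
    y_hi, y_lo = y >> 4, y & 0xF
    # Four partial products, each read from the table instead of computed.
    return (
        (NIBBLE_PRODUCTS[x_hi][y_hi] << 8)
        + ((NIBBLE_PRODUCTS[x_hi][y_lo] + NIBBLE_PRODUCTS[x_lo][y_hi]) << 4)
        + NIBBLE_PRODUCTS[x_lo][y_lo]
    )


def lut_dot(weights, activations):
    """Multiply-accumulate over 8-bit operands, the core operation of a convolution."""
    return sum(lut_multiply_8bit(w, a) for w, a in zip(weights, activations))
```

In this sketch the only stored state is a 256-entry table, which hints at why a lookup-based datapath can sit close to memory: the arithmetic reduces to reads plus shift-and-add logic. How the actual architecture sizes and programs its LUTs is not stated in the abstract.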