POLAR: Performance-aware On-device Learning Capable Programmable Processing-in-Memory Architecture for Low-Power ML Applications

2022 25th Euromicro Conference on Digital System Design (DSD), Pub Date: 2022-08-01, DOI: 10.1109/DSD57027.2022.00125

Sathwika Bavikadi, Purab Ranjan Sutradhar, Mark A. Indovina, A. Ganguly, Sai Manoj Pudukotai Dinakarrao

Citations: 3

Abstract

Improving the performance of real-time Traffic Sign Recognition (TSR) applications that use Deep Learning (DL) algorithms such as Convolutional Neural Networks (CNNs) on software platforms is challenging because of the sheer computational complexity of these algorithms. In this work, we adopt a combined hardware-software approach to address this issue. We introduce a data-centric Processing-in-Memory (PIM) architecture that leverages Look-up-Table (LUT)-based processing to minimize data movement and deliver superior performance and efficiency. Despite this performance advantage, the limited memory available in PIM makes deploying deep CNNs complex. To meet these limited resource constraints, we propose merging CNN layers. A further challenge specific to TSR is that the deployment environment changes continuously, so a CNN model trained on static data suffers performance degradation over time. To address these challenges, we introduce lightweight, performance-aware, Generative Adversarial Network (GAN)-based on-device learning on the PIM architecture. The compact CNN on the PIM architecture achieves data-level parallelism, reduces pipelining delays, and facilitates on-device training and inference. Evaluation is performed on multiple state-of-the-art DL networks, including LeNet, AlexNet, and ResNet, using the German Traffic Sign Recognition Benchmark (GTSRB) dataset and the Belgium Traffic Sign Dataset (BTSD). With the proposed learning technique, maximum accuracies of 92.8% and 89.27% are achieved on the GTSRB and BTSD datasets, respectively. The proposed mechanism also maintains an average accuracy above 85% despite changes in the environment for all the CNNs deployed on the PIM accelerator.
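The abstract does not spell out how LUT-based processing replaces arithmetic units. As a rough illustration only, the sketch below assumes 4-bit operand lookup tables (a common choice in LUT-based PIM designs, but an assumption here, not a detail stated in the abstract): an 8-bit multiply is decomposed into four 4-bit table lookups combined with shifts and additions, and a dot product, the core of convolution and fully connected layers, is built from those multiplies. The names `NIBBLE_LUT`, `lut_mul4`, `lut_mul8`, and `lut_dot` are hypothetical and do not come from the paper.

```python
# Hypothetical sketch of LUT-based multiplication, NOT the POLAR implementation.
# Assumption: each LUT stores all products of two 4-bit operands (16 x 16 = 256 entries).

# Precompute the 4-bit x 4-bit product table once (what a LUT-based PIM core might hold).
NIBBLE_LUT = [[a * b for b in range(16)] for a in range(16)]


def lut_mul4(a: int, b: int) -> int:
    """Multiply two 4-bit operands by table lookup instead of an arithmetic multiplier."""
    return NIBBLE_LUT[a][b]


def lut_mul8(x: int, y: int) -> int:
    """Multiply two 8-bit unsigned operands using four 4-bit lookups plus shifts and adds."""
    if not (0 <= x < 256 and 0 <= y < 256):
        raise ValueError("operands must be 8-bit unsigned integers")
    # Split each operand into high and low nibbles.
    xh, xl = x >> 4, x & 0xF
    yh, yl = y >> 4, y & 0xF
    # x*y = xh*yh*256 + (xh*yl + xl*yh)*16 + xl*yl
    return (
        (lut_mul4(xh, yh) << 8)
        + ((lut_mul4(xh, yl) + lut_mul4(xl, yh)) << 4)
        + lut_mul4(xl, yl)
    )


def lut_dot(xs, ws):
    """Dot product (one output of a conv or fully connected layer) built from LUT multiplies."""
    return sum(lut_mul8(x, w) for x, w in zip(xs, ws))


if __name__ == "__main__":
    print(lut_mul8(200, 123))  # 24600
    print(lut_dot([1, 2, 3], [4, 5, 6]))  # 32
```

The design trades arithmetic logic for small precomputed tables: only 256 entries are needed for any 8-bit product, which is why narrow operand LUTs suit memory-resident processing. Signed operands, wider precisions, and the paper's actual LUT sizes would require changes this sketch does not attempt.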