Error Resilient In-Memory Computing Architecture for CNN Inference on the Edge

Proceedings of the Great Lakes Symposium on VLSI 2022. Published: 2022-06-06. DOI: 10.1145/3526241.3530351

M. Rios, Flavio Ponzina, G. Ansaloni, A. Levisse, David Atienza Alonso


Citations: 4

Abstract

The growing popularity of edge computing has fostered the development of diverse solutions to support Artificial Intelligence (AI) in energy-constrained devices. Nonetheless, comparatively few efforts have focused on the resiliency exhibited by AI workloads (such as Convolutional Neural Networks, CNNs) as an avenue towards increasing their run-time efficiency, and even fewer have proposed strategies to increase such resiliency. We herein address this challenge in the context of Bit-line Computing architectures, an embodiment of the in-memory computing paradigm tailored towards CNN applications. We show that little additional hardware is required to add highly effective error detection and mitigation to such platforms. In turn, our proposed scheme can cope with high error rates in memory accesses without impacting CNN accuracy, allowing for very aggressive voltage scaling. Complementarily, we also show that CNN resiliency can be increased by algorithmic optimizations in addition to architectural ones, adopting a combined ensembling and pruning strategy that increases robustness without inflating workload requirements. Experiments on different quantized CNN models reveal that our combined hardware/software approach enables the supply voltage to be reduced to just 650 mV, decreasing the energy per inference by up to 51.3%, without affecting the baseline CNN classification accuracy.
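The abstract does not specify which detection code or mitigation policy the architecture uses. As a hedged illustration of why a small amount of extra storage and logic can make low-voltage memory errors tolerable, the sketch below assumes one even-parity bit per 8-bit quantized weight and replaces any word that fails its parity check with zero before it enters a dot product. The parity encoding, the zeroing policy, the independent bit-flip error model, and all names (`store`, `inject_errors`, `read`, `flip_bit`, `WORD_BITS`) are hypothetical and are not taken from the paper.

```python
import random

WORD_BITS = 8  # hypothetical: weights stored as 8-bit two's-complement words


def parity(word: int) -> int:
    """Parity bit of an 8-bit word: 1 if it has an odd number of set bits."""
    return bin(word & 0xFF).count("1") & 1


def to_int8(byte: int) -> int:
    """Interpret an 8-bit pattern as a signed two's-complement value."""
    return byte - 256 if byte >= 128 else byte


def store(weights):
    """Encode each int8 weight as a (byte, parity bit) pair (hypothetical layout)."""
    return [(w & 0xFF, parity(w & 0xFF)) for w in weights]


def inject_errors(encoded, bit_error_rate, rng):
    """Assumed error model: every stored bit, parity included, flips
    independently with probability bit_error_rate on a low-voltage read."""
    faulty = []
    for byte, p in encoded:
        for bit in range(WORD_BITS):
            if rng.random() < bit_error_rate:
                byte ^= 1 << bit
        if rng.random() < bit_error_rate:
            p ^= 1
        faulty.append((byte, p))
    return faulty


def flip_bit(encoded, index, bit):
    """Return a copy of encoded with one data bit of one word flipped."""
    out = list(encoded)
    byte, p = out[index]
    out[index] = (byte ^ (1 << bit), p)
    return out


def read(encoded):
    """Decode words; any word whose parity check fails is replaced by 0
    (hypothetical mitigation). Returns (values, number of detected words)."""
    values, detected = [], 0
    for byte, p in encoded:
        if parity(byte) != p:
            detected += 1
            values.append(0)
        else:
            values.append(to_int8(byte))
    return values, detected


def dot(a, b):
    """Multiply-accumulate, the core operation of a convolution or FC layer."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    rng = random.Random(0)
    weights = [rng.randint(-128, 127) for _ in range(256)]
    acts = [rng.randint(0, 127) for _ in range(256)]
    faulty = inject_errors(store(weights), 1e-2, rng)
    unprotected = [to_int8(b) for b, _ in faulty]
    protected, n_detected = read(faulty)
    print("clean result      :", dot(weights, acts))
    print("unprotected result:", dot(unprotected, acts))
    print("protected result  :", dot(protected, acts), f"({n_detected} words zeroed)")
```

Zeroing is chosen here because a zero weight adds nothing to the accumulation, so a detected fault perturbs the result only slightly, whereas an undetected flip of a high-order bit can inject a large spurious value. A single parity bit detects only an odd number of flipped bits per word (two flips in the same word pass the check unnoticed); stronger codes would trade additional storage for broader coverage.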

