Error Resilient In-Memory Computing Architecture for CNN Inference on the Edge

M. Rios, Flavio Ponzina, G. Ansaloni, A. Levisse, David Atienza Alonso
{"title":"Error Resilient In-Memory Computing Architecture for CNN Inference on the Edge","authors":"M. Rios, Flavio Ponzina, G. Ansaloni, A. Levisse, David Atienza Alonso","doi":"10.1145/3526241.3530351","DOIUrl":null,"url":null,"abstract":"The growing popularity of edge computing has fostered the development of diverse solutions to support Artificial Intelligence (AI) in energy-constrained devices. Nonetheless, comparatively few efforts have focused on the resiliency exhibited by AI workloads (such as Convolutional Neural Networks, CNNs) as an avenue towards increasing their run-time efficiency, and even fewer have proposed strategies to increase such resiliency. We herein address this challenge in the context of Bit-line Computing architectures, an embodiment of the in-memory computing paradigm tailored towards CNN applications. We show that little additional hardware is required to add highly effective error detection and mitigation in such platforms. In turn, our proposed scheme can cope with high error rates when performing memory accesses with no impact on CNNs accuracy, allowing for very aggressive voltage scaling. Complementary, we also show that CNN resiliency can be increased by algorithmic optimizations in addition to architectural ones, adopting a combined ensembling and pruning strategy that increases robustness while not inflating workload requirements. Experiments on different quantized CNN models reveal that our combined hardware/software approach enables the supply voltage to be reduced to just 650mV, decreasing the energy per inference up to 51.3%, without affecting the baseline CNN classification accuracy.","PeriodicalId":188228,"journal":{"name":"Proceedings of the Great Lakes Symposium on VLSI 2022","volume":"69 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Great Lakes Symposium on VLSI 2022","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3526241.3530351","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

The growing popularity of edge computing has fostered the development of diverse solutions to support Artificial Intelligence (AI) in energy-constrained devices. Nonetheless, comparatively few efforts have focused on the resiliency exhibited by AI workloads (such as Convolutional Neural Networks, CNNs) as an avenue towards increasing their run-time efficiency, and even fewer have proposed strategies to increase such resiliency. We herein address this challenge in the context of Bit-line Computing architectures, an embodiment of the in-memory computing paradigm tailored towards CNN applications. We show that little additional hardware is required to add highly effective error detection and mitigation in such platforms. In turn, our proposed scheme can cope with high error rates when performing memory accesses with no impact on CNNs accuracy, allowing for very aggressive voltage scaling. Complementary, we also show that CNN resiliency can be increased by algorithmic optimizations in addition to architectural ones, adopting a combined ensembling and pruning strategy that increases robustness while not inflating workload requirements. Experiments on different quantized CNN models reveal that our combined hardware/software approach enables the supply voltage to be reduced to just 650mV, decreasing the energy per inference up to 51.3%, without affecting the baseline CNN classification accuracy.
基于边缘的CNN推理的内存中容错计算架构
边缘计算的日益普及促进了各种解决方案的发展,以支持能源受限设备中的人工智能(AI)。尽管如此,相对较少的努力集中在人工智能工作负载(如卷积神经网络,cnn)所表现出的弹性上,作为提高其运行时效率的途径,甚至更少的人提出了增加这种弹性的策略。本文在位线计算架构的背景下解决了这一挑战,位线计算架构是为CNN应用量身定制的内存计算范例的体现。我们表明,在这样的平台中添加高效的错误检测和缓解只需要很少的额外硬件。反过来,我们提出的方案可以在不影响cnn精度的情况下处理内存访问时的高错误率,允许非常积极的电压缩放。此外,我们还表明,除了架构优化之外,CNN的弹性还可以通过算法优化来增加,采用组合的集成和修剪策略可以增加鲁棒性,同时不会增加工作量需求。在不同量化CNN模型上的实验表明,我们的硬件/软件组合方法使供电电压降至仅650mV,每次推理的能量降低高达51.3%,而不影响基线CNN分类精度。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信