Model and system robustness in distributed CNN inference at the edge

IF 2.2 3区 工程技术 Q3 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Xiaotian Guo , Quan Jiang , Andy D. Pimentel , Todor Stefanov
{"title":"Model and system robustness in distributed CNN inference at the edge","authors":"Xiaotian Guo ,&nbsp;Quan Jiang ,&nbsp;Andy D. Pimentel ,&nbsp;Todor Stefanov","doi":"10.1016/j.vlsi.2024.102299","DOIUrl":null,"url":null,"abstract":"<div><div>Prevalent large CNN models pose a significant challenge in terms of computing resources for resource-constrained devices at the Edge. Distributing the computations and coefficients over multiple edge devices collaboratively has been well studied but these works generally do not consider the presence of device failures (e.g., due to temporary connectivity issues, overload, discharged battery of edge devices). Such unpredictable failures can compromise the reliability of edge devices, inhibiting the proper execution of distributed CNN inference. In this paper, we present a novel partitioning method, called RobustDiCE, for robust distribution and inference of CNN models over multiple edge devices. Our method can tolerate intermittent and permanent device failures in a distributed system at the Edge, offering a tunable trade-off between robustness (i.e., retaining model accuracy after failures) and resource utilization. We verify the system’s robustness by validating the overall end-to-end latency under failures. We evaluate RobustDiCE using the ImageNet-1K dataset on several representative CNN models under various device failure scenarios and compare it with several state-of-the-art partitioning methods as well as an optimal robustness approach (i.e., full neuron replication). In addition, we demonstrate RobustDiCE’s advantages in terms of memory usage and energy consumption per device, and system throughput for various system setups with different device counts.</div></div>","PeriodicalId":54973,"journal":{"name":"Integration-The Vlsi Journal","volume":null,"pages":null},"PeriodicalIF":2.2000,"publicationDate":"2024-10-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Integration-The Vlsi Journal","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167926024001639","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

Prevalent large CNN models pose a significant challenge in terms of computing resources for resource-constrained devices at the Edge. Distributing the computations and coefficients over multiple edge devices collaboratively has been well studied but these works generally do not consider the presence of device failures (e.g., due to temporary connectivity issues, overload, discharged battery of edge devices). Such unpredictable failures can compromise the reliability of edge devices, inhibiting the proper execution of distributed CNN inference. In this paper, we present a novel partitioning method, called RobustDiCE, for robust distribution and inference of CNN models over multiple edge devices. Our method can tolerate intermittent and permanent device failures in a distributed system at the Edge, offering a tunable trade-off between robustness (i.e., retaining model accuracy after failures) and resource utilization. We verify the system’s robustness by validating the overall end-to-end latency under failures. We evaluate RobustDiCE using the ImageNet-1K dataset on several representative CNN models under various device failure scenarios and compare it with several state-of-the-art partitioning methods as well as an optimal robustness approach (i.e., full neuron replication). In addition, we demonstrate RobustDiCE’s advantages in terms of memory usage and energy consumption per device, and system throughput for various system setups with different device counts.
边缘分布式 CNN 推断中的模型和系统鲁棒性
对于资源有限的边缘设备来说,普遍的大型 CNN 模型对计算资源构成了巨大挑战。在多个边缘设备上协同分配计算和系数的问题已经得到了很好的研究,但这些工作通常没有考虑设备故障的存在(例如,由于临时连接问题、过载、边缘设备电池放电)。这种不可预知的故障会损害边缘设备的可靠性,阻碍分布式 CNN 推断的正常执行。在本文中,我们提出了一种名为 RobustDiCE 的新型分区方法,用于在多个边缘设备上对 CNN 模型进行稳健的分布和推理。我们的方法可以容忍边缘分布式系统中的间歇性和永久性设备故障,在鲁棒性(即故障后保持模型准确性)和资源利用率之间提供可调整的权衡。我们通过验证故障情况下的整体端到端延迟来验证系统的鲁棒性。我们使用 ImageNet-1K 数据集评估了 RobustDiCE 在各种设备故障情况下对几个具有代表性的 CNN 模型的处理效果,并将其与几种最先进的分区方法以及一种最佳鲁棒性方法(即全神经元复制)进行了比较。此外,我们还展示了 RobustDiCE 在每个设备的内存使用和能耗方面的优势,以及在不同设备数量的各种系统设置下的系统吞吐量。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
Integration-The Vlsi Journal
Integration-The Vlsi Journal 工程技术-工程:电子与电气
CiteScore
3.80
自引率
5.30%
发文量
107
审稿时长
6 months
期刊介绍: Integration''s aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics: Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信