Model and system robustness in distributed CNN inference at the edge

IF 2.2 · Zone 3, Engineering & Technology · JCR Q3, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

Integration-The Vlsi Journal · Pub Date: 2024-10-20 · DOI: 10.1016/j.vlsi.2024.102299

Xiaotian Guo, Quan Jiang, Andy D. Pimentel, Todor Stefanov

Volume 100, Article 102299 · Publisher page: https://www.sciencedirect.com/science/article/pii/S0167926024001639

Citations: 0


Model and system robustness in distributed CNN inference at the edge

Prevalent large CNN models pose a significant challenge in terms of computing resources for resource-constrained devices at the Edge. Distributing the computations and coefficients collaboratively over multiple edge devices has been well studied, but existing works generally do not consider device failures (e.g., due to temporary connectivity issues, overload, or discharged batteries of edge devices). Such unpredictable failures can compromise the reliability of edge devices, inhibiting the proper execution of distributed CNN inference. In this paper, we present a novel partitioning method, called RobustDiCE, for robust distribution and inference of CNN models over multiple edge devices. Our method can tolerate intermittent and permanent device failures in a distributed system at the Edge, offering a tunable trade-off between robustness (i.e., retaining model accuracy after failures) and resource utilization. We verify the system's robustness by validating the overall end-to-end latency under failures. We evaluate RobustDiCE using the ImageNet-1K dataset on several representative CNN models under various device failure scenarios and compare it with several state-of-the-art partitioning methods as well as an optimal robustness approach (i.e., full neuron replication). In addition, we demonstrate RobustDiCE's advantages in terms of memory usage and energy consumption per device, and in system throughput, for various system setups with different device counts.
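The abstract contrasts a tunable robustness/resource trade-off with full neuron replication as the optimal-robustness baseline. The sketch below illustrates that trade-off in general terms. It assumes a hypothetical placement scheme in which each neuron of a layer is copied round-robin onto `replication` devices. This is not RobustDiCE's partitioning algorithm, which the abstract does not describe. "Surviving neurons" is also only a rough stand-in for retained model accuracy. All function names are illustrative.

```python
from itertools import combinations


def assign_neurons(num_neurons: int, num_devices: int, replication: int) -> list[set[int]]:
    """Hypothetical round-robin placement (not RobustDiCE): neuron i is stored
    on devices i, i+1, ..., i+replication-1 (mod num_devices).
    replication=1 means plain partitioning; replication=num_devices means
    full neuron replication."""
    if not 1 <= replication <= num_devices:
        raise ValueError("replication must be between 1 and num_devices")
    return [{(i + k) % num_devices for k in range(replication)} for i in range(num_neurons)]


def surviving_fraction(placement: list[set[int]], failed: set[int]) -> float:
    """Fraction of neurons that keep at least one replica on a live device."""
    if not placement:
        return 1.0
    alive = sum(1 for devices in placement if not devices <= failed)
    return alive / len(placement)


def worst_case_survival(placement: list[set[int]], num_devices: int, num_failures: int) -> float:
    """Minimum surviving fraction over every set of `num_failures` failed devices."""
    return min(
        surviving_fraction(placement, set(failed))
        for failed in combinations(range(num_devices), num_failures)
    )


def memory_per_device(num_neurons: int, num_devices: int, replication: int) -> float:
    """Average number of neurons stored per device (a rough memory proxy)."""
    return num_neurons * replication / num_devices


if __name__ == "__main__":
    neurons, devices = 8, 4
    for r in (1, 2, devices):
        placement = assign_neurons(neurons, devices, r)
        print(
            f"replication={r}: neurons/device={memory_per_device(neurons, devices, r):.1f}, "
            f"worst-case survival after 1 failure={worst_case_survival(placement, devices, 1):.2f}"
        )
```

With 8 neurons on 4 devices, `replication=1` stores 2 neurons per device but loses a quarter of them when any single device fails. `replication=2` doubles the memory and survives any single failure. `replication=4` (full replication) survives any three failures, but each device then holds the whole layer. A tunable method sits between these extremes.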


Source journal

Integration-The Vlsi Journal · Engineering & Technology: Engineering, Electrical & Electronic

CiteScore: 3.80

Self-citation rate: 5.30%

Articles published: 107

Review time: 6 months

Journal description: Integration's aim is to cover every aspect of the VLSI area, with an emphasis on cross-fertilization between various fields of science, and the design, verification, test and applications of integrated circuits and systems, as well as closely related topics in process and device technologies. Individual issues will feature peer-reviewed tutorials and articles as well as reviews of recent publications. The intended coverage of the journal can be assessed by examining the following (non-exclusive) list of topics: Specification methods and languages; Analog/Digital Integrated Circuits and Systems; VLSI architectures; Algorithms, methods and tools for modeling, simulation, synthesis and verification of integrated circuits and systems of any complexity; Embedded systems; High-level synthesis for VLSI systems; Logic synthesis and finite automata; Testing, design-for-test and test generation algorithms; Physical design; Formal verification; Algorithms implemented in VLSI systems; Systems engineering; Heterogeneous systems.