Yifan Chen , Zhuoquan Yu , Yi Jin , Christine Mwase , Xin Hu , Li Da Xu , Zhuo Zou , Lirong Zheng
{"title":"Self-aware collaborative edge inference with embedded devices for IIoT","authors":"Yifan Chen , Zhuoquan Yu , Yi Jin , Christine Mwase , Xin Hu , Li Da Xu , Zhuo Zou , Lirong Zheng","doi":"10.1016/j.future.2024.107535","DOIUrl":null,"url":null,"abstract":"<div><div>Edge inference and other compute-intensive industrial Internet of Things (IIoT) applications suffer from a bad quality of experience due to the limited and heterogeneous computing and communication resources of embedded devices. To tackle these issues, we propose a model partitioning-based self-aware collaborative edge inference framework. Specifically, the device can adaptively adjust the local model inference scheme by sensing the available computing and communication resources of surrounding devices. When the inference latency requirement cannot be met by local computation, the model should be partitioned for collaborative computation on other devices to improve the inference efficiency. Furthermore, for two typical IIoT scenarios, i.e., bursting and stacking tasks, the latency-aware and throughput-aware collaborative inference algorithms are designed, respectively. Via jointly optimizing the partition layer and collaborative device selection, the optimal inference efficiency, characterized by minimum inference latency and maximum inference throughput, can be obtained. Finally, the performance of our proposal is validated through extensive simulations and tests conducted on 10 Raspberry Pi 4Bs using popular models. Specifically, in the case of two collaborative devices, our platform reaches up to 92.59% latency reduction for bursting tasks and 16.19<span><math><mo>×</mo></math></span> throughput growth for stacking tasks. In addition, the divergence between simulations and tests ranges from 1.64% to 9.56% for bursting tasks and from 3.24% to 11.24% for stacking tasks, which indicates that the theoretical performance analyses are solid. For the general case where the data privacy is not considered and the number of collaborative devices is optimally determined, up to 14.76<span><math><mo>×</mo></math></span> throughput speed up and 84.04% latency reduction can be obtained.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"163 ","pages":"Article 107535"},"PeriodicalIF":6.2000,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X24004990","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
引用次数: 0
Abstract
Edge inference and other compute-intensive industrial Internet of Things (IIoT) applications suffer from a bad quality of experience due to the limited and heterogeneous computing and communication resources of embedded devices. To tackle these issues, we propose a model partitioning-based self-aware collaborative edge inference framework. Specifically, the device can adaptively adjust the local model inference scheme by sensing the available computing and communication resources of surrounding devices. When the inference latency requirement cannot be met by local computation, the model should be partitioned for collaborative computation on other devices to improve the inference efficiency. Furthermore, for two typical IIoT scenarios, i.e., bursting and stacking tasks, the latency-aware and throughput-aware collaborative inference algorithms are designed, respectively. Via jointly optimizing the partition layer and collaborative device selection, the optimal inference efficiency, characterized by minimum inference latency and maximum inference throughput, can be obtained. Finally, the performance of our proposal is validated through extensive simulations and tests conducted on 10 Raspberry Pi 4Bs using popular models. Specifically, in the case of two collaborative devices, our platform reaches up to 92.59% latency reduction for bursting tasks and 16.19 throughput growth for stacking tasks. In addition, the divergence between simulations and tests ranges from 1.64% to 9.56% for bursting tasks and from 3.24% to 11.24% for stacking tasks, which indicates that the theoretical performance analyses are solid. For the general case where the data privacy is not considered and the number of collaborative devices is optimally determined, up to 14.76 throughput speed up and 84.04% latency reduction can be obtained.
期刊介绍:
Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications.
Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration.
Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.