{"title":"Optimizing Multi-DNN Parallel Inference Performance in MEC Networks: A Resource-Aware and Dynamic DNN Deployment Scheme","authors":"Tong Zheng;Yuanguo Bi;Guangjie Han;Xingwei Wang;Yuheng Liu;Yufei Liu;Xiangyi Chen","doi":"10.1109/TC.2025.3605749","DOIUrl":null,"url":null,"abstract":"The advent of Multi-access Edge Computing (MEC) has empowered Internet of Things (IoT) devices and edge servers to deploy sophisticated Deep Neural Network (DNN) applications, enabling real-time inference. Many concurrent inference requests and intricate DNN models demand efficient multi-DNN inference in MEC networks. However, the resource-limited IoT device/edge server and expanding model size force models to be dynamically deployed, resulting in significant undesired energy consumption. In addition, parallel multi-DNN inference on the same device complicates the inference process due to the resource competition among models, increasing the inference latency. In this paper, we propose a Resource-aware and Dynamic DNN Deployment (R3D) scheme with the collaboration of end-edge-cloud. To mitigate resource competition and waste during multi-DNN parallel inference, we develop a Resource Adaptive Management (RAM) algorithm based on the Roofline model, which dynamically allocates resources by accounting for the impact of device-specific performance bottlenecks on inference latency. Additionally, we design a Deep Reinforcement Learning (DRL)-based online optimization algorithm that dynamically adjusts DNN deployment strategies to achieve fast and energy-efficient inference across heterogeneous devices. Experiment results demonstrate that R3D is applicable in MEC environments and performs well in terms of inference latency, resource utilization, and energy consumption.","PeriodicalId":13087,"journal":{"name":"IEEE Transactions on Computers","volume":"74 11","pages":"3938-3952"},"PeriodicalIF":3.8000,"publicationDate":"2025-09-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Computers","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11150612/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
Citations: 0
Abstract
The advent of Multi-access Edge Computing (MEC) has empowered Internet of Things (IoT) devices and edge servers to deploy sophisticated Deep Neural Network (DNN) applications, enabling real-time inference. The large number of concurrent inference requests and the complexity of modern DNN models demand efficient multi-DNN inference in MEC networks. However, resource-limited IoT devices and edge servers, together with ever-growing model sizes, force models to be deployed dynamically, which incurs significant unwanted energy consumption. In addition, parallel multi-DNN inference on the same device complicates the inference process because the models compete for resources, increasing inference latency. In this paper, we propose a Resource-aware and Dynamic DNN Deployment (R3D) scheme built on end-edge-cloud collaboration. To mitigate resource competition and waste during multi-DNN parallel inference, we develop a Resource Adaptive Management (RAM) algorithm based on the Roofline model, which dynamically allocates resources by accounting for the impact of device-specific performance bottlenecks on inference latency. Additionally, we design a Deep Reinforcement Learning (DRL)-based online optimization algorithm that dynamically adjusts DNN deployment strategies to achieve fast and energy-efficient inference across heterogeneous devices. Experimental results demonstrate that R3D is applicable in MEC environments and performs well in terms of inference latency, resource utilization, and energy consumption.
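The Roofline model referenced above is a standard performance model: a kernel's attainable throughput is bounded by the minimum of the device's peak compute rate and its arithmetic intensity multiplied by the memory bandwidth. The minimal sketch below illustrates how such a bound can flag whether a DNN layer is compute-bound or memory-bound on a given device; the device numbers, layer figures, and the `roofline_bound` helper are illustrative assumptions, not the paper's RAM algorithm.

```python
from dataclasses import dataclass

@dataclass
class Device:
    peak_flops: float      # peak compute rate (FLOP/s)
    mem_bandwidth: float   # peak memory bandwidth (bytes/s)

def roofline_bound(device: Device, flops: float, bytes_moved: float) -> tuple[float, str]:
    """Return the attainable FLOP/s for a kernel and its limiting resource.

    Roofline: attainable = min(peak_flops, arithmetic_intensity * mem_bandwidth),
    where arithmetic_intensity = flops / bytes_moved.
    """
    intensity = flops / bytes_moved                       # FLOP per byte
    attainable = min(device.peak_flops, intensity * device.mem_bandwidth)
    bottleneck = "compute-bound" if attainable == device.peak_flops else "memory-bound"
    return attainable, bottleneck

# Illustrative numbers only (not from the paper): a small edge accelerator
edge = Device(peak_flops=4e12, mem_bandwidth=34e9)

# A convolution layer performing 2e9 FLOPs while moving 50 MB of activations/weights
attainable, kind = roofline_bound(edge, flops=2e9, bytes_moved=50e6)
latency_lower_bound = 2e9 / attainable                    # seconds, ideal case
print(f"{kind}: >= {latency_lower_bound * 1e3:.2f} ms for this layer")
```

A resource manager in the spirit of RAM could use such per-layer bounds to decide how much compute or bandwidth to reserve for each co-running model so that no model's bottleneck resource is oversubscribed; the actual allocation policy is specified in the paper itself.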
Journal Overview:
The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.