Optimizing Multi-DNN Parallel Inference Performance in MEC Networks: A Resource-Aware and Dynamic DNN Deployment Scheme

IF 3.8; Zone 2 (Computer Science); JCR Q2, COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

IEEE Transactions on Computers Pub Date : 2025-09-03 DOI:10.1109/TC.2025.3605749

Tong Zheng;Yuanguo Bi;Guangjie Han;Xingwei Wang;Yuheng Liu;Yufei Liu;Xiangyi Chen

Volume 74, issue 11, pages 3938-3952. Journal Article. https://ieeexplore.ieee.org/document/11150612/

Citations: 0

Abstract

The advent of Multi-access Edge Computing (MEC) has enabled Internet of Things (IoT) devices and edge servers to deploy sophisticated Deep Neural Network (DNN) applications for real-time inference. The large number of concurrent inference requests and the complexity of DNN models call for efficient multi-DNN inference in MEC networks. However, resource-limited IoT devices and edge servers, together with growing model sizes, force models to be deployed dynamically, which incurs significant unwanted energy consumption. In addition, parallel multi-DNN inference on the same device causes resource competition among models, which increases inference latency. In this paper, we propose a Resource-aware and Dynamic DNN Deployment (R3D) scheme based on end-edge-cloud collaboration. To mitigate resource competition and waste during multi-DNN parallel inference, we develop a Resource Adaptive Management (RAM) algorithm based on the Roofline model, which dynamically allocates resources by accounting for the impact of device-specific performance bottlenecks on inference latency. Additionally, we design a Deep Reinforcement Learning (DRL)-based online optimization algorithm that dynamically adjusts DNN deployment strategies to achieve fast and energy-efficient inference across heterogeneous devices. Experimental results demonstrate that R3D is applicable in MEC environments and performs well in terms of inference latency, resource utilization, and energy consumption.
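The abstract says the RAM algorithm builds on the Roofline model to account for device-specific bottlenecks. The sketch below is not the paper's algorithm; it only illustrates the standard Roofline bound, in which attainable throughput is the smaller of the compute roof and memory bandwidth times arithmetic intensity. The `Device` class, the function names, and all numeric values are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Device:
    peak_flops: float     # peak compute throughput in FLOP/s (assumed value)
    mem_bandwidth: float  # peak memory bandwidth in bytes/s (assumed value)


def attainable_flops(dev: Device, intensity: float) -> float:
    """Standard Roofline bound: min(compute roof, bandwidth * intensity)."""
    return min(dev.peak_flops, dev.mem_bandwidth * intensity)


def roofline_latency(dev: Device, flops: float, bytes_moved: float) -> Tuple[float, str]:
    """Return a lower-bound latency (seconds) and the limiting resource."""
    intensity = flops / bytes_moved             # FLOP per byte
    ridge = dev.peak_flops / dev.mem_bandwidth  # intensity where the roofs meet
    latency = flops / attainable_flops(dev, intensity)
    bottleneck = "compute" if intensity >= ridge else "memory"
    return latency, bottleneck


# Hypothetical edge device: 1 TFLOP/s peak compute, 100 GB/s memory bandwidth.
edge = Device(peak_flops=1e12, mem_bandwidth=1e11)

# Hypothetical layers: one with high arithmetic intensity, one with low.
conv_latency, conv_limit = roofline_latency(edge, flops=2e9, bytes_moved=1e8)
fc_latency, fc_limit = roofline_latency(edge, flops=1e8, bytes_moved=1e8)
print(conv_limit, conv_latency)
print(fc_limit, fc_latency)
```

Under this model, a memory-bound layer gains nothing from extra compute units, which is the kind of bottleneck information a resource allocator could use when several models share one device.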

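Source journal

IEEE Transactions on Computers (Engineering & Technology: Engineering, Electrical & Electronic)

CiteScore: 6.60

Self-citation rate: 5.40%

Articles published: 199

Review time: 6.0 months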

About the journal: The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field. It publishes papers on research in areas of current interest to the readers. These areas include, but are not limited to, the following: a) computer organizations and architectures; b) operating systems, software systems, and communication protocols; c) real-time systems and embedded systems; d) digital devices, computer components, and interconnection networks; e) specification, design, prototyping, and testing methods and tools; f) performance, fault tolerance, reliability, security, and testability; g) case studies and experimental and theoretical evaluations; and h) new and important applications and trends.