Joint DNN Model Deployment, Selection, and Configuration for Heterogeneous Inference Services Toward Edge Intelligence

IF 9.2 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Mobile Computing Pub Date : 2025-07-10 DOI:10.1109/TMC.2025.3586793

Hebin Huang;Junbin Liang;Geyong Min

{"title":"Joint DNN Model Deployment, Selection, and Configuration for Heterogeneous Inference Services Toward Edge Intelligence","authors":"Hebin Huang;Junbin Liang;Geyong Min","doi":"10.1109/TMC.2025.3586793","DOIUrl":null,"url":null,"abstract":"Edge intelligence is an emerging paradigm in edge computing that deploys Deep Neural Network (DNN) models on edge servers with limited storage and computation capacities to provide inference services for high mobility and real-time applications, such as autonomous driving or smart surveillance, with varying accuracy and delay requirements. Adapting application configurations (e.g., image resolution or video frame rate) while selecting different DNN models and deployment locations can provide high-accuracy, low-delay inference services that meet user requirements. However, the configurations and DNN models of various inference services are highly heterogeneous. As balancing inference accuracy, resource cost, and delay is a multi-objective programming problem, it is a great challenge to obtain the optimal solution. To address this challenge, we propose a novel online framework to jointly optimize the configuration adaption, DNN model selection, and deployment for heterogeneous inference services. Specifically, we first formulate this joint optimization problem as an integer linear programming problem and prove it is NP-hard. Then, we further model the problem as a Partial Observable Markov Decision Process (POMDP) and solve it by developing a Heterogeneous-Agent Reinforcement Learning (HARL) based algorithm, named Heterogeneous Inference Service ProvidER (HISPER). It allows agents to have different action spaces corresponding to different types of configurations and DNN models. Finally, extensive experiments demonstrate that the proposed algorithm outperforms other state-of-the-art counterparts.","PeriodicalId":50389,"journal":{"name":"IEEE Transactions on Mobile Computing","volume":"24 11","pages":"12726-12741"},"PeriodicalIF":9.2000,"publicationDate":"2025-07-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Mobile Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/11075601/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Edge intelligence is an emerging paradigm in edge computing that deploys Deep Neural Network (DNN) models on edge servers with limited storage and computation capacities to provide inference services for high mobility and real-time applications, such as autonomous driving or smart surveillance, with varying accuracy and delay requirements. Adapting application configurations (e.g., image resolution or video frame rate) while selecting different DNN models and deployment locations can provide high-accuracy, low-delay inference services that meet user requirements. However, the configurations and DNN models of various inference services are highly heterogeneous. As balancing inference accuracy, resource cost, and delay is a multi-objective programming problem, it is a great challenge to obtain the optimal solution. To address this challenge, we propose a novel online framework to jointly optimize the configuration adaption, DNN model selection, and deployment for heterogeneous inference services. Specifically, we first formulate this joint optimization problem as an integer linear programming problem and prove it is NP-hard. Then, we further model the problem as a Partial Observable Markov Decision Process (POMDP) and solve it by developing a Heterogeneous-Agent Reinforcement Learning (HARL) based algorithm, named Heterogeneous Inference Service ProvidER (HISPER). It allows agents to have different action spaces corresponding to different types of configurations and DNN models. Finally, extensive experiments demonstrate that the proposed algorithm outperforms other state-of-the-art counterparts.

查看原文本刊更多论文

面向边缘智能的异构推理服务的联合DNN模型部署、选择和配置

边缘智能是边缘计算中的一种新兴范例，它将深度神经网络（DNN）模型部署在存储和计算能力有限的边缘服务器上，为具有不同精度和延迟要求的高移动性和实时应用（如自动驾驶或智能监控）提供推理服务。在选择不同DNN模型和部署位置的同时调整应用程序配置（例如，图像分辨率或视频帧率），可以提供满足用户需求的高精度、低延迟推理服务。然而，各种推理服务的配置和DNN模型是高度异构的。由于平衡推理精度、资源成本和延迟是一个多目标规划问题，获得最优解是一个很大的挑战。为了解决这一挑战，我们提出了一个新的在线框架来共同优化异构推理服务的配置适应、DNN模型选择和部署。具体来说，我们首先将这个联合优化问题表述为一个整数线性规划问题，并证明了它是np困难的。然后，我们进一步将该问题建模为部分可观察马尔可夫决策过程（POMDP），并通过开发基于异构智能体强化学习（HARL）的算法来解决该问题，该算法称为异构推理服务提供者（HISPER）。它允许代理有不同的动作空间，对应于不同类型的配置和DNN模型。最后，大量的实验表明，所提出的算法优于其他最先进的同行。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

IEEE Transactions on Mobile Computing 工程技术-电信学

CiteScore

12.90

自引率

2.50%

发文量

403

审稿时长

6.6 months

期刊介绍： IEEE Transactions on Mobile Computing addresses key technical issues related to various aspects of mobile computing. This includes (a) architectures, (b) support services, (c) algorithm/protocol design and analysis, (d) mobile environments, (e) mobile communication systems, (f) applications, and (g) emerging technologies. Topics of interest span a wide range, covering aspects like mobile networks and hosts, mobility management, multimedia, operating system support, power management, online and mobile environments, security, scalability, reliability, and emerging technologies such as wearable computers, body area networks, and wireless sensor networks. The journal serves as a comprehensive platform for advancements in mobile computing research.