Joint DNN Partitioning and Task Offloading Based on Attention Mechanism-Aided Reinforcement Learning
Authors: Mengyuan Zhang; Juan Fang; Ziyi Teng; Yaqi Liu; Shen Wu
DOI: 10.1109/TNSM.2025.3561739
Journal: IEEE Transactions on Network and Service Management, vol. 22, no. 3, pp. 2914-2927 (Q1, Computer Science, Information Systems)
Published: 2025-04-17
URL: https://ieeexplore.ieee.org/document/10969114/
Citations: 0
Abstract
The rapid advancement of artificial intelligence applications has led to the deployment of a growing number of deep neural networks (DNNs) on mobile devices. Given the limited computational capabilities and small battery capacity of these devices, supporting efficient DNN inference presents a significant challenge. This paper considers the joint design of DNN model partitioning and offloading in high-concurrency task scenarios. The primary objective is to accelerate DNN task inference and reduce computational delay. First, we propose an adaptive inference framework that partitions DNN models into interdependent sub-tasks through a hierarchical partitioning method. Second, we develop a delay prediction model based on a Random Forest (RF) regression algorithm to estimate the computational delay of each sub-task on different devices. Finally, we design a high-performance DNN partitioning and task offloading method based on an attention mechanism-aided Soft Actor-Critic (AMSAC) algorithm: the attention mechanism determines each user's bandwidth allocation from the characteristics of the DNN tasks, and the Soft Actor-Critic algorithm performs adaptive layer-level partitioning and offloading of the DNN model, reducing collaborative inference delay. Extensive experiments demonstrate that the proposed AMSAC algorithm effectively reduces DNN task inference latency and improves service quality.
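To illustrate the RF-based delay prediction step, the following is a minimal sketch of training a Random Forest regressor to estimate per-sub-task computational delay. The feature set (FLOPs, input/output sizes, a device-capability flag) and the synthetic training data are illustrative assumptions, not the authors' actual profiling dataset.

```python
# Hypothetical sketch of the Random Forest delay predictor.
# Features and data are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative per-layer features: FLOPs, input size, output size,
# and a device index (0 = mobile device, 1 = edge server).
n = 500
flops = rng.uniform(1e6, 1e9, n)
in_size = rng.uniform(1e3, 1e6, n)
out_size = rng.uniform(1e3, 1e6, n)
device = rng.integers(0, 2, n)

# Synthetic ground-truth delay: proportional to FLOPs, with the edge
# server assumed an order of magnitude faster, plus small noise.
speed = np.where(device == 1, 1e9, 1e8)  # FLOPs per second
delay = flops / speed + rng.normal(0.0, 1e-3, n)

X = np.column_stack([flops, in_size, out_size, device])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, delay)

# Predict the delay of one sub-task on both device types.
layer = [5e8, 1e5, 1e5]
pred_mobile = model.predict([layer + [0]])[0]
pred_edge = model.predict([layer + [1]])[0]
print(f"predicted delay - mobile: {pred_mobile:.3f}s, edge: {pred_edge:.3f}s")
```

Such per-device delay estimates give the offloading policy a cheap lookahead: partition points can be scored without actually executing every candidate split.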
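The attention-driven bandwidth allocation can likewise be sketched: task feature vectors are scored against a query via scaled dot-product attention, and the softmax weights split the total bandwidth among users. The feature layout and query vector below are illustrative assumptions, not the paper's trained attention network.

```python
# Hypothetical sketch: attention weights over task characteristics
# decide each user's share of the total bandwidth.
import numpy as np

def allocate_bandwidth(task_features, query, total_bw):
    """task_features: (n_users, d) array; query: (d,) vector."""
    # Scaled dot-product attention scores, one per user.
    scores = task_features @ query / np.sqrt(len(query))
    # Softmax (numerically stabilized) turns scores into shares.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return total_bw * weights

# Example: 3 users with illustrative (data size, layer count) features.
feats = np.array([[4.0, 10.0],
                  [1.0, 3.0],
                  [2.0, 6.0]])
q = np.array([0.5, 0.1])  # assumed query vector, not a learned one
alloc = allocate_bandwidth(feats, q, total_bw=100.0)
print("bandwidth shares:", alloc)
```

In the AMSAC setting the query would come from a learned network and the allocation feeds back into the SAC agent's state, but the softmax-share mechanism itself is as simple as shown.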
Journal Introduction
IEEE Transactions on Network and Service Management publishes (online only) peer-reviewed archival-quality papers that advance the state of the art and practical applications of network and service management. Theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) are encouraged. The transactions focus on the key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.