Joint DNN Partitioning and Task Offloading Based on Attention Mechanism-Aided Reinforcement Learning
Authors: Mengyuan Zhang; Juan Fang; Ziyi Teng; Yaqi Liu; Shen Wu
DOI: 10.1109/TNSM.2025.3561739
Journal: IEEE Transactions on Network and Service Management, vol. 22, no. 3, pp. 2914-2927 (Q1, Computer Science, Information Systems)
Published: 2025-04-17
URL: https://ieeexplore.ieee.org/document/10969114/
Citations: 0
Abstract
The rapid advancement of artificial intelligence applications has led to the deployment of a growing number of deep neural networks (DNNs) on mobile devices. Given the limited computational capabilities and small battery capacity of these devices, supporting efficient DNN inference presents a significant challenge. This paper considers the joint design of DNN model partitioning and offloading in high-concurrency task scenarios. The primary objective is to accelerate DNN task inference and reduce computational delay. First, we propose an adaptive inference framework that partitions DNN models into interdependent sub-tasks through a hierarchical partitioning method. Second, we develop a delay prediction model based on a Random Forest (RF) regression algorithm to estimate the computational delay of each sub-task on different devices. Finally, we design a high-performance DNN partitioning and task offloading method based on an attention mechanism-aided Soft Actor-Critic (AMSAC) algorithm: the attention mechanism determines each user's bandwidth allocation from the characteristics of the DNN tasks, and the Soft Actor-Critic algorithm performs adaptive layer-level partitioning and offloading of the DNN model, reducing collaborative inference delay. Extensive experiments demonstrate that the proposed AMSAC algorithm effectively reduces DNN task inference latency and improves service quality.
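To illustrate the RF-based delay prediction step, the following is a minimal sketch of training a Random Forest regressor to estimate per-sub-task computational delay. The feature set (FLOPs, input/output sizes, a device-capability flag) and the synthetic training data are illustrative assumptions, not the authors' actual profiling dataset.

```python
# Hypothetical sketch of the Random Forest delay predictor.
# Features and data are assumptions for illustration only.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Illustrative per-layer features: FLOPs, input size, output size,
# and a device index (0 = mobile device, 1 = edge server).
n = 500
flops = rng.uniform(1e6, 1e9, n)
in_size = rng.uniform(1e3, 1e6, n)
out_size = rng.uniform(1e3, 1e6, n)
device = rng.integers(0, 2, n)

# Synthetic ground-truth delay: proportional to FLOPs, with the edge
# server assumed an order of magnitude faster, plus small noise.
speed = np.where(device == 1, 1e9, 1e8)  # FLOPs per second
delay = flops / speed + rng.normal(0.0, 1e-3, n)

X = np.column_stack([flops, in_size, out_size, device])
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, delay)

# Predict the delay of one sub-task on both device types.
layer = [5e8, 1e5, 1e5]
pred_mobile = model.predict([layer + [0]])[0]
pred_edge = model.predict([layer + [1]])[0]
print(f"predicted delay - mobile: {pred_mobile:.3f}s, edge: {pred_edge:.3f}s")
```

Such per-device delay estimates give the offloading policy a cheap lookahead: partition points can be scored without actually executing every candidate split.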
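The attention-driven bandwidth allocation can likewise be sketched: task feature vectors are scored against a query via scaled dot-product attention, and the softmax weights split the total bandwidth among users. The feature layout and query vector below are illustrative assumptions, not the paper's trained attention network.

```python
# Hypothetical sketch: attention weights over task characteristics
# decide each user's share of the total bandwidth.
import numpy as np

def allocate_bandwidth(task_features, query, total_bw):
    """task_features: (n_users, d) array; query: (d,) vector."""
    # Scaled dot-product attention scores, one per user.
    scores = task_features @ query / np.sqrt(len(query))
    # Softmax (numerically stabilized) turns scores into shares.
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return total_bw * weights

# Example: 3 users with illustrative (data size, layer count) features.
feats = np.array([[4.0, 10.0],
                  [1.0, 3.0],
                  [2.0, 6.0]])
q = np.array([0.5, 0.1])  # assumed query vector, not a learned one
alloc = allocate_bandwidth(feats, q, total_bw=100.0)
print("bandwidth shares:", alloc)
```

In the AMSAC setting the query would come from a learned network and the allocation feeds back into the SAC agent's state, but the softmax-share mechanism itself is as simple as shown.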
Journal Introduction
IEEE Transactions on Network and Service Management publishes (online only) peer-reviewed archival-quality papers that advance the state of the art and practical applications of network and service management. Theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) are encouraged. The transactions focus on the key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.