Joint DNN Partitioning and Task Offloading Based on Attention Mechanism-Aided Reinforcement Learning

Impact factor: 4.7 · Zone 2 (Computer Science) · JCR Q1: Computer Science, Information Systems
Mengyuan Zhang;Juan Fang;Ziyi Teng;Yaqi Liu;Shen Wu
DOI: 10.1109/TNSM.2025.3561739
Journal: IEEE Transactions on Network and Service Management, vol. 22, no. 3, pp. 2914-2927
Publication date: 2025-04-17
Publication type: Journal Article
Open access: No
Full text: https://ieeexplore.ieee.org/document/10969114/
Citation count: 0

Abstract

The rapid advancement of artificial intelligence applications has led to a growing number of deep neural networks (DNNs) being deployed on mobile devices. Given the limited computational capability and small battery capacity of these devices, supporting efficient DNN inference is a significant challenge. This paper considers the joint design of DNN model partitioning and offloading in scenarios with highly concurrent tasks. The primary objective is to accelerate DNN task inference and reduce computational delay. First, we propose an innovative adaptive inference framework that partitions DNN models into interdependent sub-tasks through a hierarchical partitioning method. Second, we develop a delay prediction model based on a Random Forest (RF) regression algorithm to estimate the computational delay of each sub-task on different devices. Finally, we design a high-performance DNN partitioning and task offloading method based on an attention mechanism-aided Soft Actor-Critic (AMSAC) algorithm. The attention mechanism determines the bandwidth allocation for each user based on the characteristics of the DNN tasks, and the Soft Actor-Critic algorithm performs adaptive layer-level partitioning and offloading of the DNN model, reducing collaborative inference delay. Extensive experiments demonstrate that the proposed AMSAC algorithm effectively reduces DNN task inference latency cost and improves service quality.
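To make the two decisions described in the abstract more concrete, the following is a minimal sketch of (a) sharing bandwidth among users via softmax-normalized scores and (b) choosing a layer-level partition point for a chain-structured DNN. It is not the AMSAC algorithm: an exhaustive search stands in for the Soft Actor-Critic policy, fixed scores stand in for learned attention weights, and the per-layer delays are made-up numbers where the paper would use the Random Forest predictor. All function names, parameters, and values are hypothetical, and the delay of returning the result to the device is ignored.

```python
import math


def softmax_bandwidth(scores, total_bandwidth):
    """Split total_bandwidth across users in proportion to softmax(scores).

    `scores` are hypothetical per-user relevance values; in an attention
    mechanism they would be learned rather than supplied by hand.
    """
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    norm = sum(weights)
    return [total_bandwidth * w / norm for w in weights]


def best_partition(device_delay, server_delay, output_size, input_size, bandwidth):
    """Exhaustively pick the split point k that minimizes end-to-end delay.

    Layers [0, k) run on the device and layers [k, n) run on the server.
    device_delay[i] / server_delay[i]: assumed delay of layer i on each side.
    output_size[i]: size of the data produced by layer i.
    input_size: size of the raw input (sent when k == 0).
    """
    n = len(device_delay)
    best_k, best_total = None, float("inf")
    for k in range(n + 1):
        local = sum(device_delay[:k])
        remote = sum(server_delay[k:])
        if k == n:
            transfer = 0.0  # everything runs locally, nothing is uploaded
        else:
            payload = input_size if k == 0 else output_size[k - 1]
            transfer = payload / bandwidth
        total = local + transfer + remote
        if total < best_total:
            best_k, best_total = k, total
    return best_k, best_total


if __name__ == "__main__":
    # Two users with equal scores share the link evenly.
    shares = softmax_bandwidth([0.0, 0.0], 10.0)
    print(shares)

    # Illustrative three-layer model with invented delays and sizes.
    device = [4.0, 6.0, 8.0]
    server = [1.0, 1.0, 1.0]
    sizes = [2.0, 1.0, 0.5]
    print(best_partition(device, server, sizes, input_size=8.0, bandwidth=1.0))
    print(best_partition(device, server, sizes, input_size=8.0, bandwidth=4.0))
```

In this toy setting, a larger bandwidth share moves the best split point toward the server (more layers offloaded), which is the coupling between bandwidth allocation and partitioning that motivates treating the two decisions jointly.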
Source journal
IEEE Transactions on Network and Service Management (Computer Science: Computer Networks and Communications)
CiteScore: 9.30
Self-citation rate: 15.10%
Articles published: 325
Journal description: IEEE Transactions on Network and Service Management will publish (online only) peer-reviewed, archival-quality papers that advance the state of the art and practical applications of network and service management. Theoretical research contributions (presenting new concepts and techniques) and applied contributions (reporting on experiences and experiments with actual systems) will be encouraged. These transactions will focus on the key technical issues related to: Management Models, Architectures and Frameworks; Service Provisioning, Reliability and Quality Assurance; Management Functions; Enabling Technologies; Information and Communication Models; Policies; Applications and Case Studies; Emerging Technologies and Standards.