Lyapunov-Guided Resource Allocation and Task Scheduling for Edge Computing Cognitive Radio Networks via Deep Reinforcement Learning

IF 4.3 | Zone 2 (Multidisciplinary Journals) | JCR Q1 | Engineering, Electrical & Electronic
Chi Xu; Peifeng Zhang; Haibin Yu
Journal: IEEE Sensors Journal, vol. 25, no. 7, pp. 12253-12264
Published: 2025-02-26
DOI: 10.1109/JSEN.2025.3542972
URL: https://ieeexplore.ieee.org/document/10906327/
Citations: 0

Abstract

Employing cognitive radio to facilitate edge computing is a promising way to address spectrum scarcity during massive task offloading. This article studies an edge computing cognitive radio network (EC-CRN), where multiple cognitive end devices (CEDs) offload tasks to multiple cognitive base stations (CBSs) for parallel edge computing over licensed spectrum shared with primary users (PUs). To jointly optimize resource allocation and task scheduling, we formulate a long-term average system cost minimization (ASCM) problem subject to constraints on end-edge task division, the maximum transmission power of CEDs, the peak interference power to PUs, the computing frequency of CBSs, and the long-term energy consumption of CEDs. Because the long-term objective and the long-term constraint are coupled across time slots, we use Lyapunov optimization theory to derive an upper bound on the Lyapunov drift of the virtual energy consumption backlog and transform the original problem into a one-slot Lyapunov drift-plus-penalty minimization problem. Furthermore, we model the transformed problem as a Markov decision process (MDP) and propose the Lyapunov-guided resource allocation and task scheduling (LRATS) algorithm, based on deep reinforcement learning with proximal policy optimization (PPO), where the policy network is updated by policy gradient ascent with adaptive trajectory expectation sampling, and the value network is updated by minimizing the mean squared temporal difference (TD) error. Comparisons with benchmark algorithms based on a greedy strategy, particle swarm optimization (PSO), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3), and soft actor-critic (SAC), together with ablation experiments, validate that the proposed algorithm converges stably to a larger reward and effectively reduces the system cost.
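The abstract does not give the exact formulas, so the following is only a minimal sketch of the textbook Lyapunov drift-plus-penalty construction it refers to, written for a single device. All names here (update_virtual_queue, drift_plus_penalty, slot_reward, energy_budget, v) are illustrative assumptions and are not taken from the paper; the paper's actual cost model, multi-device queues, and reward design may differ.

```python
# Hypothetical single-device sketch of the standard drift-plus-penalty idea.
# Not the authors' implementation; symbols and names are assumptions.


def update_virtual_queue(q, energy_used, energy_budget):
    """Virtual energy backlog update: Q(t+1) = max(Q(t) + e(t) - E_avg, 0).

    The backlog grows when the slot's energy use exceeds the long-term
    average budget and shrinks (never below zero) otherwise.
    """
    return max(q + energy_used - energy_budget, 0.0)


def drift_plus_penalty(q, energy_used, energy_budget, system_cost, v):
    """One-slot objective: V * cost(t) + Q(t) * (e(t) - E_avg).

    V trades off system cost against how strictly the long-term
    energy constraint is enforced.
    """
    return v * system_cost + q * (energy_used - energy_budget)


def slot_reward(q, energy_used, energy_budget, system_cost, v):
    """Reward for an RL agent: the negative of the one-slot objective."""
    return -drift_plus_penalty(q, energy_used, energy_budget, system_cost, v)


if __name__ == "__main__":
    q = 0.0
    # (energy used, system cost) per slot; made-up numbers for illustration.
    slots = [(1.5, 3.0), (0.5, 2.0), (2.0, 4.0)]
    for energy, cost in slots:
        r = slot_reward(q, energy, energy_budget=1.0, system_cost=cost, v=10.0)
        q = update_virtual_queue(q, energy, energy_budget=1.0)
        print(f"reward={r:.2f}, next backlog={q:.2f}")
```

In this kind of setup, a policy trained on the per-slot reward (for example with PPO) is pushed toward low-energy actions whenever the backlog is large, which is how a long-term constraint can be handled one slot at a time.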
Source journal
IEEE Sensors Journal (Engineering Technology - Engineering: Electrical & Electronic)
CiteScore: 7.70
Self-citation rate: 14.00%
Articles published: 2058
Review time: 5.2 months
Journal description: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspect of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion, processing of wave e.g., electromagnetic and acoustic; and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative sensor data, detection, estimation and classification based on sensor data)
- Sensors in Industrial Practice