Lyapunov-Guided Resource Allocation and Task Scheduling for Edge Computing Cognitive Radio Networks via Deep Reinforcement Learning

IF 4.3 · Zone 2, multidisciplinary journals · Q1 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Sensors Journal · Pub Date: 2025-02-26 · DOI: 10.1109/JSEN.2025.3542972

Chi Xu; Peifeng Zhang; Haibin Yu

IEEE Sensors Journal, vol. 25, no. 7, pp. 12253-12264. Journal Article. Full text: https://ieeexplore.ieee.org/document/10906327/

Citations: 0

Abstract

Employing cognitive radio to facilitate edge computing is a promising way to address spectrum scarcity during massive task offloading. This article studies an edge computing cognitive radio network (EC-CRN), where multiple cognitive end devices (CEDs) offload tasks to multiple cognitive base stations (CBSs) for parallel edge computing over the licensed spectrum underlying primary users (PUs). To jointly optimize resource allocation and task scheduling, we formulate a long-term average system cost minimization (ASCM) problem subject to constraints on the end-edge task division, the maximum transmission power of CEDs, the peak interference power to PUs, the computing frequency of CBSs, and the long-term energy consumption of CEDs. Because the long-term objective and the long-term constraint are coupled slot by slot, we employ Lyapunov optimization theory to derive an upper bound on the Lyapunov drift of the virtual energy consumption backlog and transform the original problem into a one-slot Lyapunov drift-plus-penalty minimization problem. Furthermore, we model the transformed problem as a Markov decision process (MDP) and propose the Lyapunov-guided resource allocation and task scheduling (LRATS) algorithm, which is based on deep reinforcement learning with proximal policy optimization (PPO): the policy network is updated by policy gradient ascent with adaptive trajectory expectation sampling, and the value network is updated by minimizing the mean squared temporal difference (TD) error. Through comparison with benchmark algorithms based on a greedy strategy, particle swarm optimization (PSO), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3), and soft actor-critic (SAC), together with ablation experiments, we validate that the proposed algorithm converges stably to a larger reward and effectively reduces the system cost.
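The abstract does not give the exact form of the virtual energy consumption backlog or of the drift-plus-penalty term, so the following is only a minimal sketch of the generic textbook Lyapunov construction, not the authors' formulation. It assumes a single device, a per-slot energy budget, the standard recursion Q(t+1) = max(Q(t) + e(t) - budget, 0), and a per-slot objective Q(t)(e(t) - budget) + V * cost(t). All names (`VirtualEnergyQueue`, `drift_plus_penalty`, `slot_reward`, `budget`, `v`) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class VirtualEnergyQueue:
    """Virtual backlog Q(t) that grows when a slot exceeds the energy budget.

    Hypothetical helper: the paper's actual queue definition is not given
    in the abstract.
    """
    budget: float          # assumed long-term average energy budget per slot
    backlog: float = 0.0   # Q(0) = 0

    def update(self, energy_used: float) -> float:
        # Generic virtual-queue recursion: Q(t+1) = max(Q(t) + e(t) - budget, 0)
        self.backlog = max(self.backlog + energy_used - self.budget, 0.0)
        return self.backlog


def drift_plus_penalty(backlog: float, energy_used: float, budget: float,
                       cost: float, v: float) -> float:
    """One-slot drift-plus-penalty term: Q(t) * (e(t) - budget) + V * cost(t).

    v (often written V) is an assumed weight trading system cost against
    energy-constraint violation.
    """
    return backlog * (energy_used - budget) + v * cost


def slot_reward(backlog: float, energy_used: float, budget: float,
                cost: float, v: float) -> float:
    """Reward an RL agent could maximize: the negated per-slot objective."""
    return -drift_plus_penalty(backlog, energy_used, budget, cost, v)


if __name__ == "__main__":
    queue = VirtualEnergyQueue(budget=1.0)
    # Illustrative (energy, cost) pairs per slot; not data from the paper.
    for energy, cost in [(1.5, 2.0), (0.2, 3.0), (1.2, 1.0)]:
        reward = slot_reward(queue.backlog, energy, queue.budget, cost, v=10.0)
        queue.update(energy)
        print(f"reward={reward:.2f} backlog={queue.backlog:.2f}")
```

In a setup like this, the backlog would typically be part of the MDP state, so that a PPO agent sees how far the device has drifted from its energy budget when it chooses the action for the current slot.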

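Source journal

IEEE Sensors Journal (Engineering Technology: Electrical and Electronic Engineering)

CiteScore: 7.70
Self-citation rate: 14.00%
Articles published: 2058
Review time: 5.2 months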

Journal description: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing and applications of devices for sensing and transducing physical, chemical and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:
- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave, e.g., electromagnetic and acoustic, and non-wave, e.g., chemical, gravity, particle, thermal, radiative and non-radiative, sensor data; detection, estimation and classification based on sensor data)
- Sensors in Industrial Practice