Lyapunov-Guided Resource Allocation and Task Scheduling for Edge Computing Cognitive Radio Networks via Deep Reinforcement Learning

IF 4.3 · CAS Zone 2 (Comprehensive Journals) · JCR Q1, ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Sensors Journal · Published: 2025-02-26 · DOI: 10.1109/JSEN.2025.3542972

Chi Xu, Peifeng Zhang, Haibin Yu

Vol. 25, no. 7, pp. 12253-12264 · IEEE Xplore: https://ieeexplore.ieee.org/document/10906327/

Citations: 0

Abstract


Employing cognitive radio to facilitate edge computing is a promising way to address spectrum scarcity during massive task offloading. This article studies an edge computing cognitive radio network (EC-CRN) in which multiple cognitive end devices (CEDs) offload tasks to multiple cognitive base stations (CBSs) for parallel edge computing over the licensed spectrum of primary users (PUs) in underlay mode. To jointly optimize resource allocation and task scheduling, we formulate a long-term average system cost minimization (ASCM) problem subject to constraints on end-edge task division, the maximum transmission power of CEDs, the peak interference power to PUs, the computing frequency of CBSs, and the long-term energy consumption of CEDs. Because the long-term objective and the long-term constraint are coupled slot by slot, we employ Lyapunov optimization theory to derive an upper bound on the Lyapunov drift of a virtual energy-consumption backlog and transform the original problem into a one-slot Lyapunov drift-plus-penalty minimization problem. We then model the transformed problem as a Markov decision process (MDP) and propose the Lyapunov-guided resource allocation and task scheduling (LRATS) algorithm, based on the deep reinforcement learning algorithm proximal policy optimization (PPO), in which the policy network is updated by policy gradient ascent with adaptive trajectory expectation sampling and the value network is updated by minimizing the mean squared temporal-difference (TD) error. Comparisons with benchmark algorithms based on a greedy algorithm, particle swarm optimization (PSO), deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3), and soft actor-critic (SAC), together with ablation experiments, validate that the proposed algorithm converges stably to a larger reward and effectively reduces the system cost.
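
The Lyapunov step described above follows the standard drift-plus-penalty recipe. A virtual queue Q(t) tracks how far cumulative energy use has run past the long-term budget Ē, via Q(t+1) = max(Q(t) + E(t) - Ē, 0). With L(Q) = Q²/2, the drift satisfies Δ(t) ≤ B + Q(t)·E[E(t) - Ē | Q(t)] for a constant B, so minimizing the bound on Δ(t) + V·E[C(t)] slot by slot reduces to minimizing V·C(t) + Q(t)(E(t) - Ē), where V trades system cost against energy backlog. The sketch below is a minimal illustration under stated assumptions, not the authors' LRATS code: it uses a single CED with one queue (the paper constrains the energy of multiple CEDs, and its exact bound and queue definitions may differ), and it enumerates a toy candidate set instead of learning continuous power, frequency, and task-split actions with PPO. All names (`Candidate`, `update_queue`, `drift_plus_penalty`, `choose`, `simulate`) and the numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Candidate:
    """A hypothetical per-slot decision and its predicted one-slot outcome."""
    name: str
    cost: float    # system cost C(t) this decision incurs in the slot
    energy: float  # CED energy consumption E(t) this decision incurs


def update_queue(q: float, energy: float, budget: float) -> float:
    """Virtual energy backlog: Q(t+1) = max(Q(t) + E(t) - E_bar, 0)."""
    return max(q + energy - budget, 0.0)


def drift_plus_penalty(c: Candidate, q: float, budget: float, v: float) -> float:
    """One-slot objective V*C(t) + Q(t)*(E(t) - E_bar) left after bounding the drift."""
    return v * c.cost + q * (c.energy - budget)


def choose(cands: Sequence[Candidate], q: float, budget: float, v: float) -> Candidate:
    """Exhaustive per-slot minimizer over a small candidate set (stand-in for a learned policy)."""
    return min(cands, key=lambda c: drift_plus_penalty(c, q, budget, v))


def simulate(cands: Sequence[Candidate], budget: float, v: float,
             slots: int) -> Tuple[List[str], float]:
    """Run the slot-by-slot control loop; return the decisions and the final backlog."""
    q: float = 0.0
    picks: List[str] = []
    for _ in range(slots):
        c = choose(cands, q, budget, v)
        picks.append(c.name)
        q = update_queue(q, c.energy, budget)
    return picks, q


if __name__ == "__main__":
    toy = [Candidate("fast", cost=1.0, energy=3.0),
           Candidate("frugal", cost=2.0, energy=1.0)]
    print(simulate(toy, budget=2.0, v=1.0, slots=4))
    # (['fast', 'frugal', 'fast', 'frugal'], 0.0)
```

With V = 1 the backlog built up by one "fast" slot pushes the next slot to "frugal", so average energy stays at the budget. A larger V weights the system cost more heavily and lets the backlog grow further before the controller switches to the energy-saving option.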

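On the learning side, the abstract names two PPO updates: policy gradient ascent for the policy network and minimization of the mean squared TD error for the value network. The sketch below evaluates both quantities with the standard PPO clipped surrogate objective. It rests on assumptions that are not from the paper: advantages are one-step TD estimates A = r + γV(s') - V(s) (many PPO implementations use GAE instead), the paper's adaptive trajectory expectation sampling is not reproduced, and the function names and defaults (`eps=0.2`, `gamma=0.99`) are illustrative. A real trainer would compute these on autograd tensors (for example in PyTorch) so gradients reach the network parameters; plain floats here only show the arithmetic.

```python
import math
from typing import List, Sequence


def ppo_clip_objective(logp_new: Sequence[float], logp_old: Sequence[float],
                       adv: Sequence[float], eps: float = 0.2) -> float:
    """Mean clipped surrogate L^CLIP, which the policy network ascends."""
    total = 0.0
    for ln, lo, a in zip(logp_new, logp_old, adv):
        ratio = math.exp(ln - lo)                        # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # keep the ratio in [1-eps, 1+eps]
        total += min(ratio * a, clipped * a)             # pessimistic (lower) of the two terms
    return total / len(adv)


def td_targets(rewards: Sequence[float], values_next: Sequence[float],
               dones: Sequence[bool], gamma: float = 0.99) -> List[float]:
    """One-step TD targets r + gamma * V(s'), with no bootstrap after a terminal step."""
    return [r + gamma * v * (0.0 if d else 1.0)
            for r, v, d in zip(rewards, values_next, dones)]


def one_step_advantages(values: Sequence[float], targets: Sequence[float]) -> List[float]:
    """TD-error advantage estimates: target minus the current value V(s)."""
    return [t - v for v, t in zip(values, targets)]


def value_loss(values: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared TD error, which the value network minimizes."""
    return sum((t - v) ** 2 for v, t in zip(values, targets)) / len(values)


if __name__ == "__main__":
    rewards, v_s, v_next, dones = [1.0, 2.0], [5.0, 2.0], [10.0, 10.0], [False, True]
    tgt = td_targets(rewards, v_next, dones, gamma=0.5)
    adv = one_step_advantages(v_s, tgt)
    print(tgt, adv, value_loss(v_s, tgt))  # [6.0, 2.0] [1.0, 0.0] 0.5
```

Clipping caps how much a single update can move the action probabilities, which is why PPO tends to train more stably than unclipped policy gradients; whether LRATS uses exactly this clipped form or a variant is not stated in the abstract.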

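Source journal: IEEE Sensors Journal (Engineering & Technology: Electrical & Electronic Engineering)

CiteScore: 7.70 · Self-citation rate: 14.00% · Articles published: 2058 · Review time: 5.2 months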

Journal description: The fields of interest of the IEEE Sensors Journal are the theory, design, fabrication, manufacturing, and applications of devices for sensing and transducing physical, chemical, and biological phenomena, with emphasis on the electronics and physics aspects of sensors and integrated sensors-actuators. IEEE Sensors Journal deals with the following:

- Sensor Phenomenology, Modelling, and Evaluation
- Sensor Materials, Processing, and Fabrication
- Chemical and Gas Sensors
- Microfluidics and Biosensors
- Optical Sensors
- Physical Sensors: Temperature, Mechanical, Magnetic, and others
- Acoustic and Ultrasonic Sensors
- Sensor Packaging
- Sensor Networks
- Sensor Applications
- Sensor Systems: Signals, Processing, and Interfaces
- Actuators and Sensor Power Systems
- Sensor Signal Processing for high precision and stability (amplification, filtering, linearization, modulation/demodulation) and under harsh conditions (EMC, radiation, humidity, temperature); energy consumption/harvesting
- Sensor Data Processing (soft computing with sensor data, e.g., pattern recognition, machine learning, evolutionary computation; sensor data fusion; processing of wave data, e.g., electromagnetic and acoustic, and non-wave data, e.g., chemical, gravity, particle, thermal, radiative, and non-radiative sensor data; detection, estimation, and classification based on sensor data)
- Sensors in Industrial Practice