Timely-throughput Optimal Scheduling for Wireless Flows with Deep Reinforcement Learning

Qi Wang, Chentao He, K. Jaffrès-Runser, Jianhui Huang, Yongjun Xu
{"title":"Timely-throughput Optimal Scheduling for Wireless Flows with Deep Reinforcement Learning","authors":"Qi Wang, Chentao He, K. Jaffrès-Runser, Jianhui Huang, Yongjun Xu","doi":"10.1109/IWQoS54832.2022.9812916","DOIUrl":null,"url":null,"abstract":"This paper addresses the problem of scheduling real-time wireless flows under dynamic network conditions and general traffic patterns. The objective is to maximize the fraction of packets of each flow to be delivered within their deadlines, referred to as timely-throughput. The scheduling problem under restrictive frame-based traffic models or greedy maximal scheduling schemes like LDF has been extensively studied so far, but scheduling algorithms to provide deadline guarantees on packet delivery for general traffic under dynamic network conditions are very limited. We propose two scheduling algorithms using deep reinforcement learning approach to optimize timely-throughput for general traffic in dynamic wireless networks: RL-Centralized scheduling algorithm and RL-Decentralized scheduling algo-rithm. Specifically, we formulate the centralized scheduling problem as a Markov Decision Process (MDP) and a multi-environments double deep Q-network (ME-DDQN) structure is proposed to adapt to the dynamic network conditions. The decentralized scheduling problem is formulated as a Partially Observable Markov Decision Process (POMDP) and an expert-apprentice centralized training and decentralized execution (EA-CTDE) structure is designed to accelerate the training speed and achieve the optimal timely-throughput. The extensive results show that the proposed scheduling algorithms converge fast and adapt well to network dynamics with superior performance compared to baseline policies. Finally, experimental tests confirm simulation results and also show that the proposed algorithms are feasible in practice on resource limited platforms.","PeriodicalId":353365,"journal":{"name":"2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IWQoS54832.2022.9812916","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This paper addresses the problem of scheduling real-time wireless flows under dynamic network conditions and general traffic patterns. The objective is to maximize the fraction of each flow's packets delivered within their deadlines, referred to as timely-throughput. Scheduling under restrictive frame-based traffic models or with greedy maximal scheduling schemes such as LDF has been studied extensively, but scheduling algorithms that provide deadline guarantees on packet delivery for general traffic under dynamic network conditions remain very limited. We propose two scheduling algorithms that use a deep reinforcement learning approach to optimize timely-throughput for general traffic in dynamic wireless networks: an RL-Centralized scheduling algorithm and an RL-Decentralized scheduling algorithm. Specifically, we formulate the centralized scheduling problem as a Markov Decision Process (MDP) and propose a multi-environment double deep Q-network (ME-DDQN) structure to adapt to dynamic network conditions. We formulate the decentralized scheduling problem as a Partially Observable Markov Decision Process (POMDP) and design an expert-apprentice centralized-training, decentralized-execution (EA-CTDE) structure to accelerate training and achieve optimal timely-throughput. Extensive results show that the proposed scheduling algorithms converge quickly and adapt well to network dynamics, with superior performance compared to baseline policies. Finally, experimental tests confirm the simulation results and show that the proposed algorithms are feasible in practice on resource-limited platforms.
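The centralized formulation above pairs a scheduling MDP with a double deep Q-network. As a rough illustration of the double-DQN update such a scheduler rests on, here is a minimal PyTorch sketch; the network shape, the per-flow state encoding, and names like SchedulerQNet are assumptions made for illustration, not taken from the paper, and the multi-environment (ME) training loop over varying network conditions is omitted.

import torch
import torch.nn as nn

class SchedulerQNet(nn.Module):
    """Hypothetical Q-network for the centralized scheduler. The input is a
    flat encoding of per-flow state (e.g. time-to-deadline, queue length,
    channel quality); the output is one Q-value per candidate flow to
    schedule in the current slot."""
    def __init__(self, state_dim: int, num_flows: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_flows),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def ddqn_loss(online, target, s, a, r, s_next, done, gamma=0.99):
    """Double-DQN loss on a batch of transitions: the online network selects
    the next action, the target network evaluates it, which curbs the
    Q-value overestimation of vanilla DQN. A natural reward in this setting
    is +1 when the scheduled packet is delivered before its deadline."""
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_next = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_next).squeeze(1)
        y = r + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q, y)

# Illustrative shapes: a batch of 32 transitions, 16 state features, 4 flows.
online, target = SchedulerQNet(16, 4), SchedulerQNet(16, 4)
s, s_next = torch.randn(32, 16), torch.randn(32, 16)
a = torch.randint(0, 4, (32,))
r, done = torch.rand(32), torch.zeros(32)
ddqn_loss(online, target, s, a, r, s_next, done).backward()

In a full implementation the target network's weights would be periodically copied from the online network, and the ME-DDQN structure described in the abstract would additionally expose the agent to multiple network environments during training; that loop is not shown here.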