Timely-throughput Optimal Scheduling for Wireless Flows with Deep Reinforcement Learning

Qi Wang, Chentao He, K. Jaffrès-Runser, Jianhui Huang, Yongjun Xu
{"title":"Timely-throughput Optimal Scheduling for Wireless Flows with Deep Reinforcement Learning","authors":"Qi Wang, Chentao He, K. Jaffrès-Runser, Jianhui Huang, Yongjun Xu","doi":"10.1109/IWQoS54832.2022.9812916","DOIUrl":null,"url":null,"abstract":"This paper addresses the problem of scheduling real-time wireless flows under dynamic network conditions and general traffic patterns. The objective is to maximize the fraction of packets of each flow to be delivered within their deadlines, referred to as timely-throughput. The scheduling problem under restrictive frame-based traffic models or greedy maximal scheduling schemes like LDF has been extensively studied so far, but scheduling algorithms to provide deadline guarantees on packet delivery for general traffic under dynamic network conditions are very limited. We propose two scheduling algorithms using deep reinforcement learning approach to optimize timely-throughput for general traffic in dynamic wireless networks: RL-Centralized scheduling algorithm and RL-Decentralized scheduling algo-rithm. Specifically, we formulate the centralized scheduling problem as a Markov Decision Process (MDP) and a multi-environments double deep Q-network (ME-DDQN) structure is proposed to adapt to the dynamic network conditions. The decentralized scheduling problem is formulated as a Partially Observable Markov Decision Process (POMDP) and an expert-apprentice centralized training and decentralized execution (EA-CTDE) structure is designed to accelerate the training speed and achieve the optimal timely-throughput. The extensive results show that the proposed scheduling algorithms converge fast and adapt well to network dynamics with superior performance compared to baseline policies. Finally, experimental tests confirm simulation results and also show that the proposed algorithms are feasible in practice on resource limited platforms.","PeriodicalId":353365,"journal":{"name":"2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 IEEE/ACM 30th International Symposium on Quality of Service (IWQoS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IWQoS54832.2022.9812916","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

This paper addresses the problem of scheduling real-time wireless flows under dynamic network conditions and general traffic patterns. The objective is to maximize the fraction of each flow's packets delivered within their deadlines, referred to as timely-throughput. Scheduling under restrictive frame-based traffic models or with greedy maximal scheduling schemes such as LDF has been studied extensively, but scheduling algorithms that provide deadline guarantees on packet delivery for general traffic under dynamic network conditions remain very limited. We propose two scheduling algorithms that use a deep reinforcement learning approach to optimize timely-throughput for general traffic in dynamic wireless networks: an RL-Centralized scheduling algorithm and an RL-Decentralized scheduling algorithm. Specifically, we formulate the centralized scheduling problem as a Markov Decision Process (MDP) and propose a multi-environment double deep Q-network (ME-DDQN) structure to adapt to dynamic network conditions. We formulate the decentralized scheduling problem as a Partially Observable Markov Decision Process (POMDP) and design an expert-apprentice centralized-training, decentralized-execution (EA-CTDE) structure to accelerate training and achieve optimal timely-throughput. Extensive results show that the proposed scheduling algorithms converge quickly and adapt well to network dynamics, with superior performance compared to baseline policies. Finally, experimental tests confirm the simulation results and show that the proposed algorithms are feasible in practice on resource-limited platforms.
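The centralized formulation above pairs a scheduling MDP with a double deep Q-network. As a rough illustration of the double-DQN update such a scheduler rests on, here is a minimal PyTorch sketch; the network shape, the per-flow state encoding, and names like SchedulerQNet are assumptions made for illustration, not taken from the paper, and the multi-environment (ME) training loop over varying network conditions is omitted.

import torch
import torch.nn as nn

class SchedulerQNet(nn.Module):
    """Hypothetical Q-network for the centralized scheduler. The input is a
    flat encoding of per-flow state (e.g. time-to-deadline, queue length,
    channel quality); the output is one Q-value per candidate flow to
    schedule in the current slot."""
    def __init__(self, state_dim: int, num_flows: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, num_flows),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def ddqn_loss(online, target, s, a, r, s_next, done, gamma=0.99):
    """Double-DQN loss on a batch of transitions: the online network selects
    the next action, the target network evaluates it, which curbs the
    Q-value overestimation of vanilla DQN. A natural reward in this setting
    is +1 when the scheduled packet is delivered before its deadline."""
    q = online(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        a_next = online(s_next).argmax(dim=1, keepdim=True)
        q_next = target(s_next).gather(1, a_next).squeeze(1)
        y = r + gamma * (1.0 - done) * q_next
    return nn.functional.mse_loss(q, y)

# Illustrative shapes: a batch of 32 transitions, 16 state features, 4 flows.
online, target = SchedulerQNet(16, 4), SchedulerQNet(16, 4)
s, s_next = torch.randn(32, 16), torch.randn(32, 16)
a = torch.randint(0, 4, (32,))
r, done = torch.rand(32), torch.zeros(32)
ddqn_loss(online, target, s, a, r, s_next, done).backward()

In a full implementation the target network's weights would be periodically copied from the online network, and the ME-DDQN structure described in the abstract would additionally expose the agent to multiple network environments during training; that loop is not shown here.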