Packet Drop Probability-Optimal Cross-layer Scheduling: Dealing with Curse of Sparsity using Prioritized Experience Replay

M. Sharma, P. Tan, E. Kurniawan, Sumei Sun
Published in: 2021 IEEE International Conference on Communications Workshops (ICC Workshops), 2021-06-01
DOI: 10.1109/ICCWorkshops50388.2021.9473857 (https://doi.org/10.1109/ICCWorkshops50388.2021.9473857)
Citations: 1

Abstract

In this work, we develop a model-free approach based on reinforcement learning (RL) to obtain a policy for joint packet scheduling and rate adaptation that minimizes the packet drop probability (PDP). The learning scheme yields an online cross-layer scheduling policy that accounts for the randomness in packet arrivals and wireless channels, as well as the state of the packet buffers. The inherent difference between the time scales of the packet arrival process and the wireless channel variations leads to sparsity in the observed reward signal. Because an RL agent learns from the feedback it receives as rewards for its actions, this sparsity causes the sample complexity of the RL approach to increase exponentially. As a result, a basic RL approach, e.g., one based on a double deep Q-network (DDQN), yields a policy with negligible performance gain over state-of-the-art schemes such as shortest processing time (SPT) based scheduling. To alleviate the sparse reward problem, we leverage prioritized experience replay (PER) and develop a DDQN-based learning scheme with PER. Simulations show that the policy learned using the DDQN-PER approach achieves a 3-5% lower PDP than both the basic DDQN-based RL and the SPT scheme.
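
The abstract names two building blocks, prioritized experience replay and double deep Q-learning, without giving implementation details. The following is a minimal sketch of the generic forms of both, not the authors' implementation. It assumes the standard proportional PER variant, in which a transition is sampled with probability proportional to its priority raised to a power alpha and the priority is the absolute TD error plus a small constant. The class name, function name, and the default values of alpha, beta, eps, and gamma are illustrative assumptions, not taken from the paper.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities skew sampling (0 = uniform)
        self.eps = eps          # keeps every priority strictly positive
        self.data = []          # stored transitions
        self.priorities = []    # one priority per stored transition
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority, so each one
        # is likely to be replayed at least once before being down-weighted.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability P(i) = p_i^alpha / sum_k p_k^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        # Importance-sampling weights (N * P(i))^(-beta) correct the bias of
        # non-uniform sampling; here they are normalized by the batch maximum.
        n = len(self.data)
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # Transitions with a large TD error are replayed more often, which
        # helps when informative (non-zero reward) transitions are rare.
        for i, delta in zip(idxs, td_errors):
            self.priorities[i] = abs(delta) + self.eps


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it."""
    if done:
        return reward
    best = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    return reward + gamma * q_target_next[best]
```

In a training loop, the agent would call `add` after each environment step, draw a batch with `sample`, scale each sample's loss by its weight, and then pass the new TD errors to `update_priorities`. How the paper defines states, actions, and rewards for the scheduling problem is not stated in the abstract and is left out of this sketch.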