Distributed Reinforcement Learning for Quality-of-Service Routing in Wireless Device-to-device Networks

Dongyu Liu, Zexu Li, Zeyu Hu, Yong Li
{"title":"Distributed Reinforcement Learning for Quality-of-Service Routing in Wireless Device-to-device Networks","authors":"Dongyu Liu, Zexu Li, Zeyu Hu, Yong Li","doi":"10.1109/ICCChinaW.2018.8674510","DOIUrl":null,"url":null,"abstract":"In this paper, we aim to determine the multi-hop route between a device-to-device (D2D) source-destination pair which meets the quality-of-service (QoS) of services. We model this QoS routing problem in D2D as a Markov decision process (MDP) and proposes a distributed multi-agent reinforcement learning routing algorithm. We consider the QoS requirements in terms of bandwidth, delay, and packet loss rate, and allocate the routing path according to link information averaged over time in dynamic network environments. By decomposing the Q-function into multiple local Q-functions, each agent can compute its own optimal strategy based on local observations, which greatly reduces the costs of learning and searching in large-scale multi-state systems. The simulation results show that the proposed algorithm can significantly reduce the average end-to-end delay, the average packet loss rate and service rejection rate compared with both the minimum hop algorithm and the traditional routing algorithm which only considers static parameters.","PeriodicalId":201746,"journal":{"name":"2018 IEEE/CIC International Conference on Communications in China (ICCC Workshops)","volume":"111 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2018 IEEE/CIC International Conference on Communications in China (ICCC 
Workshops)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCChinaW.2018.8674510","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

In this paper, we aim to determine a multi-hop route between a device-to-device (D2D) source-destination pair that meets the quality-of-service (QoS) requirements of services. We model this QoS routing problem in D2D networks as a Markov decision process (MDP) and propose a distributed multi-agent reinforcement learning routing algorithm. We consider QoS requirements in terms of bandwidth, delay, and packet loss rate, and allocate the routing path according to link information averaged over time in dynamic network environments. By decomposing the Q-function into multiple local Q-functions, each agent can compute its own optimal strategy based on local observations, which greatly reduces the cost of learning and searching in large-scale multi-state systems. Simulation results show that the proposed algorithm significantly reduces the average end-to-end delay, average packet loss rate, and service rejection rate compared with both the minimum-hop algorithm and a traditional routing algorithm that considers only static parameters.
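The core idea of the abstract — decomposing a global Q-function into per-node local Q-functions so each agent learns from local observations only — can be illustrated with a minimal sketch. This is not the paper's implementation: the class name `NodeAgent`, the tabular representation, and the scalar `link_cost` (which would aggregate delay, loss, and bandwidth statistics in the paper's setting) are all assumptions for illustration.

```python
import random
from collections import defaultdict

class NodeAgent:
    """Hypothetical per-node routing agent. Each D2D node holds a local
    Q-table over (destination, next-hop) pairs, so the global Q-function
    is decomposed across agents and no global state is required."""

    def __init__(self, node_id, neighbors, alpha=0.5, gamma=0.9, eps=0.1):
        self.node_id = node_id
        self.neighbors = list(neighbors)
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        # Local Q-function: q[destination][next_hop] -> estimated QoS cost-to-go.
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbors})

    def choose_next_hop(self, destination):
        """Epsilon-greedy forwarding decision based only on local observations."""
        if random.random() < self.eps:
            return random.choice(self.neighbors)
        q_d = self.q[destination]
        return min(q_d, key=q_d.get)  # lower Q = lower expected QoS cost

    def update(self, destination, next_hop, link_cost, neighbor_best_q):
        """Q-learning update using the measured cost of the chosen link and
        the chosen neighbor's reported best Q-value for this destination."""
        old = self.q[destination][next_hop]
        target = link_cost + self.gamma * neighbor_best_q
        self.q[destination][next_hop] = old + self.alpha * (target - old)

    def best_q(self, destination):
        """Value this agent reports upstream: its best local cost-to-go."""
        return min(self.q[destination].values())
```

In use, each forwarded packet triggers one `update` at the sending node with the neighbor's `best_q` piggybacked on the acknowledgment, so learning and routing stay fully distributed; the `link_cost` term is where time-averaged bandwidth, delay, and loss measurements would enter.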