Distributed Reinforcement Learning for Quality-of-Service Routing in Wireless Device-to-device Networks
Authors: Dongyu Liu, Zexu Li, Zeyu Hu, Yong Li
Venue: 2018 IEEE/CIC International Conference on Communications in China (ICCC Workshops)
Published: 2018-08-01
DOI: 10.1109/ICCChinaW.2018.8674510
Citations: 1
Abstract
In this paper, we aim to determine a multi-hop route between a device-to-device (D2D) source-destination pair that meets the quality-of-service (QoS) requirements of its services. We model this QoS routing problem in D2D networks as a Markov decision process (MDP) and propose a distributed multi-agent reinforcement learning routing algorithm. We consider QoS requirements in terms of bandwidth, delay, and packet loss rate, and allocate the routing path according to link information averaged over time in dynamic network environments. By decomposing the Q-function into multiple local Q-functions, each agent can compute its own optimal strategy based on local observations, which greatly reduces the costs of learning and searching in large-scale multi-state systems. The simulation results show that the proposed algorithm significantly reduces the average end-to-end delay, the average packet loss rate, and the service rejection rate compared with both the minimum-hop algorithm and a traditional routing algorithm that considers only static parameters.
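The core idea sketched in the abstract can be illustrated in a minimal form: each node acts as an agent holding a local Q-table over (destination, next-hop) pairs, updating it from locally observed link metrics plus the maximum Q-value reported by the chosen neighbor. The class and function names, the QoS reward shaping, and all parameter values below are illustrative assumptions, not the authors' exact formulation.

```python
import random

class NodeAgent:
    """One routing agent per node; learns next-hop choices from local observations only."""

    def __init__(self, node_id, neighbors, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.node_id = node_id
        self.neighbors = neighbors          # directly reachable next hops
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q = {}                         # (dest, next_hop) -> Q-value

    def choose_next_hop(self, dest):
        # epsilon-greedy selection over the local Q-function only
        if random.random() < self.epsilon:
            return random.choice(self.neighbors)
        return max(self.neighbors, key=lambda n: self.q.get((dest, n), 0.0))

    def update(self, dest, next_hop, reward, neighbor_max_q):
        # Standard Q-learning update. neighbor_max_q is the chosen
        # neighbor's own max Q toward dest, exchanged hop-by-hop,
        # so no global network state is required.
        old = self.q.get((dest, next_hop), 0.0)
        self.q[(dest, next_hop)] = old + self.alpha * (
            reward + self.gamma * neighbor_max_q - old)

    def max_q(self, dest):
        # value this node reports upstream to its predecessor
        return max((self.q.get((dest, n), 0.0) for n in self.neighbors),
                   default=0.0)

def qos_reward(delay, loss_rate, bandwidth, bw_required):
    # Illustrative QoS-aware reward: a large penalty when the link cannot
    # supply the requested bandwidth, otherwise penalties scaling with
    # time-averaged delay and packet loss rate.
    if bandwidth < bw_required:
        return -10.0
    return -delay - 5.0 * loss_rate
```

In this sketch, each agent's update needs only its own link measurements and one scalar from the selected neighbor, which is what makes the decomposition into local Q-functions cheap compared with learning a single global Q-function over the full network state.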