Adaptive and Robust Network Routing Based on Deep Reinforcement Learning with Lyapunov Optimization

2020 IEEE/ACM 28th International Symposium on Quality of Service (IWQoS) Pub Date : 2020-06-01 DOI:10.1109/IWQoS49365.2020.9213056

Zirui Zhuang, Jingyu Wang, Q. Qi, J. Liao, Zhu Han


Citations: 2

Abstract



Recent developments in the Internet of Things bring massive time-sensitive yet bursty data flows. Adaptive network control has been explored using deep reinforcement learning, but existing approaches are not sufficient for extremely bursty traffic flows, especially when the traffic pattern may change over time. We model routing control in an environment with time-variant link delays as a Lyapunov optimization problem. We identify a tradeoff between optimization performance and modeling accuracy when propagation delays are included. We propose a novel deep reinforcement learning-based adaptive network routing method to tackle these issues. A Lyapunov optimization technique is used to reduce the upper bound of the Lyapunov drift, which leads to improved queuing stability in networked systems. Experimental results show that the proposed method can learn a routing control policy and adapt to a changing environment. The proposed method outperforms the baseline backpressure method in multiple settings and converges faster than existing methods. Moreover, the deep reinforcement learning module can effectively learn a better estimate of the long-term Lyapunov drift and penalty functions, and thus provides superior results in terms of backlog size, end-to-end latency, age of information, and throughput. Extensive experiments also show that the proposed model performs well under various topologies and can therefore be used in general cases. In addition, the user can adjust the preference parameter at any time without retraining the neural networks.
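To make the backpressure baseline and the role of a preference parameter concrete, here is a minimal sketch of a textbook drift-plus-penalty next-hop rule. It is not the method of the paper: the function name, the `(rate, delay)` link representation, the scoring formula, and the use of `v` as the preference weight are all assumptions chosen for illustration. With `v = 0` the rule reduces to plain backpressure, which forwards along the link with the largest positive backlog differential.

```python
from typing import Dict, Hashable, Optional, Tuple

Node = Hashable
# Hypothetical link model: links[u][w] = (rate, delay) for the link u -> w.
Links = Dict[Node, Dict[Node, Tuple[float, float]]]


def drift_plus_penalty_next_hop(
    node: Node,
    backlog: Dict[Node, float],
    links: Links,
    v: float,
) -> Optional[Node]:
    """Return the neighbor with the lowest score, or None to hold traffic.

    score = v * delay - (Q[node] - Q[neighbor]) * rate

    The second term is the usual backpressure weight (it lowers the
    queue-drift bound); the first is an illustrative delay penalty
    scaled by the preference parameter v.
    """
    best: Optional[Node] = None
    best_score = 0.0  # forward only if some link scores below zero
    for neighbor, (rate, delay) in links.get(node, {}).items():
        differential = backlog[node] - backlog[neighbor]
        score = v * delay - differential * rate
        if score < best_score:
            best, best_score = neighbor, score
    return best


if __name__ == "__main__":
    backlog = {"a": 10.0, "b": 2.0, "c": 6.0}
    links = {"a": {"b": (1.0, 5.0), "c": (1.0, 1.0)}}
    for v in (0.0, 2.0, 10.0):
        print(v, drift_plus_penalty_next_hop("a", backlog, links, v))
```

In this sketch, raising `v` shifts the choice from the emptiest neighbor toward the lowest-delay neighbor, and a large enough `v` makes holding traffic preferable. Because `v` is only an argument to the decision rule, it can be changed between calls, which loosely mirrors the abstract's point that the preference parameter is adjustable without retraining.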

