Improved Exploration Strategy for Q-Learning Based Multipath Routing in SDN Networks

IF 3.9 Zone 3 (Computer Science) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of Network and Systems Management Pub Date : 2024-02-16 DOI:10.1007/s10922-024-09804-0

Citations: 0

Abstract

Software-Defined Networking (SDN) is characterized by a high level of programmability and offers a rich set of capabilities for network management operations. Network intelligence is centralized in the controller, which is responsible for updating the routing policies according to the applications’ requirements. To further enhance these capabilities, the controller has to be endowed with intelligence by integrating Artificial Intelligence (AI) tools, giving it the ability to reconfigure the network autonomously and in a timely way. In this paper, we address the deployment of a Q-learning algorithm for routing optimization, with latency minimization as the objective. Using a direct modeling approach for the multi-path flow-routing problem, we examine the impact of exploration-exploitation strategies on the algorithm’s performance. Furthermore, we propose two improvements to the Q-learning algorithm to enhance its performance within the considered environment. First, we integrate a congestion-avoidance mechanism in the exploration phase, which improves the algorithm’s average latency, convergence time, and computation time. Second, we propose a novel strategy based on the Max-Boltzmann Exploration (MBE) method, which combines the traditional \(\varepsilon\)-greedy and softmax strategies. The results show that, with appropriate tuning of the hyperparameters, the MBE strategy combined with the congestion-avoidance mechanism performs better than the \(\varepsilon\)-greedy, \(\varepsilon\)-decay, and softmax strategies in terms of average latency, convergence time, and computation time.
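
As a rough illustration of how an exploration rule can combine \(\varepsilon\)-greedy and softmax, the Python sketch below follows the textbook Max-Boltzmann form: with probability \(\varepsilon\) the agent samples a path from a Boltzmann (softmax) distribution over Q-values, otherwise it takes the greedy path. This is not the paper's implementation. The function names, the `temperature` parameter, the convention that higher Q-values are better (for example, negative latency), and the `allowed` mask standing in for a congestion-avoidance filter are all assumptions made for illustration.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Softmax over Q-values; subtracting the max keeps exp() numerically stable."""
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def mbe_select(q_values, epsilon, temperature, allowed=None, rng=random):
    """Pick a candidate path index with a Max-Boltzmann style rule.

    q_values: one Q-value per candidate path (assumed: higher is better).
    allowed:  hypothetical congestion mask, i.e. the path indices the
              exploration step may choose from; None means all paths.
    """
    greedy = max(range(len(q_values)), key=lambda i: q_values[i])
    if rng.random() >= epsilon:
        return greedy  # exploit, exactly like the greedy branch of epsilon-greedy

    # Explore: sample from a Boltzmann distribution instead of uniformly.
    candidates = list(range(len(q_values))) if allowed is None else list(allowed)
    if not candidates:
        return greedy  # every path is masked out; fall back to the greedy path
    probs = boltzmann_probs([q_values[i] for i in candidates], temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

Under these assumptions, `epsilon = 0` reduces the rule to pure greedy selection and `epsilon = 1` to pure softmax sampling, while a large `temperature` makes the exploration branch approach the uniform choice used by plain \(\varepsilon\)-greedy.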

Source journal

Journal of Network and Systems Management (Engineering and Technology: Telecommunications)

CiteScore

7.60

Self-citation rate

16.70%

Publication count

Review time

>12 weeks

About the journal: Journal of Network and Systems Management features peer-reviewed original research, as well as case studies in the fields of network and system management. The journal regularly disseminates significant new information on both the telecommunications and computing aspects of these fields, as well as their evolution and emerging integration. This outstanding quarterly covers architecture, analysis, design, software, standards, and migration issues related to the operation, management, and control of distributed systems and communication networks for voice, data, video, and networked computing.