Multi-policy reinforcement learning for network resource allocation with periodic behaviors
Zheyu Chen, Kin K. Leung, Shiqiang Wang, Leandros Tassiulas, Kevin Chan, Patrick J. Baker
Computer Networks, Volume 272, Article 111645. DOI: 10.1016/j.comnet.2025.111645. Published 2025-08-23.
Citations: 0
Abstract
Markov decision processes (MDPs) serve as the mathematical foundation of reinforcement learning (RL): a Markov process with defined states models the system, and the actions taken affect the state transitions and the corresponding rewards. RL and deep RL (DRL) can produce high-performing action policies that maximize the long-term reward. Although RL/DRL have been widely applied to communication and computer systems, a key limitation is that the system under consideration often does not satisfy the required mathematical properties, making the MDP model inexact and the derived policy flawed. We therefore consider the periodic Markov decision process (pMDP), in which the evolution of the underlying process and the model parameters exhibit periodic characteristics (e.g., periodic job arrivals and available resources) that violate the Markov property. To obtain optimal policies for the pMDP, a policy gradient method with a multi-policy solution framework is proposed, and a deep-learning method is developed to improve the effectiveness and stability of the solution. Furthermore, a layer-sharing strategy is proposed to reduce storage complexity by reducing the number of parameters in the neural networks. The deep-learning method is applied to achieve near-optimal allocation of resources to arriving computational tasks in a software-defined network (SDN) setting. Evaluation results show that the proposed technique is valid and outperforms a single-policy baseline by 31% on average.
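To make the setting concrete: one natural formalization of the periodicity described in the abstract is that the transition kernel and reward repeat with a known period $T$, i.e., $P_t(s' \mid s, a) = P_{t+T}(s' \mid s, a)$ and $r_t(s, a) = r_{t+T}(s, a)$, so the process behaves as a Markov process only when the phase $t \bmod T$ is tracked alongside the state. The sketch below illustrates how a multi-policy framework with layer sharing might look in PyTorch: one output head per phase of the period, a trunk shared across all phases to cut the parameter count, and a REINFORCE-style policy gradient update. This is a minimal illustration under those assumptions; the names (MultiPolicyNet, policy_gradient_step) and architectural details are hypothetical, not the paper's implementation.

```python
# Hypothetical sketch of a multi-policy, layer-sharing network for a pMDP
# with discrete actions and a known period T. Not the authors' code.
import torch
import torch.nn as nn

class MultiPolicyNet(nn.Module):
    def __init__(self, state_dim, action_dim, period, hidden=64):
        super().__init__()
        # Trunk shared by all phase policies (the layer-sharing idea):
        # its parameters are stored once, regardless of the period length.
        self.trunk = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # One lightweight output head per phase of the period, so each
        # phase effectively has its own policy.
        self.heads = nn.ModuleList(
            nn.Linear(hidden, action_dim) for _ in range(period)
        )

    def forward(self, state, phase):
        # Select the policy head for the current phase (t mod T).
        logits = self.heads[phase](self.trunk(state))
        return torch.distributions.Categorical(logits=logits)

def policy_gradient_step(net, optimizer, states, phases, actions, returns):
    """REINFORCE-style update over one trajectory (illustrative only).

    states/actions: lists of tensors; phases: list of ints (t mod T);
    returns: 1-D tensor of discounted returns, one per step.
    """
    logps = torch.stack([
        net(s, p).log_prob(a)
        for s, p, a in zip(states, phases, actions)
    ])
    loss = -(logps * returns).sum()  # ascend the expected return
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In use, phase would simply be t % T for global time step t; because the trunk is shared, adding phases grows storage only by one linear head each, which is one plausible reading of how the layer-sharing strategy reduces the parameter count relative to maintaining fully separate per-phase networks.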
Journal Introduction:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.