{"title":"Ensemble Link Learning for Large State Space Multiple Access Communications","authors":"Talha Bozkus, U. Mitra","doi":"10.23919/eusipco55093.2022.9909958","DOIUrl":null,"url":null,"abstract":"Wireless communication networks are well-modeled by Markov Decision Processes (MDPs), but induce a large state space which challenges policy optimization. Reinforcement learning such as Q-learning enables the solution of policy opti-mization problems in unknown environments. Herein a graph-learning algorithm is proposed to improve the accuracy and complexity performance of Q-learning algorithm for a multiple access communications problem. By exploiting the structural properties of the wireless network MDP, several structurally related Markov chains are created and these multiple chains are sampled to learn multiple policies which are fused. Furthermore, a state-action aggregation method is proposed to reduce the time and memory complexity of the algorithm. Numerical results show that the proposed algorithm achieves a reduction of 80% with respect to the policy error and a reduction of 70% for the runtime versus other state-of-the-art $Q$ learning algorithms.","PeriodicalId":231263,"journal":{"name":"2022 30th European Signal Processing Conference (EUSIPCO)","volume":"30 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 30th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/eusipco55093.2022.9909958","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
Wireless communication networks are well-modeled by Markov Decision Processes (MDPs), but they induce a large state space, which challenges policy optimization. Reinforcement learning methods such as Q-learning enable the solution of policy optimization problems in unknown environments. Herein, a graph-learning algorithm is proposed to improve the accuracy and complexity of Q-learning for a multiple access communications problem. By exploiting the structural properties of the wireless network MDP, several structurally related Markov chains are created; these chains are sampled to learn multiple policies, which are then fused. Furthermore, a state-action aggregation method is proposed to reduce the time and memory complexity of the algorithm. Numerical results show that the proposed algorithm achieves an 80% reduction in policy error and a 70% reduction in runtime relative to other state-of-the-art Q-learning algorithms.
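The abstract only outlines the ensemble idea, so the sketch below is a generic illustration rather than the authors' implementation: tabular Q-learning is run independently on several related chains, and the resulting Q-tables are fused to induce a single policy. The environment interface (reset()/step() returning (next_state, reward, done)), all hyperparameters, and the simple averaging fusion rule are assumptions; the paper's actual fusion scheme and its state-action aggregation step are not specified in the abstract.

```python
# Minimal sketch of ensemble Q-learning with policy fusion.
# NOT the paper's algorithm: env interface, hyperparameters, and the
# averaging fusion rule are illustrative assumptions.
import numpy as np

def q_learning(env, n_states, n_actions, episodes=500,
               alpha=0.1, gamma=0.95, eps=0.1, horizon=100, rng=None):
    """Standard tabular Q-learning on one (structurally related) chain."""
    rng = rng or np.random.default_rng()
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = env.reset()  # assumed interface: returns initial state index
        for _ in range(horizon):
            # epsilon-greedy action selection
            a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
            s_next, r, done = env.step(a)  # assumed interface
            # one-step temporal-difference update
            Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
            s = s_next
            if done:
                break
    return Q

def ensemble_policy(envs, n_states, n_actions):
    """Learn one Q-table per related chain, fuse them, and return the
    greedy policy induced by the fused table."""
    Qs = [q_learning(env, n_states, n_actions) for env in envs]
    Q_fused = np.mean(Qs, axis=0)   # elementwise averaging: one simple fusion rule
    return Q_fused.argmax(axis=1)   # greedy action per state
```

Averaging the Q-tables is just one plausible fusion rule; weighted combinations or voting over the per-chain greedy policies would fit the same ensemble template.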