{"title":"基于推理的强化学习及其在动态资源分配中的应用","authors":"Paschalis Tsiaflakis, W. Coomans","doi":"10.23919/eusipco55093.2022.9909777","DOIUrl":null,"url":null,"abstract":"Reinforcement learning (RL) is a powerful machine learning technique to learn optimal actions in a control system setup. An important drawback of RL algorithms is the need for balancing exploitation vs exploration. Exploration corresponds to taking randomized actions with the aim to learn from it and make better decisions in the future. However, these exploratory actions result in poor performance, and current RL algorithms have a slow convergence as one can only learn from a single action outcome per iteration. We propose a novel concept of Inference-based RL that is applicable to a specific class of RL problems, and that allows to eliminate the performance impact caused by traditional exploration strategies, thereby making RL performance more consistent and greatly improving the convergence speed. The specific RL problem class is a problem class in which the observation of the outcome of one action can be used to infer the outcome of other actions, without the need to actually perform them. We apply this novel concept to the use case of dynamic resource allocation, and show that the proposed algorithm outperforms existing RL algorithms, yielding a drastic increase in both convergence speed and performance.","PeriodicalId":231263,"journal":{"name":"2022 30th European Signal Processing Conference (EUSIPCO)","volume":"23 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Inference-based Reinforcement Learning and its Application to Dynamic Resource Allocation\",\"authors\":\"Paschalis Tsiaflakis, W. Coomans\",\"doi\":\"10.23919/eusipco55093.2022.9909777\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Reinforcement learning (RL) is a powerful machine learning technique to learn optimal actions in a control system setup. An important drawback of RL algorithms is the need for balancing exploitation vs exploration. Exploration corresponds to taking randomized actions with the aim to learn from it and make better decisions in the future. However, these exploratory actions result in poor performance, and current RL algorithms have a slow convergence as one can only learn from a single action outcome per iteration. We propose a novel concept of Inference-based RL that is applicable to a specific class of RL problems, and that allows to eliminate the performance impact caused by traditional exploration strategies, thereby making RL performance more consistent and greatly improving the convergence speed. The specific RL problem class is a problem class in which the observation of the outcome of one action can be used to infer the outcome of other actions, without the need to actually perform them. 
We apply this novel concept to the use case of dynamic resource allocation, and show that the proposed algorithm outperforms existing RL algorithms, yielding a drastic increase in both convergence speed and performance.\",\"PeriodicalId\":231263,\"journal\":{\"name\":\"2022 30th European Signal Processing Conference (EUSIPCO)\",\"volume\":\"23 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 30th European Signal Processing Conference (EUSIPCO)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.23919/eusipco55093.2022.9909777\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 30th European Signal Processing Conference (EUSIPCO)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.23919/eusipco55093.2022.9909777","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract: Reinforcement learning (RL) is a powerful machine learning technique for learning optimal actions in a control-system setup. An important drawback of RL algorithms is the need to balance exploitation against exploration. Exploration corresponds to taking randomized actions with the aim of learning from them and making better decisions in the future. However, these exploratory actions degrade performance, and current RL algorithms converge slowly because they can learn from only a single action outcome per iteration. We propose a novel concept of inference-based RL that is applicable to a specific class of RL problems and eliminates the performance impact caused by traditional exploration strategies, thereby making RL performance more consistent and greatly improving convergence speed. This problem class consists of problems in which the observed outcome of one action can be used to infer the outcomes of other actions, without the need to actually perform them. We apply this concept to the use case of dynamic resource allocation and show that the proposed algorithm outperforms existing RL algorithms, yielding a drastic increase in both convergence speed and performance.
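The paper itself provides no code here, but the core idea in the abstract can be illustrated with a minimal sketch. Assume a toy bandit-style resource allocation problem in which, once the step's hidden condition (e.g., a channel or demand realization) is observed, the reward of every candidate allocation level can be computed, not just the one that was taken. All names below (observe_state, reward, eps_greedy, inference_based) and the reward model are hypothetical illustrations, not the paper's algorithm or use case.

import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 10                                  # candidate allocation levels (toy setup)
levels = np.linspace(0.1, 1.0, N_ACTIONS)

def observe_state():
    """Hidden per-step condition, e.g. a channel/demand realization (assumed model)."""
    return rng.uniform(0.2, 1.0)

def reward(level, state):
    """Toy reward: utility of the allocated resource minus an overshoot penalty."""
    return min(level, state) - 0.5 * max(level - state, 0.0)

def eps_greedy(steps=2000, eps=0.1):
    """Baseline: epsilon-greedy, learns from one action outcome per iteration."""
    q = np.zeros(N_ACTIONS)
    n = np.zeros(N_ACTIONS)
    total = 0.0
    for _ in range(steps):
        a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(q))
        s = observe_state()
        r = reward(levels[a], s)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]               # update only the taken action
        total += r
    return total / steps

def inference_based(steps=2000):
    """Inference-based variant: one observation reveals all counterfactual outcomes."""
    q = np.zeros(N_ACTIONS)
    n = 0
    total = 0.0
    for _ in range(steps):
        a = int(np.argmax(q))                   # pure exploitation, no randomized actions
        s = observe_state()
        total += reward(levels[a], s)
        n += 1
        # The observed state lets us infer every action's reward for this step,
        # so all estimates improve from a single interaction.
        q += (np.array([reward(l, s) for l in levels]) - q) / n
    return total / steps

print("epsilon-greedy avg reward :", round(eps_greedy(), 3))
print("inference-based avg reward:", round(inference_based(), 3))

In this sketch the inference-based learner needs no randomized exploration at all: each interaction is fully informative about every allocation level, which is precisely the structural property the abstract identifies as the source of the improved consistency and convergence speed.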