Monte Carlo Tree Search to Compare Reward Functions for Reinforcement Learning
Authors: Bálint Kövári, Bálint Pelenczei, Tamás Bécsi
Venue: 2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI)
Published: 2022-05-25
DOI: 10.1109/SACI55618.2022.9919518 (https://doi.org/10.1109/SACI55618.2022.9919518)
Reinforcement Learning has gained tremendous attention recently, thanks to its excellent solutions in several challenging domains. However, formulating the reward signal is difficult and crucially important, since it is the only guidance the agent has for solving the given control task. Finding the proper reward is time-consuming because the model must be trained with every candidate reward before the results can be compared. This paper proposes that the Monte-Carlo Tree Search algorithm can be used to compare and rank the different reward strategies. To verify that the search algorithm is suitable for such a task, a Policy Gradient algorithm is trained to solve the Traffic Signal Control problem with different rewarding strategies from the literature. The results show that both methods produce the same ranking of the rewarding concepts by performance. Hence the Monte-Carlo Tree Search algorithm can identify the best reward for training, which substantially reduces the resource intensity of the entire process.
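The abstract does not say how the search is configured, which rewards are compared, or which traffic simulator is used, so the following is only a minimal sketch of the general idea and not the authors' method. It assumes a hypothetical two-approach intersection with deterministic arrivals, three illustrative reward functions, a basic UCT-style Monte-Carlo Tree Search, and a shared evaluation metric (total queued cars over an episode). Every name and constant here (`ARRIVALS`, `SERVICE`, `HORIZON`, `REWARDS`, `rank_rewards`) is an assumption introduced for illustration.

```python
import math
import random

# Hypothetical toy intersection, not the paper's simulator.
# State: (north-south queue, east-west queue). Action: which approach gets green.
ARRIVALS = (1, 2)   # cars arriving per step on each approach (assumed)
SERVICE = 3         # cars discharged per step on the green approach (assumed)
HORIZON = 20        # episode length in steps (assumed)

def step(state, action):
    """Advance one step; return (next_state, cars_discharged)."""
    queues = [state[0] + ARRIVALS[0], state[1] + ARRIVALS[1]]
    served = min(SERVICE, queues[action])
    queues[action] -= served
    return (queues[0], queues[1]), served

# Illustrative candidate rewards r(next_state, cars_discharged).
REWARDS = {
    "neg_total_queue": lambda s, served: -(s[0] + s[1]),
    "neg_max_queue": lambda s, served: -max(s),
    "throughput": lambda s, served: served,
}

class Node:
    def __init__(self):
        self.n = [0, 0]                # visit count per action
        self.q = [0.0, 0.0]            # mean sampled return per action
        self.children = [None, None]   # child node per action

def rollout(state, depth, reward_fn, rng):
    """Sum of rewards along a uniformly random action sequence."""
    total = 0.0
    for _ in range(depth):
        state, served = step(state, rng.randrange(2))
        total += reward_fn(state, served)
    return total

def simulate(node, state, depth, reward_fn, rng, c=1.4):
    """One search iteration: select or expand, roll out, back up the return."""
    if depth == 0:
        return 0.0
    untried = [a for a in (0, 1) if node.n[a] == 0]
    if untried:
        # Expansion: try a new action, then estimate the rest by rollout.
        action = rng.choice(untried)
        next_state, served = step(state, action)
        node.children[action] = Node()
        future = rollout(next_state, depth - 1, reward_fn, rng)
    else:
        # Selection: UCB1 over the two actions (rewards are not normalised here).
        total = node.n[0] + node.n[1]
        action = max((0, 1), key=lambda a: node.q[a]
                     + c * math.sqrt(math.log(total) / node.n[a]))
        next_state, served = step(state, action)
        future = simulate(node.children[action], next_state,
                          depth - 1, reward_fn, rng, c)
    ret = reward_fn(next_state, served) + future
    node.n[action] += 1
    node.q[action] += (ret - node.q[action]) / node.n[action]
    return ret

def mcts_action(state, depth, reward_fn, rng, iterations=200):
    """Return the most-visited root action after a fixed search budget."""
    root = Node()
    for _ in range(iterations):
        simulate(root, state, depth, reward_fn, rng)
    return max((0, 1), key=lambda a: root.n[a])

def evaluate(reward_fn, seed=0):
    """Control one episode with MCTS guided by reward_fn.

    The score is a shared metric (total queued cars; lower is better),
    so different rewards can be compared on equal terms.
    """
    rng = random.Random(seed)
    state, total_queue = (0, 0), 0
    for t in range(HORIZON):
        depth = min(5, HORIZON - t)
        action = mcts_action(state, depth, reward_fn, rng)
        state, _ = step(state, action)
        total_queue += state[0] + state[1]
    return total_queue

def rank_rewards():
    """Rank candidate rewards by the shared metric, best first."""
    scores = {name: evaluate(fn) for name, fn in REWARDS.items()}
    return sorted(scores.items(), key=lambda kv: kv[1])

if __name__ == "__main__":
    for name, score in rank_rewards():
        print(f"{name}: total queued cars = {score}")
```

Because the search needs no training, each candidate reward can be scored with a handful of planned episodes. In a setting like the one the abstract describes, the resulting order would then be checked against the order obtained from trained Policy Gradient agents.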