{"title":"基于互学习的不稳定性多智能体强化学习中的灵活探索策略","authors":"Yuki Miyashita, T. Sugawara","doi":"10.1109/ICMLA55696.2022.00100","DOIUrl":null,"url":null,"abstract":"A fundamental challenge in multi-agent reinforcement learning is an effective exploration of state-action spaces because agents must learn their policies in a non-stationary environment due to changing policies of other learning agents. As the agent’s learning progresses, different undesired situations may appear one after another and agents have to learn again to adapt them. Therefore, agents must learn again with a high probability of exploration to find the appropriate actions for the exposed situation. However, existing algorithms can suffer from inability to learn behavior again on the lack of exploration for these situations because agents usually become exploitation-oriented by using simple exploration strategies, such as ε-greedy strategy. Therefore, we propose two types of simple exploration strategies, where each agent monitors the trend of performance and controls the exploration probability, ε, based on the transition of performance. By introducing a coordinated problem called the PushBlock problem, which includes the above issue, we show that the proposed method could improve the overall performance relative to conventional ε-greedy strategies and analyze their effects on the generated behavior.","PeriodicalId":128160,"journal":{"name":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","volume":"65 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Flexible Exploration Strategies in Multi-Agent Reinforcement Learning for Instability by Mutual Learning\",\"authors\":\"Yuki Miyashita, T. 
Sugawara\",\"doi\":\"10.1109/ICMLA55696.2022.00100\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A fundamental challenge in multi-agent reinforcement learning is an effective exploration of state-action spaces because agents must learn their policies in a non-stationary environment due to changing policies of other learning agents. As the agent’s learning progresses, different undesired situations may appear one after another and agents have to learn again to adapt them. Therefore, agents must learn again with a high probability of exploration to find the appropriate actions for the exposed situation. However, existing algorithms can suffer from inability to learn behavior again on the lack of exploration for these situations because agents usually become exploitation-oriented by using simple exploration strategies, such as ε-greedy strategy. Therefore, we propose two types of simple exploration strategies, where each agent monitors the trend of performance and controls the exploration probability, ε, based on the transition of performance. 
By introducing a coordinated problem called the PushBlock problem, which includes the above issue, we show that the proposed method could improve the overall performance relative to conventional ε-greedy strategies and analyze their effects on the generated behavior.\",\"PeriodicalId\":128160,\"journal\":{\"name\":\"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"volume\":\"65 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICMLA55696.2022.00100\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICMLA55696.2022.00100","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Flexible Exploration Strategies in Multi-Agent Reinforcement Learning for Instability by Mutual Learning
A fundamental challenge in multi-agent reinforcement learning is effective exploration of the state-action space, because each agent must learn its policy in a non-stationary environment created by the changing policies of the other learning agents. As learning progresses, different undesired situations may appear one after another, and agents must learn again to adapt to them. Agents therefore need a high exploration probability to find the appropriate actions for each newly encountered situation. However, existing algorithms can fail to relearn behavior owing to insufficient exploration in these situations, because agents usually become exploitation-oriented under simple exploration strategies such as the ε-greedy strategy. We therefore propose two simple exploration strategies in which each agent monitors the trend of its performance and controls the exploration probability, ε, based on transitions in that performance. Using a coordination problem called the PushBlock problem, which exhibits the above issue, we show that the proposed methods can improve overall performance relative to conventional ε-greedy strategies, and we analyze their effects on the generated behavior.
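The abstract describes the core mechanism only at a high level: each agent tracks its recent performance and raises ε when performance degrades (a sign that other agents' policy changes have made the environment shift) and lowers it otherwise. The sketch below is a hypothetical illustration of that idea, not the paper's actual algorithm; the class name, window size, and multiplicative update rates are all assumptions made for the example.

```python
import random


class AdaptiveEpsilonController:
    """Hypothetical sketch of performance-trend-driven epsilon control.

    The paper's two concrete strategies are not specified in the abstract;
    this illustrates the general idea: compare the mean return over a
    recent window against the preceding window, increase epsilon when
    performance is declining, and decay it otherwise.
    """

    def __init__(self, eps_init=0.1, eps_min=0.05, eps_max=1.0,
                 window=50, up=1.05, down=0.995):
        self.eps = eps_init
        self.eps_min, self.eps_max = eps_min, eps_max
        self.window = window          # episodes per comparison window
        self.up, self.down = up, down  # multiplicative adjustment rates
        self.history = []

    def update(self, episode_return):
        """Record one episode's return and adjust epsilon."""
        self.history.append(episode_return)
        if len(self.history) < 2 * self.window:
            return self.eps  # not enough data to estimate a trend yet
        recent = sum(self.history[-self.window:]) / self.window
        earlier = sum(self.history[-2 * self.window:-self.window]) / self.window
        if recent < earlier:
            # Performance declining: the environment has likely shifted,
            # so explore more to relearn appropriate actions.
            self.eps = min(self.eps * self.up, self.eps_max)
        else:
            # Performance stable or improving: gradually exploit more.
            self.eps = max(self.eps * self.down, self.eps_min)
        return self.eps

    def select(self, q_values, rng=random):
        """Epsilon-greedy action selection over a list of Q-values."""
        if rng.random() < self.eps:
            return rng.randrange(len(q_values))
        return max(range(len(q_values)), key=lambda a: q_values[a])
```

Unlike a fixed ε-decay schedule, this controller can re-inflate ε after it has decayed, which is the property the abstract argues conventional ε-greedy strategies lack.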