Multi-Agent Reinforcement Learning Investigation Based on Football Games
Danping Wu
2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI), published 2023-05-26
DOI: 10.1109/CCAI57533.2023.10201281
Abstract
Games are classic scenarios for reinforcement learning, and the availability of a variety of standard tasks and experimental platforms is one reason for its success. In real environments, interaction with other individuals often has to be considered, as in Go, poker, football, and tennis, all of which require multi-agent participation. This paper presents a multi-agent investigation based on Google Research Football, showing the performance gap between a centrally trained policy, which produces one action per call, and a unified "super agent", which produces multiple actions simultaneously. It then compares the sensitivity of the learned policies under perturbations of the policy parameters, observations, initial states, and cooperating agents. In these experiments, the Proximal Policy Optimization (PPO) method converges faster than IMPALA, and aggregating with the mean yields a higher mean episode reward than aggregating with the max operator. A multi-player experiment was also carried out; its results show that the smaller the episode difference, the greater the expected return.
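The mean-versus-max aggregation comparison mentioned above can be illustrated with a minimal sketch. The function below is hypothetical, not from the paper: it simply shows the two ways a vector of per-agent rewards could be collapsed into a single scalar training signal, where the abstract reports that the mean performed better.

```python
import numpy as np

def aggregate_rewards(per_agent_rewards, mode="mean"):
    """Collapse per-agent rewards into one scalar signal.

    Illustrative only: 'mean' averages over agents, while 'max'
    keeps only the best-performing agent's reward, which can hide
    poor coordination among the remaining agents.
    """
    r = np.asarray(per_agent_rewards, dtype=float)
    if mode == "mean":
        return float(r.mean())
    if mode == "max":
        return float(r.max())
    raise ValueError(f"unknown aggregation mode: {mode!r}")

# Three cooperating agents with uneven episode rewards.
rewards = [0.2, 0.9, 0.1]
print(aggregate_rewards(rewards, "mean"))  # 0.4
print(aggregate_rewards(rewards, "max"))   # 0.9
```

The max operator here would report a healthy 0.9 even though two of the three agents contributed almost nothing, which is one plausible reason a mean-based signal trains more balanced cooperative policies.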