Multi-Agent Reinforcement Learning Investigation Based on Football Games

Danping Wu
{"title":"基于足球比赛的多智能体强化学习研究","authors":"Danping Wu","doi":"10.1109/CCAI57533.2023.10201281","DOIUrl":null,"url":null,"abstract":"Games are classic scenarios for reinforcement learning, and the support of a variety of standard tasks and experimental platforms is one of the reasons for the success of reinforcement learning. In the actual environment, interacting with other individuals is often necessary to be considered, such as in games Go, Poker, football and tennis, which require multi-agent participation. This paper implemented some multi-agent investigation based on Google Research Football to show the performance gap between the centralized trained policy, which produces one action when called; and a unified “super agent”, which produces multiple actions simultaneously. Then it compared the sensitivity of the learned policies by perturbing the policy parameters / observations / initial states / cooperative agent. In this case, Proximal Policy Optimization (PPO) method has faster convergence than IMPALA, and using mean provides higher episode mean reward than using the max operator for aggregation. Besides, a multi-player experiment was also tried out, the results show that the less the episode difference is, the greater the expected return it gets.","PeriodicalId":285760,"journal":{"name":"2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI)","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-05-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Multi-Agent Reinforcement Learning Investigation Based on Football Games\",\"authors\":\"Danping Wu\",\"doi\":\"10.1109/CCAI57533.2023.10201281\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Games are classic scenarios for reinforcement learning, and the support of a variety of standard tasks and experimental platforms is one of the reasons for the success of reinforcement learning. In the actual environment, interacting with other individuals is often necessary to be considered, such as in games Go, Poker, football and tennis, which require multi-agent participation. This paper implemented some multi-agent investigation based on Google Research Football to show the performance gap between the centralized trained policy, which produces one action when called; and a unified “super agent”, which produces multiple actions simultaneously. Then it compared the sensitivity of the learned policies by perturbing the policy parameters / observations / initial states / cooperative agent. In this case, Proximal Policy Optimization (PPO) method has faster convergence than IMPALA, and using mean provides higher episode mean reward than using the max operator for aggregation. 
Besides, a multi-player experiment was also tried out, the results show that the less the episode difference is, the greater the expected return it gets.\",\"PeriodicalId\":285760,\"journal\":{\"name\":\"2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI)\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2023-05-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CCAI57533.2023.10201281\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 3rd International Conference on Computer Communication and Artificial Intelligence (CCAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CCAI57533.2023.10201281","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Games are classic scenarios for reinforcement learning, and the availability of a variety of standard tasks and experimental platforms is one of the reasons for its success. In real environments, interaction with other individuals often has to be taken into account, as in Go, poker, football, and tennis, all of which require multi-agent participation. This paper presents a multi-agent investigation based on Google Research Football that measures the performance gap between a centrally trained policy, which produces one action per call, and a unified "super agent", which produces actions for all players simultaneously. It then compares the sensitivity of the learned policies by perturbing the policy parameters, observations, initial states, and cooperative agents. In these experiments, the Proximal Policy Optimization (PPO) method converges faster than IMPALA, and aggregating per-player rewards with the mean yields a higher episode mean reward than aggregating with the max operator. A multi-player experiment was also carried out; its results show that the smaller the difference between episodes, the greater the expected return.
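The abstract does not reproduce any code, so as a concrete illustration of the two control schemes it contrasts, here is a minimal sketch written against the public gfootball package. The scenario name, the number of controlled players, and the random stand-in policies are assumptions made for illustration, not the paper's actual training setup.

```python
# Minimal sketch of per-player vs. joint ("super agent") control on
# Google Research Football. Assumes gfootball is installed; the scenario
# and the policy stubs are illustrative, not the paper's setup.
import numpy as np
import gfootball.env as football_env

N_PLAYERS = 3    # left-side players we control (assumed)
N_ACTIONS = 19   # size of GRF's default discrete action set

env = football_env.create_environment(
    env_name='academy_3_vs_1_with_keeper',       # illustrative scenario
    representation='simple115',                  # flat 115-float observation
    number_of_left_players_agent_controls=N_PLAYERS,
)

rng = np.random.default_rng(0)

def centralized_policy(player_obs):
    """Centrally trained policy: one call returns one action for one player.
    A random choice stands in for the learned network."""
    return int(rng.integers(N_ACTIONS))

def super_agent_policy(joint_obs):
    """'Super agent': one call returns a joint action for all players."""
    return rng.integers(N_ACTIONS, size=N_PLAYERS)

obs = env.reset()   # shape (N_PLAYERS, 115) when controlling several players
done = False
while not done:
    # Centralized scheme: query the shared policy once per controlled player.
    actions = [centralized_policy(obs[i]) for i in range(N_PLAYERS)]
    # Super-agent scheme would instead produce the joint action in one call:
    #   actions = super_agent_policy(obs.reshape(-1))
    obs, reward, done, info = env.step(actions)
```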
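The sensitivity analysis and the mean-versus-max comparison can likewise be sketched without the full training pipeline. The linear stand-in policy, the noise scales, and the reward values below are assumptions; only the two operations themselves, Gaussian perturbation and mean/max aggregation, come from the abstract.

```python
# Hedged sketch of (1) probing a policy's sensitivity by perturbing its
# observations or parameters, and (2) aggregating per-player rewards with
# the mean vs. the max operator. All numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
OBS_DIM, N_ACTIONS = 115, 19

# Stand-in "learned" policy: a linear score over the observation (assumed).
theta = rng.normal(size=(OBS_DIM, N_ACTIONS))

def act(obs, params):
    return int(np.argmax(obs @ params))

def perturbed_action(obs, sigma_obs=0.0, sigma_theta=0.0):
    """Evaluate the policy under Gaussian noise on the observation and/or
    the parameters; comparing against the unperturbed action gives a
    simple sensitivity measure."""
    noisy_obs = obs + rng.normal(scale=sigma_obs, size=obs.shape)
    noisy_theta = theta + rng.normal(scale=sigma_theta, size=theta.shape)
    return act(noisy_obs, noisy_theta)

obs = rng.normal(size=OBS_DIM)
base = act(obs, theta)
flips = sum(perturbed_action(obs, sigma_obs=0.1) != base for _ in range(100))
print(f"action changed in {flips}/100 perturbed evaluations")

# Reward aggregation across players: the abstract reports that the mean
# gives a higher episode mean reward than the max operator.
per_player_rewards = np.array([0.4, 0.9, 0.1])   # illustrative values
print("mean-aggregated:", per_player_rewards.mean())
print("max-aggregated:", per_player_rewards.max())
```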