Rapid behavior learning in multi-agent environment based on state value estimation of others

Yasutake Takahashi, Kentarou Noma, M. Asada
DOI: 10.1109/IROS.2007.4399294
Published in: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems
Publication date: 2007-12-10
Citations: 0

Abstract

Existing reinforcement learning approaches suffer from the curse of dimensionality when applied to dynamic multiagent environments. RoboCup competitions are a typical example: other agents and their behaviors easily cause the state and action spaces to explode. This paper presents a modular learning method for a multiagent environment by which the learning agent can acquire behaviors that are cooperative with its teammates and competitive against its opponents. The key ideas are as follows. First, a two-layer hierarchical system with multiple learning modules is adopted to reduce the size of the sensor and action spaces: the state space of the top layer consists of the state values produced by the lower-level modules, and macro actions are used to reduce the size of the physical action space. Second, the extent to which another agent is close to its own goal is estimated by observation and used as a state value in the top-layer state space, realizing the cooperative/competitive behaviors. The method is applied to a 4-on-5 game task (four defenders versus five attackers), and the learning agent successfully acquired teamwork plays (pass and shoot) in a much shorter learning time (30 times faster than earlier work).
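The two-layer idea described above can be sketched in a few lines of code: lower-level modules each compress raw sensor input into a scalar state value (e.g., "how close is this agent to scoring"), and a top-level learner runs tabular Q-learning over the discretized vector of those values, choosing among macro actions such as "pass" or "shoot". This is a minimal illustrative sketch under assumed names, shapes, and parameters; it is not the authors' implementation.

```python
import numpy as np

class LowerModule:
    """A lower-level behavior module mapping raw sensor input to a
    scalar state value in [0, 1] (toy linear value function)."""
    def __init__(self, rng):
        self.w = rng.standard_normal(4)  # illustrative weights

    def state_value(self, obs):
        # Linear estimate squashed to [0, 1]
        return 1.0 / (1.0 + np.exp(-self.w @ obs))

class TopLayerLearner:
    """Top-layer Q-learner. Its state space is the discretized vector
    of lower-module state values (own modules plus values estimated by
    observing others); its actions are macro actions."""
    def __init__(self, n_modules, n_macro_actions, bins=3,
                 alpha=0.1, gamma=0.9):
        self.bins = bins
        self.q = np.zeros((bins,) * n_modules + (n_macro_actions,))
        self.alpha, self.gamma = alpha, gamma

    def discretize(self, values):
        # Map each state value in [0, 1] to one of `bins` cells
        return tuple(min(int(v * self.bins), self.bins - 1)
                     for v in values)

    def update(self, values, action, reward, next_values):
        # Standard one-step Q-learning update over the abstract state
        s, s2 = self.discretize(values), self.discretize(next_values)
        target = reward + self.gamma * self.q[s2].max()
        self.q[s + (action,)] += self.alpha * (
            target - self.q[s + (action,)])

rng = np.random.default_rng(0)
# e.g., one module for the agent's own goal, one estimating a teammate
modules = [LowerModule(rng) for _ in range(2)]
top = TopLayerLearner(n_modules=2, n_macro_actions=2)  # pass / shoot

obs = rng.standard_normal(4)          # toy sensor reading
values = [m.state_value(obs) for m in modules]
top.update(values, action=0, reward=1.0, next_values=values)
```

The point of the sketch is the size reduction: the top layer never sees raw sensor data, only a handful of scalar values, so its table stays small even as teammates and opponents are added.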