Rapid behavior learning in multi-agent environment based on state value estimation of others

Yasutake Takahashi, Kentarou Noma, M. Asada
DOI: 10.1109/IROS.2007.4399294
Published in: 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems
Publication date: 2007-12-10
Citations: 0

Abstract

Existing reinforcement learning approaches suffer from the curse of dimensionality when applied to dynamic multiagent environments. RoboCup competitions are a typical example: other agents and their behaviors easily cause the state and action spaces to explode. This paper presents a modular learning method for a multiagent environment by which the learning agent can acquire behaviors that are cooperative with its teammates and competitive against its opponents. The key ideas are as follows. First, a two-layer hierarchical system with multiple learning modules is adopted to reduce the size of the sensor and action spaces: the state space of the top layer consists of the state values produced by the lower-level modules, and macro actions are used to reduce the size of the physical action space. Second, the extent to which another agent is close to its own goal is estimated by observation and used as a state value in the top-layer state space, realizing the cooperative/competitive behaviors. The method is applied to a 4-on-5 game task (four defenders versus five attackers), and the learning agent successfully acquired teamwork plays (pass and shoot) in a much shorter learning time (30 times faster than earlier work).
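The two-layer idea described above can be sketched in a few lines of code: lower-level modules each compress raw sensor input into a scalar state value (e.g., "how close is this agent to scoring"), and a top-level learner runs tabular Q-learning over the discretized vector of those values, choosing among macro actions such as "pass" or "shoot". This is a minimal illustrative sketch under assumed names, shapes, and parameters; it is not the authors' implementation.

```python
import numpy as np

class LowerModule:
    """A lower-level behavior module mapping raw sensor input to a
    scalar state value in [0, 1] (toy linear value function)."""
    def __init__(self, rng):
        self.w = rng.standard_normal(4)  # illustrative weights

    def state_value(self, obs):
        # Linear estimate squashed to [0, 1]
        return 1.0 / (1.0 + np.exp(-self.w @ obs))

class TopLayerLearner:
    """Top-layer Q-learner. Its state space is the discretized vector
    of lower-module state values (own modules plus values estimated by
    observing others); its actions are macro actions."""
    def __init__(self, n_modules, n_macro_actions, bins=3,
                 alpha=0.1, gamma=0.9):
        self.bins = bins
        self.q = np.zeros((bins,) * n_modules + (n_macro_actions,))
        self.alpha, self.gamma = alpha, gamma

    def discretize(self, values):
        # Map each state value in [0, 1] to one of `bins` cells
        return tuple(min(int(v * self.bins), self.bins - 1)
                     for v in values)

    def update(self, values, action, reward, next_values):
        # Standard one-step Q-learning update over the abstract state
        s, s2 = self.discretize(values), self.discretize(next_values)
        target = reward + self.gamma * self.q[s2].max()
        self.q[s + (action,)] += self.alpha * (
            target - self.q[s + (action,)])

rng = np.random.default_rng(0)
# e.g., one module for the agent's own goal, one estimating a teammate
modules = [LowerModule(rng) for _ in range(2)]
top = TopLayerLearner(n_modules=2, n_macro_actions=2)  # pass / shoot

obs = rng.standard_normal(4)          # toy sensor reading
values = [m.state_value(obs) for m in modules]
top.update(values, action=0, reward=1.0, next_values=values)
```

The point of the sketch is the size reduction: the top layer never sees raw sensor data, only a handful of scalar values, so its table stays small even as teammates and opponents are added.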