Multi-agent Reinforcement Learning in Spatial Domain Tasks using Inter Subtask Empowerment Rewards

Shubham Pateria, Budhitama Subagdja, A. Tan
{"title":"基于子任务间授权奖励的空间域任务多智能体强化学习","authors":"Shubham Pateria, Budhitama Subagdja, A. Tan","doi":"10.1109/SSCI44817.2019.9002777","DOIUrl":null,"url":null,"abstract":"In the complex multi-agent tasks, various agents must cooperate to distribute relevant subtasks among each other to achieve joint task objectives. An agent’s choice of the relevant subtask changes over time with the changes in the task environment state. Multi-agent Hierarchical Reinforcement Learning (MAHRL) provides an approach for learning to select the subtasks in response to the environment states, by using the joint task rewards to train various agents. When the joint task involves complex inter-agent dependencies, only a subset of agents might be capable of reaching the rewarding task states while other agents take precursory or intermediate roles. The delayed task reward might not be sufficient in such tasks to learn the coordinating policies for various agents. In this paper, we introduce a novel approach of MAHRL called Inter-Subtask Empowerment based Multi-agent Options (ISEMO) in which an Inter-Subtask Empowerment Reward (ISER) is given to an agent which enables the precondition(s) of other agents’ subtasks. ISER is given in addition to the domain task reward in order to improve the inter-agent coordination. ISEMO also incorporates options model that can learn parameterized subtask termination functions and relax the limitations posed by hand-crafted termination conditions. Experiments in a spatial Search and Rescue domain show that ISEMO can learn the subtask selection policies of various agents grounded in the inter-dependencies among the agents, as well as learn the subtask termination conditions, and perform better than the standard MAHRL technique.","PeriodicalId":6729,"journal":{"name":"2019 IEEE Symposium Series on Computational Intelligence (SSCI)","volume":"669 1","pages":"86-93"},"PeriodicalIF":0.0000,"publicationDate":"2019-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Multi-agent Reinforcement Learning in Spatial Domain Tasks using Inter Subtask Empowerment Rewards\",\"authors\":\"Shubham Pateria, Budhitama Subagdja, A. Tan\",\"doi\":\"10.1109/SSCI44817.2019.9002777\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In the complex multi-agent tasks, various agents must cooperate to distribute relevant subtasks among each other to achieve joint task objectives. An agent’s choice of the relevant subtask changes over time with the changes in the task environment state. Multi-agent Hierarchical Reinforcement Learning (MAHRL) provides an approach for learning to select the subtasks in response to the environment states, by using the joint task rewards to train various agents. When the joint task involves complex inter-agent dependencies, only a subset of agents might be capable of reaching the rewarding task states while other agents take precursory or intermediate roles. The delayed task reward might not be sufficient in such tasks to learn the coordinating policies for various agents. In this paper, we introduce a novel approach of MAHRL called Inter-Subtask Empowerment based Multi-agent Options (ISEMO) in which an Inter-Subtask Empowerment Reward (ISER) is given to an agent which enables the precondition(s) of other agents’ subtasks. ISER is given in addition to the domain task reward in order to improve the inter-agent coordination. 
ISEMO also incorporates options model that can learn parameterized subtask termination functions and relax the limitations posed by hand-crafted termination conditions. Experiments in a spatial Search and Rescue domain show that ISEMO can learn the subtask selection policies of various agents grounded in the inter-dependencies among the agents, as well as learn the subtask termination conditions, and perform better than the standard MAHRL technique.\",\"PeriodicalId\":6729,\"journal\":{\"name\":\"2019 IEEE Symposium Series on Computational Intelligence (SSCI)\",\"volume\":\"669 1\",\"pages\":\"86-93\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE Symposium Series on Computational Intelligence (SSCI)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/SSCI44817.2019.9002777\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE Symposium Series on Computational Intelligence (SSCI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SSCI44817.2019.9002777","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1

Abstract

In complex multi-agent tasks, agents must cooperate and distribute the relevant subtasks among themselves to achieve joint task objectives. An agent's choice of subtask changes over time with the state of the task environment. Multi-agent Hierarchical Reinforcement Learning (MAHRL) provides an approach for learning to select subtasks in response to environment states, using joint task rewards to train the agents. When the joint task involves complex inter-agent dependencies, only a subset of agents may be capable of reaching the rewarding task states, while the other agents take precursory or intermediate roles. In such tasks, the delayed task reward may not be sufficient to learn coordinating policies for the agents. In this paper, we introduce a novel MAHRL approach called Inter-Subtask Empowerment based Multi-agent Options (ISEMO), in which an Inter-Subtask Empowerment Reward (ISER) is given to an agent that enables the precondition(s) of other agents' subtasks. ISER is given in addition to the domain task reward in order to improve inter-agent coordination. ISEMO also incorporates an options model that can learn parameterized subtask termination functions, relaxing the limitations posed by hand-crafted termination conditions. Experiments in a spatial Search and Rescue domain show that ISEMO can learn the subtask selection policies of the agents grounded in their inter-dependencies, as well as the subtask termination conditions, and performs better than a standard MAHRL technique.
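
The abstract does not give formal definitions, but the mechanism it describes lends itself to a reward-shaping reading: an agent's per-step return is the domain task reward plus an Inter-Subtask Empowerment Reward paid when that agent's transition establishes a precondition of some other agent's subtask, and each option's termination condition is a learned, parameterized function of the state rather than a hand-crafted rule. The Python sketch below illustrates that reading only; the names (Subtask, shaped_reward, termination_probability), the bonus magnitude, and the logistic termination parameterization are assumptions for illustration, not the paper's actual formulation.

```python
# Minimal sketch of the two mechanisms described in the abstract: (1) adding an
# Inter-Subtask Empowerment Reward (ISER) to the domain task reward when an
# agent's transition enables the precondition of another agent's subtask, and
# (2) a parameterized option-termination function. All names and formulas are
# hypothetical illustrations, not the paper's definitions.
from dataclasses import dataclass
from typing import Callable, Dict, Sequence
import math


@dataclass
class Subtask:
    name: str
    # Predicate over the environment state: is this subtask's precondition met?
    precondition: Callable[[Dict[str, bool]], bool]


def shaped_reward(domain_reward: float,
                  prev_state: Dict[str, bool],
                  next_state: Dict[str, bool],
                  other_agents_subtasks: Sequence[Subtask],
                  iser_bonus: float = 1.0) -> float:
    """Domain task reward plus ISER for newly enabled preconditions (assumed form)."""
    newly_enabled = sum(
        1 for st in other_agents_subtasks
        if st.precondition(next_state) and not st.precondition(prev_state)
    )
    return domain_reward + iser_bonus * newly_enabled


def termination_probability(state_features: Sequence[float],
                            theta: Sequence[float]) -> float:
    """Parameterized option termination beta_theta(s): a logistic function of
    state features, standing in for the learned termination functions mentioned
    in the abstract (assumed parameterization)."""
    z = sum(w * x for w, x in zip(theta, state_features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Toy Search-and-Rescue flavour: a scout opens a door, which enables the
    # precondition of a rescuer's "rescue_victim" subtask.
    rescue = Subtask("rescue_victim", precondition=lambda s: s["door_open"])
    prev, nxt = {"door_open": False}, {"door_open": True}
    print(shaped_reward(0.0, prev, nxt, [rescue]))           # -> 1.0 (ISER only)
    print(termination_probability([1.0, 0.5], [0.3, -0.2]))  # -> ~0.55
```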