Optimal Rewards for Cooperative Agents

B. Liu, Satinder Singh, Richard L. Lewis, S. Qin
{"title":"合作代理的最优奖励","authors":"B. Liu, Satinder Singh, Richard L. Lewis, S. Qin","doi":"10.1109/TAMD.2014.2362682","DOIUrl":null,"url":null,"abstract":"Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in cooperative (specifically, common-payoff or team) settings. This new problem solves for individual agent reward functions that guide agents to better overall team performance relative to teams in which all agents guide their behavior with the same given team-reward function. We present a multiagent architecture in which each agent learns good reward functions from experience using a gradient-based algorithm in addition to performing the usual task of planning good policies (except in this case with respect to the learned rather than the given reward function). Multiagency introduces the challenge of nonstationarity: because the agents learn simultaneously, each agent's reward-learning problem is nonstationary and interdependent on the other agents evolving reward functions. We demonstrate on two simple domains that the proposed architecture outperforms the conventional approach in which all the agents use the same given team-reward function (even when accounting for the resource overhead of the reward learning); that the learning algorithm performs stably despite the nonstationarity; and that learning individual reward functions can lead to better specialization of roles than is possible with shared reward, whether learned or given.","PeriodicalId":49193,"journal":{"name":"IEEE Transactions on Autonomous Mental Development","volume":"25 1","pages":"286-297"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TAMD.2014.2362682","citationCount":"14","resultStr":"{\"title\":\"Optimal Rewards for Cooperative Agents\",\"authors\":\"B. Liu, Satinder Singh, Richard L. Lewis, S. Qin\",\"doi\":\"10.1109/TAMD.2014.2362682\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in cooperative (specifically, common-payoff or team) settings. This new problem solves for individual agent reward functions that guide agents to better overall team performance relative to teams in which all agents guide their behavior with the same given team-reward function. We present a multiagent architecture in which each agent learns good reward functions from experience using a gradient-based algorithm in addition to performing the usual task of planning good policies (except in this case with respect to the learned rather than the given reward function). Multiagency introduces the challenge of nonstationarity: because the agents learn simultaneously, each agent's reward-learning problem is nonstationary and interdependent on the other agents evolving reward functions. 
We demonstrate on two simple domains that the proposed architecture outperforms the conventional approach in which all the agents use the same given team-reward function (even when accounting for the resource overhead of the reward learning); that the learning algorithm performs stably despite the nonstationarity; and that learning individual reward functions can lead to better specialization of roles than is possible with shared reward, whether learned or given.\",\"PeriodicalId\":49193,\"journal\":{\"name\":\"IEEE Transactions on Autonomous Mental Development\",\"volume\":\"25 1\",\"pages\":\"286-297\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/TAMD.2014.2362682\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Autonomous Mental Development\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TAMD.2014.2362682\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Autonomous Mental Development","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TAMD.2014.2362682","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 14

Abstract

Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in cooperative (specifically, common-payoff or team) settings. This new problem solves for individual agent reward functions that guide agents to better overall team performance relative to teams in which all agents guide their behavior with the same given team-reward function. We present a multiagent architecture in which each agent learns good reward functions from experience using a gradient-based algorithm, in addition to performing the usual task of planning good policies (except in this case with respect to the learned rather than the given reward function). Multiagency introduces the challenge of nonstationarity: because the agents learn simultaneously, each agent's reward-learning problem is nonstationary and depends on the other agents' evolving reward functions. We demonstrate on two simple domains that the proposed architecture outperforms the conventional approach in which all the agents use the same given team-reward function (even when accounting for the resource overhead of reward learning); that the learning algorithm performs stably despite the nonstationarity; and that learning individual reward functions can lead to better specialization of roles than is possible with shared reward, whether learned or given.
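To make the setup concrete: in the single-agent optimal rewards framing that this paper extends, the designer searches a space of internal reward functions for the one whose induced agent behavior maximizes the given objective return. A plausible reading of the multiagent version, in sketch notation (the vector form and symbols below are ours, not the paper's):

    \vec{r}^{\,*} = \arg\max_{\vec{r} \in \mathcal{R}^n} \mathbb{E}\!\left[ \sum_t R_{\mathrm{team}}(s_t) \;\middle|\; \text{each agent } i \text{ plans with respect to } r_i \right]

The sketch below illustrates the shape of the architecture the abstract describes: each agent plans (here, trivially, via softmax action selection) against its own learned reward parameters, and updates those parameters with a gradient-style estimate of team return. Everything concrete is an assumption for illustration: the two-agent common-payoff matrix game, the per-action reward parameterization, and the REINFORCE-style score-function update stand in for the paper's domains, its reward-function space, and its gradient-based reward-learning algorithm (which the paper pairs with a separate planner).

import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 2

# Given team reward: a common-payoff matrix shared by both agents
# (illustrative; not a domain from the paper).
TEAM_PAYOFF = np.array([[1.0, 0.0],
                        [0.0, 2.0]])   # coordinating on action 1 pays most

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class RewardLearningAgent:
    """Keeps its own learned reward parameters theta, one per action.
    'Planning' is trivialized to softmax selection over theta; in the
    paper, a real planner would consume the learned reward function."""
    def __init__(self):
        self.theta = np.zeros(N_ACTIONS)

    def act(self):
        probs = softmax(self.theta)
        action = rng.choice(N_ACTIONS, p=probs)
        return action, probs

    def update(self, action, probs, team_return, lr=0.1):
        # Score-function (REINFORCE-style) update: move theta toward
        # reward settings whose induced behavior earns higher TEAM return.
        grad = -probs
        grad[action] += 1.0
        self.theta += lr * team_return * grad

agents = [RewardLearningAgent(), RewardLearningAgent()]

for episode in range(2000):
    (a0, p0), (a1, p1) = agents[0].act(), agents[1].act()
    team_return = TEAM_PAYOFF[a0, a1]
    # Nonstationarity: each update below changes the problem the
    # other agent's reward learning is defined over.
    agents[0].update(a0, p0, team_return)
    agents[1].update(a1, p1, team_return)

print("learned reward parameters per agent:",
      [ag.theta.round(2) for ag in agents])

Run long enough, both agents' learned rewards come to favor the jointly better action (payoff 2.0), even though each agent's update continually shifts the effective environment seen by the other: a toy instance of the nonstationarity the abstract highlights.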