Optimal Rewards for Cooperative Agents

B. Liu, Satinder Singh, Richard L. Lewis, S. Qin
{"title":"合作代理的最优奖励","authors":"B. Liu, Satinder Singh, Richard L. Lewis, S. Qin","doi":"10.1109/TAMD.2014.2362682","DOIUrl":null,"url":null,"abstract":"Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in cooperative (specifically, common-payoff or team) settings. This new problem solves for individual agent reward functions that guide agents to better overall team performance relative to teams in which all agents guide their behavior with the same given team-reward function. We present a multiagent architecture in which each agent learns good reward functions from experience using a gradient-based algorithm in addition to performing the usual task of planning good policies (except in this case with respect to the learned rather than the given reward function). Multiagency introduces the challenge of nonstationarity: because the agents learn simultaneously, each agent's reward-learning problem is nonstationary and interdependent on the other agents evolving reward functions. We demonstrate on two simple domains that the proposed architecture outperforms the conventional approach in which all the agents use the same given team-reward function (even when accounting for the resource overhead of the reward learning); that the learning algorithm performs stably despite the nonstationarity; and that learning individual reward functions can lead to better specialization of roles than is possible with shared reward, whether learned or given.","PeriodicalId":49193,"journal":{"name":"IEEE Transactions on Autonomous Mental Development","volume":"25 1","pages":"286-297"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://sci-hub-pdf.com/10.1109/TAMD.2014.2362682","citationCount":"14","resultStr":"{\"title\":\"Optimal Rewards for Cooperative Agents\",\"authors\":\"B. Liu, Satinder Singh, Richard L. Lewis, S. Qin\",\"doi\":\"10.1109/TAMD.2014.2362682\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in cooperative (specifically, common-payoff or team) settings. This new problem solves for individual agent reward functions that guide agents to better overall team performance relative to teams in which all agents guide their behavior with the same given team-reward function. We present a multiagent architecture in which each agent learns good reward functions from experience using a gradient-based algorithm in addition to performing the usual task of planning good policies (except in this case with respect to the learned rather than the given reward function). Multiagency introduces the challenge of nonstationarity: because the agents learn simultaneously, each agent's reward-learning problem is nonstationary and interdependent on the other agents evolving reward functions. 
We demonstrate on two simple domains that the proposed architecture outperforms the conventional approach in which all the agents use the same given team-reward function (even when accounting for the resource overhead of the reward learning); that the learning algorithm performs stably despite the nonstationarity; and that learning individual reward functions can lead to better specialization of roles than is possible with shared reward, whether learned or given.\",\"PeriodicalId\":49193,\"journal\":{\"name\":\"IEEE Transactions on Autonomous Mental Development\",\"volume\":\"25 1\",\"pages\":\"286-297\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"https://sci-hub-pdf.com/10.1109/TAMD.2014.2362682\",\"citationCount\":\"14\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Autonomous Mental Development\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/TAMD.2014.2362682\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Autonomous Mental Development","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/TAMD.2014.2362682","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 14

Abstract

Following work on designing optimal rewards for single agents, we define a multiagent optimal rewards problem (ORP) in cooperative (specifically, common-payoff or team) settings. This new problem solves for individual agent reward functions that guide agents to better overall team performance relative to teams in which all agents guide their behavior with the same given team-reward function. We present a multiagent architecture in which each agent learns good reward functions from experience using a gradient-based algorithm, in addition to performing the usual task of planning good policies (except in this case with respect to the learned rather than the given reward function). Multiagency introduces the challenge of nonstationarity: because the agents learn simultaneously, each agent's reward-learning problem is nonstationary and depends on the other agents' evolving reward functions. We demonstrate on two simple domains that the proposed architecture outperforms the conventional approach in which all the agents use the same given team-reward function (even when accounting for the resource overhead of reward learning); that the learning algorithm performs stably despite the nonstationarity; and that learning individual reward functions can lead to better specialization of roles than is possible with shared reward, whether learned or given.
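To make the setup concrete: in the single-agent optimal rewards framing that this paper extends, the designer searches a space of internal reward functions for the one whose induced agent behavior maximizes the given objective return. A plausible reading of the multiagent version, in sketch notation (the vector form and symbols below are ours, not the paper's):

    \vec{r}^{\,*} = \arg\max_{\vec{r} \in \mathcal{R}^n} \mathbb{E}\!\left[ \sum_t R_{\mathrm{team}}(s_t) \;\middle|\; \text{each agent } i \text{ plans with respect to } r_i \right]

The sketch below illustrates the shape of the architecture the abstract describes: each agent plans (here, trivially, via softmax action selection) against its own learned reward parameters, and updates those parameters with a gradient-style estimate of team return. Everything concrete is an assumption for illustration: the two-agent common-payoff matrix game, the per-action reward parameterization, and the REINFORCE-style score-function update stand in for the paper's domains, its reward-function space, and its gradient-based reward-learning algorithm (which the paper pairs with a separate planner).

import numpy as np

rng = np.random.default_rng(0)

N_ACTIONS = 2

# Given team reward: a common-payoff matrix shared by both agents
# (illustrative; not a domain from the paper).
TEAM_PAYOFF = np.array([[1.0, 0.0],
                        [0.0, 2.0]])   # coordinating on action 1 pays most

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

class RewardLearningAgent:
    """Keeps its own learned reward parameters theta, one per action.
    'Planning' is trivialized to softmax selection over theta; in the
    paper, a real planner would consume the learned reward function."""
    def __init__(self):
        self.theta = np.zeros(N_ACTIONS)

    def act(self):
        probs = softmax(self.theta)
        action = rng.choice(N_ACTIONS, p=probs)
        return action, probs

    def update(self, action, probs, team_return, lr=0.1):
        # Score-function (REINFORCE-style) update: move theta toward
        # reward settings whose induced behavior earns higher TEAM return.
        grad = -probs
        grad[action] += 1.0
        self.theta += lr * team_return * grad

agents = [RewardLearningAgent(), RewardLearningAgent()]

for episode in range(2000):
    (a0, p0), (a1, p1) = agents[0].act(), agents[1].act()
    team_return = TEAM_PAYOFF[a0, a1]
    # Nonstationarity: each update below changes the problem the
    # other agent's reward learning is defined over.
    agents[0].update(a0, p0, team_return)
    agents[1].update(a1, p1, team_return)

print("learned reward parameters per agent:",
      [ag.theta.round(2) for ag in agents])

Run long enough, both agents' learned rewards come to favor the jointly better action (payoff 2.0), even though each agent's update continually shifts the effective environment seen by the other: a toy instance of the nonstationarity the abstract highlights.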