A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint

Changxi Zhu, Ho-fung Leung, Shuyue Hu, Yi Cai
{"title":"预算约束下多智能体强化学习的q值共享框架","authors":"Changxi Zhu, Ho-fung Leung, Shuyue Hu, Yi Cai","doi":"10.1145/3447268","DOIUrl":null,"url":null,"abstract":"In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher’s suggested actions, as the policies of all agents can change before convergence. When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions as advice could be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraints. In PSAF, each Q-learner can decide when to ask for and share its Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.","PeriodicalId":377078,"journal":{"name":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","volume":"3 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"A Q-values Sharing Framework for Multi-agent Reinforcement Learning under Budget Constraint\",\"authors\":\"Changxi Zhu, Ho-fung Leung, Shuyue Hu, Yi Cai\",\"doi\":\"10.1145/3447268\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher’s suggested actions, as the policies of all agents can change before convergence. When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions as advice could be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraints. In PSAF, each Q-learner can decide when to ask for and share its Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. 
Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.\",\"PeriodicalId\":377078,\"journal\":{\"name\":\"ACM Transactions on Autonomous and Adaptive Systems (TAAS)\",\"volume\":\"3 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Autonomous and Adaptive Systems (TAAS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3447268\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Autonomous and Adaptive Systems (TAAS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3447268","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 5

Abstract

In a teacher-student framework, a more experienced agent (teacher) helps accelerate the learning of another agent (student) by suggesting actions to take in certain states. In cooperative multi-agent reinforcement learning (MARL), where agents must cooperate with one another, a student could fail to cooperate effectively with others even by following a teacher’s suggested actions, as the policies of all agents can change before convergence. When the number of times that agents communicate with one another is limited (i.e., there are budget constraints), an advising strategy that uses actions as advice could be less effective. We propose a partaker-sharer advising framework (PSAF) for cooperative MARL agents learning with budget constraints. In PSAF, each Q-learner can decide when to ask for and share its Q-values. We perform experiments in three typical multi-agent learning problems. The evaluation results indicate that the proposed PSAF approach outperforms existing advising methods under both constrained and unconstrained budgets. Moreover, we analyse the influence of advising actions and sharing Q-values on agent learning.
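The abstract describes PSAF only at a high level. The sketch below illustrates, under stated assumptions, how a budget-constrained partaker-sharer Q-learner might be organised: each agent keeps its own Q-table, may ask peers for their Q-values in unfamiliar states while its asking budget lasts, and may answer peers' requests while its sharing budget lasts. The class name `PSAFLearner`, the count-based ask/share heuristics, and all hyperparameters are hypothetical and chosen for illustration; they are not the authors' exact formulation.

```python
import random
from collections import defaultdict

class PSAFLearner:
    """Minimal sketch of a budget-constrained Q-value-sharing learner.

    Each agent is both a 'partaker' (may ask peers for Q-values) and a
    'sharer' (may answer with its own Q-values). The ask/share decision
    rules below are illustrative count-based heuristics, not the exact
    rules from the paper.
    """

    def __init__(self, actions, ask_budget=100, share_budget=100,
                 alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions
        self.q = defaultdict(lambda: {a: 0.0 for a in actions})
        self.visits = defaultdict(int)    # per-state visit counts
        self.ask_budget = ask_budget      # budget for asking peers
        self.share_budget = share_budget  # budget for answering peers
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def want_advice(self, state):
        """Ask with higher probability in rarely visited states (hypothetical heuristic)."""
        if self.ask_budget <= 0:
            return False
        p_ask = 1.0 / (1.0 + self.visits[state])
        return random.random() < p_ask

    def give_advice(self, state):
        """Share own Q-values for `state` if budget remains and the state is familiar enough."""
        if self.share_budget <= 0 or self.visits[state] < 5:
            return None
        self.share_budget -= 1
        return dict(self.q[state])

    def act(self, state, peers):
        """Epsilon-greedy action selection, optionally guided by a peer's shared Q-values."""
        self.visits[state] += 1
        q_values = self.q[state]
        if self.want_advice(state):
            for peer in peers:
                shared = peer.give_advice(state)
                if shared is not None:
                    self.ask_budget -= 1
                    q_values = shared  # act greedily w.r.t. the shared Q-values
                    break
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(q_values, key=q_values.get)

    def update(self, state, action, reward, next_state):
        """Standard one-step Q-learning update on the agent's own table."""
        target = reward + self.gamma * max(self.q[next_state].values())
        self.q[state][action] += self.alpha * (target - self.q[state][action])
```

In this sketch, budgets are decremented only when an exchange actually happens, which is one way to model the limited number of communications the abstract refers to; the received Q-values guide action selection for the current step while learning continues on the agent's own table.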