Learning from Multiple Independent Advisors in Multi-agent Reinforcement Learning

Venue: Adaptive Agents and Multi-Agent Systems; Pub Date: 2023-01-26; DOI: 10.48550/arXiv.2301.11153

Sriram Ganapathi Subramanian, Matthew E. Taylor, K. Larson, Mark Crowley


Citations: 0

Abstract

Multi-agent reinforcement learning typically suffers from sample inefficiency: learning suitable policies requires many data samples. Learning from external demonstrators is one way to mitigate this problem. However, most prior approaches in this area assume the presence of a single demonstrator. Leveraging multiple knowledge sources (i.e., advisors) with expertise in distinct aspects of the environment could substantially speed up learning in complex environments. This paper considers the problem of simultaneously learning from multiple independent advisors in multi-agent reinforcement learning. The approach leverages a two-level Q-learning architecture and extends this framework from single-agent to multi-agent settings. We provide principled algorithms that incorporate a set of advisors by both evaluating the advisors at each state and subsequently using the advisors to guide action selection. We also provide theoretical convergence and sample complexity guarantees. Experimentally, we validate our approach in three different test-beds and show that our algorithms perform better than the baselines, can effectively integrate the combined expertise of different advisors, and learn to ignore bad advice.
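The abstract describes a two-level Q-learning architecture in which advisors are evaluated at each state and then used to guide action selection. The sketch below illustrates that general idea in a deliberately simplified, hypothetical form: a single tabular learner (no other agents), an advisor-level table `q_adv` that scores each advisor per state, and an action-level table `q` updated by ordinary Q-learning. The class `TwoLevelQLearner`, the toy environment `run_chain`, the advisors `good_advisor` and `bad_advisor`, and parameters such as `advice_prob` are illustrative assumptions, not the authors' algorithms; the multi-agent extension and the convergence guarantees from the paper are not reproduced here.

```python
import random
from collections import defaultdict


class TwoLevelQLearner:
    """Hypothetical two-level tabular learner (illustrative, not the paper's algorithm).

    Upper level: q_adv[state][k] estimates the value of following advisor k in state.
    Lower level: q[state][a] is an ordinary Q-learning table over actions.
    """

    def __init__(self, n_actions, advisors, alpha=0.1, gamma=0.95,
                 epsilon=0.1, advice_prob=0.5, seed=0):
        self.n_actions = n_actions
        self.advisors = advisors          # assumed: callables mapping state -> action
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.advice_prob = advice_prob    # assumed: chance of consulting an advisor
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.q_adv = defaultdict(lambda: [0.0] * len(advisors))
        self.rng = random.Random(seed)

    def _argmax(self, values):
        # Break ties uniformly at random so early zero-valued entries are explored.
        best = max(values)
        return self.rng.choice([i for i, v in enumerate(values) if v == best])

    def act(self, state):
        """Return (action, index of the advisor followed, or None)."""
        if self.advisors and self.rng.random() < self.advice_prob:
            # Upper level: epsilon-greedy choice of which advisor to trust here.
            if self.rng.random() < self.epsilon:
                k = self.rng.randrange(len(self.advisors))
            else:
                k = self._argmax(self.q_adv[state])
            return self.advisors[k](state), k
        # Lower level: epsilon-greedy choice over the agent's own Q-values.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions), None
        return self._argmax(self.q[state]), None

    def update(self, state, action, advisor, reward, next_state, done):
        target = reward if done else reward + self.gamma * max(self.q[next_state])
        # Lower level: standard off-policy Q-learning update on the action taken.
        self.q[state][action] += self.alpha * (target - self.q[state][action])
        if advisor is not None:
            # Upper level: credit the followed advisor with the same TD target,
            # so advisors whose suggestions lead to low value are ranked lower.
            row = self.q_adv[state]
            row[advisor] += self.alpha * (target - row[advisor])


def run_chain(learner, n_states=6, episodes=2000, max_steps=100):
    """Hypothetical toy chain: action 1 moves right, 0 moves left, reward 1 at the end."""
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a, k = learner.act(s)
            s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
            done = s2 == n_states - 1
            learner.update(s, a, k, 1.0 if done else 0.0, s2, done)
            s = s2
            if done:
                break
    return learner


def good_advisor(state):
    return 1  # hypothetical advisor that always moves toward the goal


def bad_advisor(state):
    return 0  # hypothetical advisor that always moves away from the goal


agent = run_chain(TwoLevelQLearner(n_actions=2, advisors=[good_advisor, bad_advisor]))
# Index of the advisor with the highest estimated value in each non-terminal state.
print([max(range(2), key=lambda k: agent.q_adv[s][k]) for s in range(5)])
```

Because `bad_advisor` only ever earns the lower TD target, its entry in `q_adv` should stay below `good_advisor`'s in every non-terminal state, a toy analogue of the abstract's claim that the algorithms learn to ignore bad advice. A schedule that decays `advice_prob` over time would let the agent lean on its own `q` table once it is reliable; it is kept constant here for brevity.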


Source journal: Adaptive Agents and Multi-Agent Systems
