MACRPO: Multi-agent cooperative recurrent policy optimization.

Frontiers in Robotics and AI (IF 2.9, Q2 in Robotics)
Pub Date: 2024-12-20 | eCollection Date: 2024-01-01 | DOI: 10.3389/frobt.2024.1394209
Eshagh Kargar, Ville Kyrki
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695781/pdf/
{"title":"MACRPO: Multi-agent cooperative recurrent policy optimization.","authors":"Eshagh Kargar, Ville Kyrki","doi":"10.3389/frobt.2024.1394209","DOIUrl":null,"url":null,"abstract":"<p><p>This work considers the problem of learning cooperative policies in multi-agent settings with partially observable and non-stationary environments without a communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called <i>Multi-Agent Cooperative Recurrent Proximal Policy Optimization</i> (MACRPO). We propose two novel ways of integrating information across agents and time in MACRPO: First, we use a recurrent layer in the critic's network architecture and propose a new framework to use the proposed meta-trajectory to train the recurrent layer. This allows the network to learn the cooperation and dynamics of interactions between agents, and also handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions by controlling the level of cooperation between agents using a parameter. The use of this control parameter is suitable for environments in which the agents are unable to fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces, Deepdrive-Zero, Multi-Walker, and Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, and also single-agent methods with shared parameters between agents such as IMPALA and APEX. The results show superior performance against other algorithms. The code is available online at https://github.com/kargarisaac/macrpo.</p>","PeriodicalId":47597,"journal":{"name":"Frontiers in Robotics and AI","volume":"11 ","pages":"1394209"},"PeriodicalIF":2.9000,"publicationDate":"2024-12-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11695781/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Robotics and AI","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frobt.2024.1394209","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2024/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"ROBOTICS","Score":null,"Total":0}
Citations: 0

Abstract

This work considers the problem of learning cooperative policies in multi-agent settings with partially observable, non-stationary environments and no communication channel. We focus on improving information sharing between agents and propose a new multi-agent actor-critic method called Multi-Agent Cooperative Recurrent Proximal Policy Optimization (MACRPO). MACRPO integrates information across agents and time in two novel ways. First, we use a recurrent layer in the critic's network architecture and propose a new framework that trains this layer on a meta-trajectory built from the agents' experiences. This allows the network to learn the dynamics of cooperation and interaction between agents, and to handle partial observability. Second, we propose a new advantage function that incorporates other agents' rewards and value functions, with a parameter that controls the level of cooperation between agents; this control parameter suits environments in which the agents cannot fully cooperate with each other. We evaluate our algorithm on three challenging multi-agent environments with continuous and discrete action spaces: Deepdrive-Zero, Multi-Walker, and the Particle environment. We compare the results with several ablations and state-of-the-art multi-agent algorithms such as MAGIC, IC3Net, CommNet, GA-Comm, QMIX, MADDPG, and RMAPPO, as well as single-agent methods with parameters shared between agents, such as IMPALA and APEX. The results show superior performance over the other algorithms. The code is available online at https://github.com/kargarisaac/macrpo.
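To make the second idea concrete, here is a minimal sketch of a cooperation-weighted advantage, assuming a TD-error mixing scheme: each agent's one-step TD error is blended with the other agents' TD errors via a parameter beta that controls the level of cooperation. The function name `cooperative_advantages`, the exact mixing form, and the lambda = 1 discounted accumulation are illustrative assumptions, not the paper's formulation or the authors' code (see the linked repository for that).

```python
import numpy as np

def cooperative_advantages(rewards, values, gamma=0.99, beta=0.5):
    """Illustrative cooperation-weighted advantage (assumed form, N > 1 agents).

    rewards, values: arrays of shape (T, N) for T timesteps and N agents.
    beta = 0 recovers fully independent advantages; beta = 1 replaces each
    agent's TD error with the mean of the other agents' TD errors.
    """
    T, N = rewards.shape
    # One-step TD errors per agent: delta_t^i = r_t^i + gamma * V_{t+1}^i - V_t^i
    deltas = rewards[:-1] + gamma * values[1:] - values[:-1]  # shape (T - 1, N)
    # Mean TD error of the *other* agents, per agent and timestep
    others_mean = (deltas.sum(axis=1, keepdims=True) - deltas) / (N - 1)
    mixed = (1.0 - beta) * deltas + beta * others_mean
    # Discounted backward accumulation (GAE-style with lambda = 1)
    advantages = np.zeros_like(mixed)
    running = np.zeros(N)
    for t in reversed(range(T - 1)):
        running = mixed[t] + gamma * running
        advantages[t] = running
    return advantages
```

For example, `beta=0.0` trains each agent on its own return only, while `beta=0.7` pushes each agent's update toward the group's outcome; this is the knob the abstract describes for settings where full cooperation is not possible.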
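The recurrent-critic side can be sketched the same way. A plausible reading of the meta-trajectory (an assumption; the paper's exact construction may differ) is that all agents' observations are interleaved step by step into one long sequence, so a single LSTM critic sees the interaction dynamics between agents across time. `RecurrentCritic`, `build_meta_trajectory`, and the layer sizes below are hypothetical names and choices for illustration.

```python
import torch
import torch.nn as nn

class RecurrentCritic(nn.Module):
    """Minimal LSTM critic over a meta-trajectory (sketch, not the authors' code)."""

    def __init__(self, obs_dim, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, meta_traj, hidden=None):
        # meta_traj: (batch, T * N, obs_dim) -- agents interleaved in time order
        out, hidden = self.lstm(meta_traj, hidden)
        return self.value_head(out).squeeze(-1), hidden  # values: (batch, T * N)

def build_meta_trajectory(obs):
    # obs: (T, N, obs_dim) -> (1, T * N, obs_dim); within each timestep the
    # N agents' observations appear consecutively, so the recurrent state
    # carries information both across time and across agents.
    T, N, D = obs.shape
    return obs.reshape(1, T * N, D)

obs = torch.randn(10, 3, 8)                     # T=10 steps, N=3 agents, obs_dim=8
critic = RecurrentCritic(obs_dim=8)
values, _ = critic(build_meta_trajectory(obs))  # values.shape == (1, 30)
```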

Source journal
CiteScore: 6.50
Self-citation rate: 5.90%
Annual publications: 355
Review time: 14 weeks
Journal introduction: Frontiers in Robotics and AI publishes rigorously peer-reviewed research covering all theory and applications of robotics, technology, and artificial intelligence, from biomedical to space robotics.