Improving Global Parameter-sharing in Physically Heterogeneous Multi-agent Reinforcement Learning with Unified Action Space

arXiv - CS - Multiagent Systems | Pub Date: 2024-08-14 | DOI: arxiv-2408.07395

Xiaoyang Yu, Youfang Lin, Shuo Wang, Kai Lv, Sheng Han



Abstract

In a multi-agent system (MAS), action semantics describe the different influences that agents' actions have on other entities, and can be used to divide agents into groups in a physically heterogeneous MAS. Previous multi-agent reinforcement learning (MARL) algorithms apply global parameter-sharing across different types of heterogeneous agents without carefully distinguishing their action semantics. This common practice degrades cooperation and coordination between agents in complex situations. Fully independent agent parameters, however, dramatically increase computational cost and training difficulty. To exploit different action semantics while maintaining a suitable parameter-sharing structure, we introduce the Unified Action Space (UAS). The UAS is the union of all agent actions with different semantics. All agents first compute their unified representation in the UAS and then generate their heterogeneous action policies using different available-action masks. To further improve the training of the extra UAS parameters, we introduce a Cross-Group Inverse (CGI) loss that predicts the policies of agents in other groups from trajectory information. As a universal method for the physically heterogeneous MARL problem, we add the UAS to both value-based and policy-based MARL algorithms and propose two practical algorithms: U-QMIX and U-MAPPO. Experimental results in the SMAC environment demonstrate the effectiveness of both U-QMIX and U-MAPPO compared with several state-of-the-art MARL methods.
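The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the group names, action names, and the fixed `shared_logits` list (standing in for the output of a shared network) are all hypothetical, and the CGI loss is not modeled. The sketch only shows how one output over a unified action space can yield different per-group policies through available-action masks.

```python
import math

# Hypothetical action sets for two agent groups (illustrative names only).
GROUP_ACTIONS = {
    "attacker": ["noop", "move", "attack_enemy"],
    "healer": ["noop", "move", "heal_ally"],
}


def build_unified_action_space(group_actions):
    """Return the union of all groups' actions, in first-seen order."""
    unified = []
    for actions in group_actions.values():
        for action in actions:
            if action not in unified:
                unified.append(action)
    return unified


def group_mask(unified, actions):
    """Return 1 where the group may take the unified action, else 0."""
    return [1 if action in actions else 0 for action in unified]


def masked_softmax(logits, mask):
    """Softmax over available actions; unavailable actions get probability 0."""
    top = max(l for l, m in zip(logits, mask) if m)  # for numerical stability
    exps = [math.exp(l - top) if m else 0.0 for l, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


UAS = build_unified_action_space(GROUP_ACTIONS)
MASKS = {g: group_mask(UAS, acts) for g, acts in GROUP_ACTIONS.items()}

# Assumed stand-in for the shared network's output over the unified space.
shared_logits = [0.1, 0.5, 2.0, 1.0]

attacker_policy = masked_softmax(shared_logits, MASKS["attacker"])
healer_policy = masked_softmax(shared_logits, MASKS["healer"])
```

Both groups read the same unified output, but each policy assigns zero probability to actions outside its own group's semantics, which is the role the abstract gives to the available-action masks.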


