AIIR-MIX: Multi-Agent Reinforcement Learning Meets Attention Individual Intrinsic Reward Mixing Network

Asian Conference on Machine Learning. Pub Date: 2023-02-19. DOI: 10.48550/arXiv.2302.09531

Wei Li, Weiyan Liu, Shitong Shao, Shiyi Huang


Abstract

Deducing each agent's contribution and assigning it a corresponding reward is a crucial problem in cooperative Multi-Agent Reinforcement Learning (MARL). Previous studies address this problem by designing an intrinsic reward function, but they combine the intrinsic reward with the environment reward by simple summation, which leaves the performance of their MARL frameworks unsatisfactory. We propose a novel MARL method named Attention Individual Intrinsic Reward Mixing Network (AIIR-MIX). Its contributions are as follows: (a) we construct a novel intrinsic reward network based on the attention mechanism to make teamwork more effective; (b) we propose a mixing network that combines intrinsic and extrinsic rewards non-linearly and dynamically in response to changing environmental conditions. We compare AIIR-MIX with many State-Of-The-Art (SOTA) MARL methods on battle games in StarCraft II. The results demonstrate that AIIR-MIX performs well and outperforms current advanced methods on average test win rate. To validate the effectiveness of AIIR-MIX, we conduct additional ablation studies, which show that AIIR-MIX can dynamically assign each agent a real-time intrinsic reward in accordance with its actual contribution.
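As a rough illustration of the two components named in the abstract, the following toy sketch contrasts summation with a state-dependent non-linear mix. It is not the authors' architecture: the abstract gives no layer details, so the function names (`attention_intrinsic_rewards`, `mix_by_summation`, `mix_nonlinear`), the dot-product attention form, the sigmoid gate, and the fixed (unlearned) weights are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def attention_intrinsic_rewards(features):
    """Hypothetical per-agent intrinsic reward.

    Each agent's feature vector acts as a query over all agents' vectors
    (scaled dot-product attention); the reward is tanh of the mean of the
    attended context vector, so it lies in (-1, 1).
    """
    d = len(features[0])
    rewards = []
    for q in features:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in features])
        context = [sum(w * k[j] for w, k in zip(weights, features))
                   for j in range(d)]
        rewards.append(math.tanh(sum(context) / d))
    return rewards


def mix_by_summation(r_int, r_ext):
    # Baseline the abstract criticizes: a plain sum of the two rewards.
    return r_int + r_ext


def mix_nonlinear(r_int, r_ext, state):
    """Hypothetical state-conditioned mixer (assumed form, not from the paper).

    A sigmoid gate computed from the state decides how much weight the
    intrinsic reward gets; tanh makes the combination non-linear.
    """
    gate = 1.0 / (1.0 + math.exp(-sum(state)))
    return math.tanh(gate * r_int + (1.0 - gate) * r_ext)


# Usage: three agents with 2-d features, a shared extrinsic reward of 1.0.
features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
r_int = attention_intrinsic_rewards(features)
mixed = [mix_nonlinear(r, 1.0, state=[0.2, -0.1]) for r in r_int]
```

In this sketch the same pair of rewards yields a different mixed value when the state changes, which is the "dynamic" behaviour the abstract refers to; a learned mixer would replace the fixed gate with trained parameters.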
