Mutual information oriented deep skill chaining for multi-agent reinforcement learning
Authors: Zaipeng Xie, Cheng Ji, Chentai Qiao, WenZhan Song, Zewen Li, Yufeng Zhang, Yujing Zhang
Journal: CAAI Transactions on Intelligence Technology, vol. 9, no. 4, pp. 1014-1030
Published: 2024-03-28
DOI: 10.1049/cit2.12322
URL: https://onlinelibrary.wiley.com/doi/10.1049/cit2.12322
Impact factor: 8.4 (JCR Q1, Computer Science, Artificial Intelligence)
Citations: 0
Abstract
Multi-agent reinforcement learning relies on reward signals to guide the policy networks of individual agents. However, in high-dimensional continuous spaces, the non-stationary environment can provide outdated experiences that hinder convergence, resulting in ineffective training performance for multi-agent systems. To tackle this issue, a novel reinforcement learning scheme, Mutual Information Oriented Deep Skill Chaining (MioDSC), is proposed that generates an optimised cooperative policy by incorporating intrinsic rewards based on mutual information to improve exploration efficiency. These rewards encourage agents to diversify their learning process by engaging in actions that increase the mutual information between their actions and the environment state. In addition, MioDSC can generate cooperative policies using the options framework, allowing agents to learn and reuse complex action sequences and accelerating the convergence speed of multi-agent learning. MioDSC was evaluated in the multi-agent particle environment and the StarCraft multi-agent challenge at varying difficulty levels. The experimental results demonstrate that MioDSC outperforms state-of-the-art methods and is robust across various multi-agent system tasks with high stability.
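To make the idea of a mutual-information-based intrinsic reward concrete, the following is a minimal sketch and not the MioDSC implementation. It assumes discrete, hashable states and actions and uses a simple plug-in (count-based) estimate of I(S; A); the abstract targets high-dimensional continuous spaces, where a learned estimator would be needed instead. The function names `empirical_mutual_information` and `shaped_reward` and the weight `beta` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def empirical_mutual_information(pairs):
    """Plug-in estimate of I(S; A) in nats from a list of (state, action) samples.

    Assumption: states and actions are discrete and hashable.
    """
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)                      # counts of (s, a)
    states = Counter(s for s, _ in pairs)       # counts of s
    actions = Counter(a for _, a in pairs)      # counts of a
    mi = 0.0
    for (s, a), c in joint.items():
        # p(s,a) * log( p(s,a) / (p(s) * p(a)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (states[s] * actions[a]))
    return mi


def shaped_reward(extrinsic, pairs, beta=0.1):
    """Extrinsic reward plus a weighted mutual-information bonus.

    `beta` is an illustrative weight, not a value taken from the paper.
    """
    return extrinsic + beta * empirical_mutual_information(pairs)
```

Under this sketch, a batch in which the action is fully determined by the state yields a positive bonus, while a batch in which actions are chosen independently of the state yields a bonus of zero, so the agent is only rewarded for behaviour that is informative about the state.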
About the journal:
CAAI Transactions on Intelligence Technology is a leading venue for original research on the theoretical and experimental aspects of artificial intelligence technology. We are a fully open access journal co-published by the Institution of Engineering and Technology (IET) and the Chinese Association for Artificial Intelligence (CAAI) providing research which is openly accessible to read and share worldwide.