{"title":"利用多代理强化学习中的联合行动嵌入来实现合作游戏","authors":"Xingzhou Lou;Junge Zhang;Yali Du;Chao Yu;Zhaofeng He;Kaiqi Huang","doi":"10.1109/TG.2023.3302694","DOIUrl":null,"url":null,"abstract":"State-of-the-art multiagent policy gradient (MAPG) methods have demonstrated convincing capability in many cooperative games. However, the exponentially growing joint-action space severely challenges the critic's value evaluation and hinders performance of MAPG methods. To address this issue, we augment Central-Q policy gradient with a joint-action embedding function and propose mutual-information maximization MAPG (M3APG). The joint-action embedding function makes joint-actions contain information of state transitions, which will improve the critic's generalization over the joint-action space by allowing it to infer joint-actions' outcomes. We theoretically prove that with a fixed joint-action embedding function, the convergence of M3APG is guaranteed. Experiment results of the \n<italic>StarCraft</i>\n multiagent challenge (SMAC) demonstrate that M3APG gives evaluation results with better accuracy and outperform other MAPG basic models across various maps of multiple difficulty levels. We empirically show that our joint-action embedding model can be extended to value-based multiagent reinforcement learning methods and state-of-the-art MAPG methods. Finally, we run an ablation study to show that the usage of mutual information in our method is necessary and effective.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"16 2","pages":"470-482"},"PeriodicalIF":1.7000,"publicationDate":"2023-08-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Leveraging Joint-Action Embedding in Multiagent Reinforcement Learning for Cooperative Games\",\"authors\":\"Xingzhou Lou;Junge Zhang;Yali Du;Chao Yu;Zhaofeng He;Kaiqi Huang\",\"doi\":\"10.1109/TG.2023.3302694\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"State-of-the-art multiagent policy gradient (MAPG) methods have demonstrated convincing capability in many cooperative games. However, the exponentially growing joint-action space severely challenges the critic's value evaluation and hinders performance of MAPG methods. To address this issue, we augment Central-Q policy gradient with a joint-action embedding function and propose mutual-information maximization MAPG (M3APG). The joint-action embedding function makes joint-actions contain information of state transitions, which will improve the critic's generalization over the joint-action space by allowing it to infer joint-actions' outcomes. We theoretically prove that with a fixed joint-action embedding function, the convergence of M3APG is guaranteed. Experiment results of the \\n<italic>StarCraft</i>\\n multiagent challenge (SMAC) demonstrate that M3APG gives evaluation results with better accuracy and outperform other MAPG basic models across various maps of multiple difficulty levels. We empirically show that our joint-action embedding model can be extended to value-based multiagent reinforcement learning methods and state-of-the-art MAPG methods. 
Finally, we run an ablation study to show that the usage of mutual information in our method is necessary and effective.\",\"PeriodicalId\":55977,\"journal\":{\"name\":\"IEEE Transactions on Games\",\"volume\":\"16 2\",\"pages\":\"470-482\"},\"PeriodicalIF\":1.7000,\"publicationDate\":\"2023-08-07\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Games\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10210002/\",\"RegionNum\":4,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Games","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10210002/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Leveraging Joint-Action Embedding in Multiagent Reinforcement Learning for Cooperative Games
State-of-the-art multiagent policy gradient (MAPG) methods have demonstrated convincing capability in many cooperative games. However, the exponentially growing joint-action space severely challenges the critic's value evaluation and hinders the performance of MAPG methods. To address this issue, we augment the Central-Q policy gradient with a joint-action embedding function and propose mutual-information maximization MAPG (M3APG). The joint-action embedding function makes joint-actions carry information about state transitions, which improves the critic's generalization over the joint-action space by allowing it to infer the outcomes of joint-actions. We theoretically prove that, with a fixed joint-action embedding function, the convergence of M3APG is guaranteed. Experimental results on the StarCraft Multi-Agent Challenge (SMAC) demonstrate that M3APG produces more accurate value evaluations and outperforms other MAPG baseline models across maps of multiple difficulty levels. We empirically show that our joint-action embedding model can be extended to value-based multiagent reinforcement learning methods and state-of-the-art MAPG methods. Finally, we run an ablation study to show that the use of mutual information in our method is both necessary and effective.
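To make the idea described in the abstract concrete, the sketch below (PyTorch) shows one plausible way a joint-action embedding could feed a central critic while being trained to carry state-transition information. All module names, layer sizes, and the use of a next-state prediction loss as a tractable surrogate for the mutual-information objective are assumptions for illustration; this is not the authors' exact M3APG implementation.

```python
# Minimal sketch, assuming: (1) joint actions are encoded as concatenated per-agent
# action vectors, (2) a prediction loss on the next state is used as a surrogate for
# maximizing mutual information between the embedding and state transitions.
import torch
import torch.nn as nn

class JointActionEmbedding(nn.Module):
    """Maps the concatenated actions of all agents to a compact embedding."""
    def __init__(self, n_agents, n_actions, embed_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_agents * n_actions, 128), nn.ReLU(),
            nn.Linear(128, embed_dim),
        )

    def forward(self, joint_action):                 # (batch, n_agents * n_actions)
        return self.net(joint_action)                # (batch, embed_dim)

class CentralCritic(nn.Module):
    """Central-Q critic that scores a global state together with the joint-action embedding."""
    def __init__(self, state_dim, embed_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + embed_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, state, action_embed):
        return self.net(torch.cat([state, action_embed], dim=-1))  # (batch, 1)

class TransitionHead(nn.Module):
    """Predicts the next state from (state, joint-action embedding); minimizing this
    error pushes state-transition information into the embedding."""
    def __init__(self, state_dim, embed_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + embed_dim, 256), nn.ReLU(),
            nn.Linear(256, state_dim),
        )

    def forward(self, state, action_embed):
        return self.net(torch.cat([state, action_embed], dim=-1))

# Illustrative usage with random tensors standing in for a replay batch.
n_agents, n_actions, state_dim, embed_dim = 3, 5, 32, 16
embed = JointActionEmbedding(n_agents, n_actions, embed_dim)
critic = CentralCritic(state_dim, embed_dim)
trans = TransitionHead(state_dim, embed_dim)

state = torch.randn(8, state_dim)
next_state = torch.randn(8, state_dim)
joint_action = torch.randn(8, n_agents * n_actions)   # stand-in for one-hot actions

z = embed(joint_action)
q_value = critic(state, z)                             # critic evaluation Q(s, z)
transition_loss = nn.functional.mse_loss(trans(state, z), next_state)
```

Under this reading, the critic generalizes over joint actions through the embedding space rather than the raw exponentially large joint-action space, which is the benefit the abstract claims for M3APG.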