{"title":"Learning Individual Potential-Based Rewards in Multiagent Reinforcement Learning","authors":"Chen Yang;Pei Xu;Junge Zhang","doi":"10.1109/TG.2024.3450475","DOIUrl":null,"url":null,"abstract":"A great challenge for applying multiagent reinforcement learning (MARL) in the field of game artificial intelligence (AI) is to enable agents to learn diversified policies to handle different game-specific problems, while receiving only a shared team reward. At present, a common approach is reward shaping, which focuses on designing rewards for agents to guide cooperation. However, most of the existing methods require prior knowledge on the environment for reward design or alter the optimal policies after imposing extra rewards. Besides, previous MARL methods that rely on manually designed rewards can hardly generalize across different game environments. To this end, we propose a new MARL method that learns individual potential-based rewards for agents. Specifically, we learn a parameterized potential function for each agent to generate individual rewards in the discounted temporal difference form. The whole update procedure is modeled as the bilevel optimization problem, where the lower level is to optimize policies with potential-based rewards, and the upper level is to optimize parameterized potential functions toward maximizing the environment return. We theoretically prove that the individual potential-based rewards can guarantee policy invariance for agents, so that the optimization objective is consistent with the original MARL problem. We evaluate our method with a number of existing state-of-the-art MARL methods on predator–prey and <italic>StarCraft II</i> game environments. Empirical results show that our proposed method significantly outperforms baseline methods and achieves better game AI that enjoys high performance and generalization.","PeriodicalId":55977,"journal":{"name":"IEEE Transactions on Games","volume":"17 2","pages":"334-345"},"PeriodicalIF":2.8000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Games","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10659352/","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Abstract
A major challenge in applying multiagent reinforcement learning (MARL) to game artificial intelligence (AI) is enabling agents to learn diversified policies that handle different game-specific problems while receiving only a shared team reward. A common approach is reward shaping, which designs rewards for agents to guide cooperation. However, most existing methods either require prior knowledge of the environment for reward design or alter the optimal policies once extra rewards are imposed. Moreover, previous MARL methods that rely on manually designed rewards hardly generalize across different game environments. To this end, we propose a new MARL method that learns individual potential-based rewards for agents. Specifically, we learn a parameterized potential function for each agent to generate individual rewards in the discounted temporal-difference form. The whole update procedure is modeled as a bilevel optimization problem, where the lower level optimizes policies with the potential-based rewards and the upper level optimizes the parameterized potential functions to maximize the environment return. We theoretically prove that the individual potential-based rewards guarantee policy invariance for agents, so the optimization objective remains consistent with the original MARL problem. We evaluate our method against a number of existing state-of-the-art MARL methods on predator–prey and StarCraft II game environments. Empirical results show that our method significantly outperforms the baselines and yields game AI with high performance and strong generalization.
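To make the idea in the abstract concrete, below is a minimal, hypothetical sketch of per-agent potential-based reward shaping with a bilevel-style update, not the authors' implementation. The network architecture, names (PotentialNet, shaped_rewards), dimensions, and the placeholder outer loss are illustrative assumptions; only the reward form gamma*Phi_i(o') - Phi_i(o) follows the abstract.

```python
# Hypothetical sketch of individual potential-based reward shaping (assumptions, not the paper's code).
import torch
import torch.nn as nn

n_agents, obs_dim, gamma = 3, 8, 0.99  # illustrative sizes

class PotentialNet(nn.Module):
    """Parameterized potential function Phi_i(o) for one agent (assumed MLP form)."""
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, obs):
        return self.net(obs).squeeze(-1)

potentials = [PotentialNet(obs_dim) for _ in range(n_agents)]
upper_opt = torch.optim.Adam(
    [p for net in potentials for p in net.parameters()], lr=1e-3)

def shaped_rewards(team_reward, obs, next_obs):
    """Lower level: each agent's reward is the shared team reward plus the
    discounted temporal-difference form of its own potential,
    gamma * Phi_i(o') - Phi_i(o), which is the potential-based form that
    preserves optimal policies."""
    rewards = []
    for i, phi in enumerate(potentials):
        f_i = gamma * phi(next_obs[i]) - phi(obs[i])
        rewards.append(team_reward + f_i)
    return rewards

# Upper level (sketch): update the potential parameters so that policies trained
# on the shaped rewards maximize the true environment return. A placeholder
# scalar loss stands in for that return-based objective here.
obs = [torch.randn(obs_dim) for _ in range(n_agents)]
next_obs = [torch.randn(obs_dim) for _ in range(n_agents)]
team_reward = torch.tensor(1.0)

r_shaped = shaped_rewards(team_reward, obs, next_obs)
outer_loss = -torch.stack(r_shaped).sum()  # placeholder outer objective
upper_opt.zero_grad()
outer_loss.backward()
upper_opt.step()
```

In a full training loop, the lower level would train each agent's policy on these shaped rewards, while the upper level would differentiate (or otherwise optimize) through the resulting return to update the potential functions.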