{"title":"Cooperative Multiagent Learning and Exploration With Min–Max Intrinsic Motivation","authors":"Yaqing Hou;Jie Kang;Haiyin Piao;Yifeng Zeng;Yew-Soon Ong;Yaochu Jin;Qiang Zhang","doi":"10.1109/TCYB.2025.3557694","DOIUrl":null,"url":null,"abstract":"In the field of multiagent reinforcement learning (MARL), the ability to effectively explore unknown environments and collect information and experiences that are most beneficial for policy learning represents a critical research area. However, existing work often encounters difficulties in addressing the uncertainties caused by state changes and the inconsistencies between agents’ local observations and global information, which presents significant challenges to coordinated exploration among multiple agents. To address this issue, this article proposes a novel MARL exploration method with Min-Max intrinsic motivation (E2M) that promotes the learning of joint policies of agents by introducing surprise minimization and social influence maximization. Since the agent is subject to unstable state changes in the environment, we introduce surprise minimization by computing state entropy to encourage the agents to cope with more stable and familiar situations. This method enables surprise estimation based on the low-dimensional representation of states obtained from random encoders. Furthermore, to prevent surprise minimization from leading to conservative policies, we introduce mutual information between agents’ behaviors as social influence. By maximizing social influence, the agents are encouraged to interact to facilitate the emergence of cooperative behavior. The performance of our proposed E2M is testified across a range of popular StarCraft II and Multiagent MuJoCo tasks. Comprehensive results demonstrate its effectiveness in enhancing the cooperative capability of the multiple agents.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 6","pages":"2852-2864"},"PeriodicalIF":9.4000,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cybernetics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10970246/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0
Abstract
In multiagent reinforcement learning (MARL), the ability to effectively explore unknown environments and to collect the information and experiences most beneficial for policy learning is a critical research area. However, existing work often struggles with the uncertainties caused by state changes and with the inconsistencies between agents’ local observations and global information, both of which pose significant challenges to coordinated exploration among multiple agents. To address this issue, this article proposes a novel MARL exploration method with Min–Max intrinsic motivation (E2M) that promotes the learning of joint agent policies by introducing surprise minimization and social influence maximization. Since agents are subject to unstable state changes in the environment, we introduce surprise minimization, computed via state entropy, to encourage agents to seek more stable and familiar situations; surprise is estimated from low-dimensional state representations produced by random encoders. Furthermore, to prevent surprise minimization from leading to overly conservative policies, we introduce the mutual information between agents’ behaviors as a social influence term. By maximizing social influence, agents are encouraged to interact, facilitating the emergence of cooperative behavior. The performance of the proposed E2M is evaluated across a range of popular StarCraft II and Multiagent MuJoCo tasks. Comprehensive results demonstrate its effectiveness in enhancing the cooperative capability of multiple agents.
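The abstract names two intrinsic terms but gives no implementation details. Below is a minimal, illustrative Python sketch of how such a Min–Max intrinsic reward could be assembled, assuming a fixed random linear encoder in place of the paper's random encoder, a k-nearest-neighbor particle estimate of state entropy for the surprise term, and a plug-in mutual-information estimate over two agents' discrete actions for the social-influence term. All function names and coefficients (beta_min, beta_max) are illustrative assumptions, not the authors' implementation.

import numpy as np

# Illustrative sketch only: a k-NN entropy proxy over a fixed random
# encoding (surprise, minimized) combined with a plug-in estimate of
# mutual information between two agents' actions (influence, maximized).
rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM, K = 32, 8, 3
W = rng.normal(size=(STATE_DIM, LATENT_DIM))  # fixed random encoder, never trained

def encode(states: np.ndarray) -> np.ndarray:
    """Project raw states into a low-dimensional latent space."""
    return np.tanh(states @ W)

def knn_entropy(latents: np.ndarray, k: int = K) -> np.ndarray:
    """Per-state surprise proxy: log distance to the k-th nearest neighbor.
    A larger distance means a more novel (surprising) latent state."""
    d = np.linalg.norm(latents[:, None, :] - latents[None, :, :], axis=-1)
    kth = np.sort(d, axis=1)[:, k]  # column 0 is the zero self-distance
    return np.log(kth + 1.0)

def action_mutual_information(a_i: np.ndarray, a_j: np.ndarray, n_actions: int) -> float:
    """Plug-in MI estimate between two agents' discrete action streams."""
    joint = np.zeros((n_actions, n_actions))
    for x, y in zip(a_i, a_j):
        joint[x, y] += 1.0
    joint /= joint.sum()
    p_i = joint.sum(axis=1, keepdims=True)
    p_j = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (p_i @ p_j)[mask])).sum())

def intrinsic_reward(states, a_i, a_j, n_actions, beta_min=0.1, beta_max=0.1):
    """Min-Max shaping: penalize surprise, reward social influence."""
    surprise = knn_entropy(encode(states))                       # term to minimize
    influence = action_mutual_information(a_i, a_j, n_actions)   # term to maximize
    return -beta_min * surprise + beta_max * influence

# Toy usage on random rollout data:
states = rng.normal(size=(64, STATE_DIM))
a_i = rng.integers(0, 4, size=64)
a_j = rng.integers(0, 4, size=64)
print(intrinsic_reward(states, a_i, a_j, n_actions=4)[:5])

In practice, the entropy term would be computed over a replay batch per agent, and the influence term would typically use counterfactual action distributions rather than raw action counts; this sketch only shows the min–max structure of the combined intrinsic reward.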
About the Journal
The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the Transactions welcomes papers on communication and control across machines, or among machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.