Cooperative Multiagent Learning and Exploration With Min–Max Intrinsic Motivation

Impact Factor 9.4 · CAS Tier 1 (Computer Science) · JCR Q1 (Automation & Control Systems)
Yaqing Hou;Jie Kang;Haiyin Piao;Yifeng Zeng;Yew-Soon Ong;Yaochu Jin;Qiang Zhang
{"title":"Cooperative Multiagent Learning and Exploration With Min–Max Intrinsic Motivation","authors":"Yaqing Hou;Jie Kang;Haiyin Piao;Yifeng Zeng;Yew-Soon Ong;Yaochu Jin;Qiang Zhang","doi":"10.1109/TCYB.2025.3557694","DOIUrl":null,"url":null,"abstract":"In the field of multiagent reinforcement learning (MARL), the ability to effectively explore unknown environments and collect information and experiences that are most beneficial for policy learning represents a critical research area. However, existing work often encounters difficulties in addressing the uncertainties caused by state changes and the inconsistencies between agents’ local observations and global information, which presents significant challenges to coordinated exploration among multiple agents. To address this issue, this article proposes a novel MARL exploration method with Min-Max intrinsic motivation (E2M) that promotes the learning of joint policies of agents by introducing surprise minimization and social influence maximization. Since the agent is subject to unstable state changes in the environment, we introduce surprise minimization by computing state entropy to encourage the agents to cope with more stable and familiar situations. This method enables surprise estimation based on the low-dimensional representation of states obtained from random encoders. Furthermore, to prevent surprise minimization from leading to conservative policies, we introduce mutual information between agents’ behaviors as social influence. By maximizing social influence, the agents are encouraged to interact to facilitate the emergence of cooperative behavior. The performance of our proposed E2M is testified across a range of popular StarCraft II and Multiagent MuJoCo tasks. Comprehensive results demonstrate its effectiveness in enhancing the cooperative capability of the multiple agents.","PeriodicalId":13112,"journal":{"name":"IEEE Transactions on Cybernetics","volume":"55 6","pages":"2852-2864"},"PeriodicalIF":9.4000,"publicationDate":"2025-04-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Cybernetics","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10970246/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

In the field of multiagent reinforcement learning (MARL), the ability to effectively explore unknown environments and collect the information and experiences most beneficial for policy learning is a critical research area. However, existing work often struggles to address the uncertainties caused by state changes and the inconsistencies between agents’ local observations and global information, which pose significant challenges to coordinated exploration among multiple agents. To address this issue, this article proposes a novel MARL exploration method with Min-Max intrinsic motivation (E2M) that promotes the learning of joint agent policies by introducing surprise minimization and social influence maximization. Since agents are subject to unstable state changes in the environment, we introduce surprise minimization by computing state entropy to encourage the agents to cope with more stable and familiar situations. This method enables surprise estimation based on low-dimensional representations of states obtained from random encoders. Furthermore, to prevent surprise minimization from leading to conservative policies, we introduce mutual information between agents’ behaviors as social influence. By maximizing social influence, the agents are encouraged to interact, facilitating the emergence of cooperative behavior. The performance of the proposed E2M is evaluated across a range of popular StarCraft II and Multiagent MuJoCo tasks. Comprehensive results demonstrate its effectiveness in enhancing the cooperative capability of multiple agents.
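The abstract describes two intrinsic terms: a surprise signal, estimated as state entropy over low-dimensional representations produced by a fixed random encoder (to be minimized), and a social-influence term based on mutual information between agents' behaviors (to be maximized). The following is a minimal sketch of how such a surprise term could be computed with a random encoder and a particle-based k-nearest-neighbour entropy estimate; the encoder architecture, the k-NN estimator, the coefficient names, and the placeholder for the social-influence term are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of a Min-Max-style intrinsic reward: penalize surprise
# (k-NN state entropy under a frozen random encoder) and reward a
# social-influence term, here passed in only as a placeholder tensor.
import torch
import torch.nn as nn


class RandomEncoder(nn.Module):
    """Fixed, untrained encoder mapping raw states to low-dimensional representations."""

    def __init__(self, state_dim: int, rep_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, rep_dim)
        )
        for p in self.parameters():  # weights stay random and frozen
            p.requires_grad_(False)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.net(states)


def knn_entropy(reps: torch.Tensor, k: int = 5) -> torch.Tensor:
    """Per-state entropy proxy: log distance to the k-th nearest neighbour in the batch."""
    dists = torch.cdist(reps, reps)                       # (B, B) pairwise distances
    kth = dists.topk(k + 1, largest=False).values[:, -1]  # skip the zero self-distance
    return torch.log(kth + 1.0)


def intrinsic_reward(states: torch.Tensor,
                     social_influence: torch.Tensor,
                     encoder: RandomEncoder,
                     beta_s: float = 0.1,
                     beta_i: float = 0.1) -> torch.Tensor:
    """Min-Max shaping: minimize surprise (state entropy), maximize social influence."""
    with torch.no_grad():
        reps = encoder(states)
    surprise = knn_entropy(reps)
    return -beta_s * surprise + beta_i * social_influence


if __name__ == "__main__":
    # Usage with dummy data: a batch of 64 twelve-dimensional states.
    enc = RandomEncoder(state_dim=12)
    s = torch.randn(64, 12)
    mi_proxy = torch.zeros(64)  # stand-in for the mutual-information term
    print(intrinsic_reward(s, mi_proxy, enc).shape)  # torch.Size([64])
```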
Source Journal

IEEE Transactions on Cybernetics (Computer Science, Artificial Intelligence; Computer Science, Cybernetics)
CiteScore: 25.40
Self-citation rate: 11.00%
Articles published: 1869
Journal scope: The scope of the IEEE Transactions on Cybernetics includes computational approaches to the field of cybernetics. Specifically, the transactions welcomes papers on communication and control across machines or between machines, humans, and organizations. The scope includes such areas as computational intelligence, computer vision, neural networks, genetic algorithms, machine learning, fuzzy systems, cognitive systems, decision making, and robotics, to the extent that they contribute to the theme of cybernetics or demonstrate an application of cybernetics principles.