When to Align: Dynamic Behavior Consistency for Multiagent Systems via Intrinsic Rewards
Kunyang Lin, Yufeng Wang, Peihao Chen, Runhao Zeng, Yinjie Lei, Siyuan Zhou, Qing Du, Mingkui Tan, Chuang Gan
IEEE Transactions on Neural Networks and Learning Systems (IF 8.9, Q1, Computer Science, Artificial Intelligence)
Published: 2025-09-10 · DOI: 10.1109/tnnls.2025.3598301 (https://doi.org/10.1109/tnnls.2025.3598301)
Citations: 0
Abstract
In multiagent systems, learning optimal behavior policies for individual agents remains a challenging yet crucial task. While recent research has made strides in this area, the question of when agents should keep their behaviors consistent with one another is still not adequately addressed. This article proposes a novel approach that enables agents to autonomously decide whether to align their behaviors with those of their peers, leveraging intrinsic rewards to optimize their policies. We quantify behavior consistency as the divergence between the actions taken by two agents given the same observation. To encourage agents to be aware of each other's behaviors, we propose the dynamic consistency-based intrinsic reward (DCIR), which guides agents in determining when to synchronize their behaviors. In addition, we introduce a dynamic scaling network (DSN) that provides learnable scaling factors at each time step, enabling agents to dynamically decide the extent to which consistent behavior is rewarded. Our method is evaluated on the Multiagent Particle, Google Research Football, and StarCraft II Micromanagement environments. Experimental results demonstrate its effectiveness in learning optimal policies.
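To make the key ideas concrete, below is a minimal PyTorch sketch of how a consistency-based intrinsic reward of this kind could be computed. It assumes KL divergence between two agents' action distributions as the consistency measure and a small MLP as the scaling network; the names (DynamicScalingNetwork, behavior_divergence, dcir_reward) and all architectural details are illustrative assumptions, not the paper's actual implementation.

```
# Sketch of a dynamic consistency-based intrinsic reward.
# Assumptions (not confirmed by the abstract): consistency is measured as
# KL divergence between action distributions, and the DSN is a small MLP
# producing a signed per-step scaling factor. All names are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicScalingNetwork(nn.Module):
    """Hypothetical DSN: maps an agent's observation to a scaling factor
    that decides how strongly, and in which direction, consistent behavior
    is rewarded at the current time step."""
    def __init__(self, obs_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
            nn.Tanh(),  # signed factor in [-1, 1]: align or diverge
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).squeeze(-1)

def behavior_divergence(logits_i: torch.Tensor, logits_j: torch.Tensor) -> torch.Tensor:
    """Divergence between two agents' action distributions for the same
    observation; KL is one plausible choice of divergence."""
    log_p_i = F.log_softmax(logits_i, dim=-1)
    p_j = F.softmax(logits_j, dim=-1)
    return F.kl_div(log_p_i, p_j, reduction="none").sum(-1)

def dcir_reward(dsn: DynamicScalingNetwork, obs: torch.Tensor,
                logits_i: torch.Tensor, logits_j: torch.Tensor) -> torch.Tensor:
    """Intrinsic reward: consistency (negative divergence) weighted by the
    learnable per-step scaling factor from the DSN."""
    alpha = dsn(obs)                                   # when / how much to align
    consistency = -behavior_divergence(logits_i, logits_j)
    return alpha * consistency

# Toy usage with random tensors (batch of 4, obs_dim 8, 5 discrete actions).
obs = torch.randn(4, 8)
dsn = DynamicScalingNetwork(obs_dim=8)
r_int = dcir_reward(dsn, obs, torch.randn(4, 5), torch.randn(4, 5))
```

In this reading, the tanh output lets the scaling factor go negative, so an agent can be rewarded for diverging from its peers when alignment is unhelpful, which matches the abstract's emphasis on deciding *when* to align rather than always enforcing consistency.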
About the journal:
The focus of IEEE Transactions on Neural Networks and Learning Systems is to present scholarly articles discussing the theory, design, and applications of neural networks as well as other learning systems. The journal primarily highlights technical and scientific research in this domain.