Counterfactual value decomposition for cooperative multi-agent reinforcement learning
Kai Liu, Tianxian Zhang, Xiangliang Xu, Yuyang Zhao
Neural Networks, Volume 190, Article 107692, published 2025-06-16. DOI: 10.1016/j.neunet.2025.107692
https://www.sciencedirect.com/science/article/pii/S0893608025005726
Citations: 0
Abstract
Value decomposition has become a central focus of Multi-Agent Reinforcement Learning (MARL) in recent years. The key challenge lies in constructing and updating the factored value function (FVF). Traditional methods rely on FVFs with restricted representational capacity, which makes them inadequate for tasks with non-monotonic payoffs. Recent approaches address this limitation by designing FVF update mechanisms that remain applicable in non-monotonic scenarios. However, these methods typically depend on the true optimal joint action value to guide FVF updates. Because computing the true optimal joint action is infeasible in practice, these methods approximate it with the greedy joint action and update the FVF using the corresponding greedy joint action value. We observe that although the greedy joint action may be close to the true optimal joint action, the greedy joint action value can be substantially biased relative to the true optimal joint action value. This makes the approximation unreliable and can push FVF updates in the wrong direction, hindering learning. To overcome this limitation, we propose Comix, a novel off-policy MARL method based on a Sandwich Value Decomposition Framework. Comix constrains and guides FVF updates with both an upper and a lower bound. Specifically, it constructs the upper bound from orthogonal best responses, thereby avoiding the drawbacks of approximating the optimum with the greedy joint action. Furthermore, an attention mechanism allows the upper bound to be computed with high accuracy and linear time complexity. Theoretical analyses show that Comix satisfies the Individual-Global-Max (IGM) principle. Experiments on the asymmetric One-Step Matrix Game, discrete Predator-Prey, and the StarCraft Multi-Agent Challenge show that Comix achieves higher learning efficiency and outperforms several state-of-the-art methods.
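To make the abstract's terms concrete, the sketch below fits a VDN-style additive factored value function to a small non-monotonic payoff matrix and checks the Individual-Global-Max (IGM) condition. It is only an illustration of why restricted (monotonic) FVFs struggle with non-monotonic payoffs, not an implementation of Comix: the abstract does not specify the sandwich bounds or the orthogonal best responses. The payoff matrix, the uniform least-squares fit (standing in for training), and every function name are illustrative assumptions.

```python
from itertools import product

# Illustrative non-monotonic payoff matrix (an assumption: a standard toy
# example from the value-decomposition literature, NOT the asymmetric game
# used in the paper). PAYOFF[a1][a2] is the shared reward of the joint action.
PAYOFF = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]


def argmax(values):
    """Index of the first maximal entry (ties go to the lower index)."""
    return max(range(len(values)), key=lambda i: values[i])


def additive_fit(q):
    """Least-squares additive (VDN-style) fit q[i][j] ~ u1[i] + u2[j].

    For a complete table with uniform weights, the fitted values are
    row mean + column mean - grand mean; the grand mean is split evenly
    between the two per-agent utilities.
    """
    n_rows, n_cols = len(q), len(q[0])
    grand = sum(map(sum, q)) / (n_rows * n_cols)
    u1 = [sum(row) / n_cols - grand / 2 for row in q]
    u2 = [sum(q[i][j] for i in range(n_rows)) / n_rows - grand / 2
          for j in range(n_cols)]
    return u1, u2


def joint_argmax(q):
    """Exhaustive search over all joint actions of a 2-agent table."""
    return max(product(range(len(q)), range(len(q[0]))),
               key=lambda a: q[a[0]][a[1]])


def analyze(q):
    u1, u2 = additive_fit(q)
    q_fvf = [[x + y for y in u2] for x in u1]  # factored joint values
    greedy = (argmax(u1), argmax(u2))          # decentralized greedy action
    optimum = joint_argmax(q)                  # true optimal joint action
    return {
        "greedy": greedy,
        "optimum": optimum,
        # IGM for the factored function: its joint argmax coincides
        # with the tuple of per-agent argmaxes.
        "igm_holds": joint_argmax(q_fvf) == greedy,
        "fvf_value_at_greedy": q_fvf[greedy[0]][greedy[1]],
        "true_value_at_greedy": q[greedy[0]][greedy[1]],
        "true_optimal_value": q[optimum[0]][optimum[1]],
    }


if __name__ == "__main__":
    for key, value in analyze(PAYOFF).items():
        print(f"{key}: {value}")
```

On this matrix the additive fit satisfies IGM, yet its greedy joint action is (1, 1), with a true payoff of 0, while the true optimum (0, 0) pays 8. The factored estimate at the greedy action (about -3.56) also differs from both true values. IGM only guarantees that the factored function and the agents' greedy actions agree; it does not guarantee that the factored function ranks joint actions correctly. The exhaustive `joint_argmax` search costs |A|^n evaluations for n agents with |A| actions each, which is why it is practical only in toy games like this one.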
About the journal:
Neural Networks is a platform that aims to foster an international community of scholars and practitioners interested in neural networks, deep learning, and other approaches to artificial intelligence and machine learning. Our journal invites submissions covering various aspects of neural networks research, from computational neuroscience and cognitive modeling to mathematical analyses and engineering applications. By providing a forum for interdisciplinary discussions between biology and technology, we aim to encourage the development of biologically inspired artificial intelligence.