Sequence value decomposition transformer for cooperative multi-agent reinforcement learning
Zhitong Zhao, Ya Zhang, Wenyu Chen, Fan Zhang, Siying Wang, Yang Zhou
Information Sciences, Volume 720, Article 122514. Published 2025-07-22. DOI: 10.1016/j.ins.2025.122514
https://www.sciencedirect.com/science/article/pii/S0020025525006462
Abstract
Existing multi-agent reinforcement learning (MARL) methods that use the centralized training with decentralized execution (CTDE) paradigm have achieved great empirical success in cooperative tasks. However, because the CTDE paradigm evaluates the joint actions simultaneously, it struggles to capture the unequal interactions among agents. In this paper, we introduce the concept of action sequences, which consider these unequal interactions from multiple perspectives through different action orderings. We then propose the multi-agent sequence value decomposition, which allows a more comprehensive estimation of the joint Q-value function through action sequences. Building on this, we construct a value decomposition transformer (VDT) framework that implements the multi-agent sequence value decomposition within the CTDE paradigm. Using a transformer network, the VDT framework performs centralized training with action sequences, which enhances cooperation in coordinated learning. Extensive experiments on the predator-prey task and the StarCraft multi-agent challenge demonstrate that the proposed VDT framework achieves significantly improved learning speed and cooperative performance. Compared to state-of-the-art methods, VDT learns significantly more efficiently within the same number of timesteps and improves final cooperative performance by 20% on average.
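The idea of scoring one joint action under several action orderings can be illustrated with a toy sketch. This is not the VDT framework or the decomposition proposed in the paper, whose details the abstract does not give. The names `sequence_value`, `joint_q_estimate`, `toy_utility`, and `AGENT_WEIGHTS`, the per-agent weights, and the choice to average over all orderings are assumptions made purely for illustration.

```python
from itertools import permutations
from statistics import mean

# Hypothetical per-agent weights that make interactions unequal:
# an agent with a larger weight gains more from following a teammate.
AGENT_WEIGHTS = (1.0, 2.0, 3.0)


def toy_utility(agent, action, preceding_actions):
    """Hypothetical per-agent utility conditioned on earlier actions.

    The agent gets a base value of 0.5 * action, plus its own weight for
    every earlier action in the ordering that matches its action.
    """
    matches = sum(1 for a in preceding_actions if a == action)
    return 0.5 * action + AGENT_WEIGHTS[agent] * matches


def sequence_value(ordering, actions, utility):
    """Score a joint action under one action ordering.

    Each agent's utility sees only the actions of agents that come
    earlier in the ordering, so different orderings can give different
    totals for the same joint action.
    """
    total = 0.0
    preceding = []
    for agent in ordering:
        total += utility(agent, actions[agent], tuple(preceding))
        preceding.append(actions[agent])
    return total


def joint_q_estimate(actions, utility):
    """Average the sequence values over all agent orderings (an assumption)."""
    agents = range(len(actions))
    return mean(sequence_value(o, actions, utility) for o in permutations(agents))


if __name__ == "__main__":
    joint_action = [1, 1, 0]
    for ordering in permutations(range(3)):
        print(ordering, sequence_value(ordering, joint_action, toy_utility))
    print("estimate:", joint_q_estimate(joint_action, toy_utility))
```

In this toy setting the same joint action scores differently depending on which of agents 0 and 1 acts first, which is the kind of order-dependent interaction that a single simultaneous evaluation cannot distinguish. Enumerating all permutations grows factorially with the number of agents, so a practical method would need to sample or learn over orderings instead.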
About the journal:
Information Sciences (Informatics and Computer Science, Intelligent Systems, Applications) is an international journal that publishes original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.