Enhancing multi-agent reinforcement learning via world model assisted single-agent population policies in multi-UAV cooperative-competitive scenario

Jiaming Cheng, Ni Li, Changyin Dong, Chong Tang

Knowledge-Based Systems, Volume 330, Article 114534. Published 2025-10-14. DOI: 10.1016/j.knosys.2025.114534
Citations: 0
Abstract
Multi-agent reinforcement learning (MARL) with limited search is prone to becoming trapped in local optima and struggles to adapt to opponents’ changing intentions and strategies in cooperative-competitive environments. Evolutionary strategies (ES), with their diverse exploration characteristics, are a promising alternative and have been combined with RL to form hybrid frameworks. However, most such methods require population policies to interact with the environment for Monte Carlo (MC) evaluation, or to filter experience using Q-functions, leading to low sample efficiency. Additionally, existing methods struggle to maintain policy quality and avoid catastrophic forgetting while exploring. Our objective is to enhance MARL in mixed environments through single-agent population policies, improving sample efficiency and mitigating these issues with ES. To this end, we propose a world-model-assisted cross-entropy method (CEM)-MARL approach. The world model enables inferring opponents’ mental states and predicting and evaluating future trajectories. CEM is used to update the population policies while retaining the top K/2 old policies. Predicted future experiences are then reused to update the MARL policy and are evaluated to perform quality-assured mutations on the top K/2 population policies. Furthermore, the elite policy guides the MARL policy, and the MARL policy in turn guides the population. Experimental results in multiple unmanned aerial vehicle (multi-UAV) game scenarios show that our method accelerates learning by a factor of 40 compared with model-free MARL and nearly a factor of 10 compared with our variant without ES. This increases learning efficiency and enhances the performance of the hybrid framework, which combines collaborative learning and evolutionary processes.
Journal overview:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built with knowledge-based and other artificial-intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.