Enhancing multi-agent reinforcement learning via world model assisted single-agent population policies in multi-UAV cooperative-competitive scenario

Jiaming Cheng, Ni Li, Changyin Dong, Chong Tang

Knowledge-Based Systems, Volume 330, Article 114534. Published 2025-10-14. DOI: 10.1016/j.knosys.2025.114534
Citations: 0
Abstract
Multi-agent reinforcement learning (MARL) with limited search is prone to becoming trapped in local optima and struggles to adapt to opponents’ changing intentions and strategies in cooperative-competitive environments. Evolutionary strategies (ES), with their diverse exploration characteristics, are a promising alternative and have been combined with RL to form hybrid frameworks. However, most such methods require population policies to interact with the environment for Monte Carlo (MC) evaluation, or to filter experience using Q-functions, leading to low sample efficiency. Additionally, existing methods struggle to maintain policy quality and avoid catastrophic forgetting while exploring. Our objective is to enhance MARL in mixed environments through single-agent population policies, improving sample efficiency and mitigating these issues with ES. To this end, we propose a world-model-assisted cross-entropy method (CEM)-MARL approach. The world model enables inferring opponents’ mental states and predicting and evaluating future trajectories. CEM is used to update the population policies while retaining the top K/2 old policies. Predicted future experiences are then reused to update the MARL policy and are evaluated to perform quality-assured mutations on the top K/2 population policies. Furthermore, the elite policy guides the MARL policy, and the MARL policy in turn guides the population. Experimental results in multiple unmanned aerial vehicle (multi-UAV) game scenarios show that our method accelerates learning by a factor of 40 compared with model-free MARL and nearly a factor of 10 compared with our variant without ES. This increases learning efficiency and enhances the performance of the hybrid framework, which combines collaborative learning and evolutionary processes.
Journal overview:
Knowledge-Based Systems, an international and interdisciplinary journal in artificial intelligence, publishes original, innovative, and creative research results in the field. It focuses on systems built with knowledge-based and other artificial-intelligence techniques. The journal aims to support human prediction and decision-making through data science and computational techniques, to provide balanced coverage of theory and practical study, and to encourage the development and implementation of knowledge-based intelligence models, methods, systems, and software tools. Applications in business, government, education, engineering, and healthcare are emphasized.