Applying opponent and environment modelling in decentralised multi-agent reinforcement learning

IF 2.1 | Zone 3, Psychology | JCR Q3, Computer Science, Artificial Intelligence

Cognitive Systems Research | Pub Date: 2025-01-01 | DOI: 10.1016/j.cogsys.2024.101306

Alexander Chernyavskiy, Alexey Skrynnik, Aleksandr Panov

Cognitive Systems Research, volume 89, Article 101306. https://www.sciencedirect.com/science/article/pii/S1389041724001001

Citation count: 0

Abstract

Multi-agent reinforcement learning (MARL) has recently gained popularity and achieved considerable success in different kinds of games, including zero-sum, cooperative and general-sum games. Nevertheless, the vast majority of modern algorithms assume information sharing during training and hence cannot be used in decentralised applications, cannot exploit high-dimensional scenarios, and cannot be applied to applications with general or sophisticated reward structures. Moreover, because data in real-world applications is expensive to collect and sparse, it becomes necessary to use world models that model the environment dynamics with latent variables, i.e. to use a world model to generate synthetic data for training MARL algorithms. Therefore, focusing on the paradigm of decentralised training and decentralised execution, we propose an extension to model-based reinforcement learning approaches that leverages fully decentralised training with planning conditioned on neighbouring co-players’ latent representations. Our approach is inspired by the idea of opponent modelling. The method lets the agent learn in a joint latent space without needing to interact with the environment. We present the approach as a proof of concept that decentralised model-based algorithms can give rise to collective behaviour with limited communication during planning, and demonstrate its necessity on iterated matrix games and modified versions of the StarCraft Multi-Agent Challenge (SMAC).
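To make the setting concrete, the following is a minimal sketch of opponent modelling in an iterated matrix game. It is not the authors' method: instead of a latent world model, each agent uses classical fictitious play, keeping empirical counts of its co-player's past actions and best-responding to them. The payoff table, the class `FictitiousPlayAgent` and the function `play` are hypothetical names and values chosen for illustration only.

```python
from collections import Counter

# Hypothetical payoff table for a two-action coordination game (an assumption,
# not taken from the paper): PAYOFF[(own_action, other_action)] -> own reward.
PAYOFF = {
    (0, 0): 2.0, (0, 1): 0.0,
    (1, 0): 0.0, (1, 1): 1.0,
}
ACTIONS = (0, 1)


class FictitiousPlayAgent:
    """Decentralised agent that models its co-player by empirical action counts."""

    def __init__(self, initial_action):
        self.initial_action = initial_action  # played before any observation
        self.opponent_counts = Counter()      # the agent's private opponent model

    def act(self):
        total = sum(self.opponent_counts.values())
        if total == 0:
            return self.initial_action

        def expected(own_action):
            # Expected payoff of own_action under the empirical opponent model.
            return sum(
                PAYOFF[(own_action, other)] * count / total
                for other, count in self.opponent_counts.items()
            )

        # Best response to the modelled opponent (first action wins ties).
        return max(ACTIONS, key=expected)

    def observe(self, opponent_action):
        # Only the co-player's realised action is observed; no parameters
        # or rewards are shared between agents.
        self.opponent_counts[opponent_action] += 1


def play(agent_a, agent_b, rounds):
    """Run the iterated game and return the list of joint actions."""
    history = []
    for _ in range(rounds):
        a, b = agent_a.act(), agent_b.act()
        agent_a.observe(b)
        agent_b.observe(a)
        history.append((a, b))
    return history


history = play(FictitiousPlayAgent(0), FictitiousPlayAgent(1), rounds=10)
```

Under these assumed payoffs the two agents start miscoordinated and then settle on the joint action `(0, 0)`, each relying only on its own model of the other player.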


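Source journal: Cognitive Systems Research (Engineering and Technology, Computer Science: Artificial Intelligence)

CiteScore: 9.40

Self-citation rate: 5.10%

Articles published: not stated

Review time: >12 weeks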

About the journal: Cognitive Systems Research is dedicated to the study of human-level cognition. As such, it welcomes papers which advance the understanding, design and applications of cognitive and intelligent systems, both natural and artificial. The journal brings together a broad community studying cognition in its many facets in vivo and in silico, across the developmental spectrum, focusing on individual capacities or on entire architectures. It aims to foster debate and integrate ideas, concepts, constructs, theories, models and techniques from across different disciplines and different perspectives on human-level cognition. The scope of interest includes the study of cognitive capacities and architectures - both brain-inspired and non-brain-inspired - and the application of cognitive systems to real-world problems as far as it offers insights relevant for the understanding of cognition. Cognitive Systems Research therefore welcomes mature and cutting-edge research approaching cognition from a systems-oriented perspective, both theoretical and empirically-informed, in the form of original manuscripts, short communications, opinion articles, systematic reviews, and topical survey articles from the fields of Cognitive Science (including Philosophy of Cognitive Science), Artificial Intelligence/Computer Science, Cognitive Robotics, Developmental Science, Psychology, and Neuroscience and Neuromorphic Engineering. Empirical studies will be considered if they are supplemented by theoretical analyses and contributions to theory development and/or computational modelling studies.