Low variance trust region optimization with independent actors and sequential updates in cooperative multi-agent reinforcement learning

IF 2.6 3区计算机科学 Q3 AUTOMATION & CONTROL SYSTEMS

Autonomous Agents and Multi-Agent Systems Pub Date : 2025-02-20 DOI:10.1007/s10458-025-09695-8

Bang Giang Le, Viet Cuong Ta

{"title":"Low variance trust region optimization with independent actors and sequential updates in cooperative multi-agent reinforcement learning","authors":"Bang Giang Le, Viet Cuong Ta","doi":"10.1007/s10458-025-09695-8","DOIUrl":null,"url":null,"abstract":"<div><p>Cooperative multi-agent reinforcement learning assumes each agent shares the same reward function and can be trained effectively using the Trust Region framework of single-agent. Instead of relying on other agents’ actions, the independent actors setting considers each agent to act based only on its local information, thus having more flexible applications. However, in the sequential update framework, it is required to re-estimate the joint advantage function after each individual agent’s policy step. Despite the practical success of importance sampling, the updated advantage function suffers from exponentially high variance problems, which likely results in unstable convergence. In this work, we first analyze the high variance advantage both empirically and theoretically. To overcome this limitation, we introduce a clipping objective to control the upper bounds of the advantage fluctuation in sequential updates. With the proposed objective, we provide a monotonic bound with sub-linear convergence to <span>\\(\\varepsilon\\)</span>-Nash Equilibria. We further derive two new practical algorithms using our clipping objective. The experiment results on three popular multi-agent reinforcement learning benchmarks show that our proposed method outperforms the tested baselines in most environments. By carefully analyzing different training settings, our proposed method is highlighted with both stable convergence properties and the desired low advantage variance estimation. For reproducibility purposes, our source code is publicly available at https://github.com/giangbang/Low-Variance-Trust-Region-MARL.</p></div>","PeriodicalId":55586,"journal":{"name":"Autonomous Agents and Multi-Agent Systems","volume":"39 1","pages":""},"PeriodicalIF":2.6000,"publicationDate":"2025-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://link.springer.com/content/pdf/10.1007/s10458-025-09695-8.pdf","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Autonomous Agents and Multi-Agent Systems","FirstCategoryId":"94","ListUrlMain":"https://link.springer.com/article/10.1007/s10458-025-09695-8","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Cooperative multi-agent reinforcement learning assumes each agent shares the same reward function and can be trained effectively using the Trust Region framework of single-agent. Instead of relying on other agents’ actions, the independent actors setting considers each agent to act based only on its local information, thus having more flexible applications. However, in the sequential update framework, it is required to re-estimate the joint advantage function after each individual agent’s policy step. Despite the practical success of importance sampling, the updated advantage function suffers from exponentially high variance problems, which likely results in unstable convergence. In this work, we first analyze the high variance advantage both empirically and theoretically. To overcome this limitation, we introduce a clipping objective to control the upper bounds of the advantage fluctuation in sequential updates. With the proposed objective, we provide a monotonic bound with sub-linear convergence to \(\varepsilon\)-Nash Equilibria. We further derive two new practical algorithms using our clipping objective. The experiment results on three popular multi-agent reinforcement learning benchmarks show that our proposed method outperforms the tested baselines in most environments. By carefully analyzing different training settings, our proposed method is highlighted with both stable convergence properties and the desired low advantage variance estimation. For reproducibility purposes, our source code is publicly available at https://github.com/giangbang/Low-Variance-Trust-Region-MARL.

查看原文本刊更多论文

协同多智能体强化学习中独立行为体和顺序更新的低方差信任域优化

协作式多智能体强化学习假设每个智能体共享相同的奖励函数，并且可以使用单智能体的信任域框架进行有效的训练。独立参与者设置不依赖于其他代理的行为，而是考虑每个代理仅根据其本地信息进行行为，从而具有更灵活的应用程序。然而，在顺序更新框架中，需要在每个个体agent的策略步骤之后重新估计联合优势函数。尽管重要性抽样在实践中取得了成功，但更新后的优势函数存在指数级高方差问题，这可能导致不稳定的收敛。本文首先从实证和理论两方面分析了高方差优势。为了克服这一限制，我们引入了一个裁剪目标来控制序列更新中优势波动的上界。利用所提出的目标，我们提供了一个次线性收敛的单调界到\(\varepsilon\) -纳什均衡。利用我们的裁剪目标，我们进一步推导出两种新的实用算法。在三个流行的多智能体强化学习基准上的实验结果表明，我们提出的方法在大多数环境下都优于测试基线。通过仔细分析不同的训练设置，我们提出的方法具有稳定的收敛特性和期望的低优势方差估计。出于可再现性的考虑，我们的源代码可以在https://github.com/giangbang/Low-Variance-Trust-Region-MARL上公开获得。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Autonomous Agents and Multi-Agent Systems 工程技术-计算机：人工智能

CiteScore

6.00

自引率

5.30%

发文量

审稿时长

>12 weeks

期刊介绍： This is the official journal of the International Foundation for Autonomous Agents and Multi-Agent Systems. It provides a leading forum for disseminating significant original research results in the foundations, theory, development, analysis, and applications of autonomous agents and multi-agent systems. Coverage in Autonomous Agents and Multi-Agent Systems includes, but is not limited to: Agent decision-making architectures and their evaluation, including: cognitive models; knowledge representation; logics for agency; ontological reasoning; planning (single and multi-agent); reasoning (single and multi-agent) Cooperation and teamwork, including: distributed problem solving; human-robot/agent interaction; multi-user/multi-virtual-agent interaction; coalition formation; coordination Agent communication languages, including: their semantics, pragmatics, and implementation; agent communication protocols and conversations; agent commitments; speech act theory Ontologies for agent systems, agents and the semantic web, agents and semantic web services, Grid-based systems, and service-oriented computing Agent societies and societal issues, including: artificial social systems; environments, organizations and institutions; ethical and legal issues; privacy, safety and security; trust, reliability and reputation Agent-based system development, including: agent development techniques, tools and environments; agent programming languages; agent specification or validation languages Agent-based simulation, including: emergent behavior; participatory simulation; simulation techniques, tools and environments; social simulation Agreement technologies, including: argumentation; collective decision making; judgment aggregation and belief merging; negotiation; norms Economic paradigms, including: auction and mechanism design; bargaining and negotiation; economically-motivated agents; game theory (cooperative and non-cooperative); social choice and voting Learning agents, including: computational architectures for learning agents; evolution, adaptation; multi-agent learning. Robotic agents, including: integrated perception, cognition, and action; cognitive robotics; robot planning (including action and motion planning); multi-robot systems. Virtual agents, including: agents in games and virtual environments; companion and coaching agents; modeling personality, emotions; multimodal interaction; verbal and non-verbal expressiveness Significant, novel applications of agent technology Comprehensive reviews and authoritative tutorials of research and practice in agent systems Comprehensive and authoritative reviews of books dealing with agents and multi-agent systems.