XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity

Jianye Xu, Omar Sobhy, Bassam Alrifaee
arXiv - CS - Robotics, published 2024-09-18. DOI: arxiv-2409.11852 (https://doi.org/arxiv-2409.11852)

Abstract

Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement Learning (MARL), arising from agents simultaneously learning and altering their policies. This creates a non-stationary environment from the perspective of each individual agent, often leading to suboptimal or even non-convergent learning outcomes. We propose an open-source framework named XP-MARL, which augments MARL with auxiliary prioritization to address this challenge in cooperative settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents and letting higher-priority agents establish their actions first would stabilize the learning process and thus mitigate non-stationarity and 2) enabled by our proposed mechanism called action propagation, where higher-priority agents act first and communicate their actions, providing a more stationary environment for others. Moreover, instead of using a predefined or heuristic priority assignment, XP-MARL learns priority-assignment policies with an auxiliary MARL problem, leading to a joint learning scheme. Experiments in a motion-planning scenario involving Connected and Automated Vehicles (CAVs) demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and outperforms a state-of-the-art approach, which improves the baseline by only 12.8%.

Code: github.com/cas-lab-munich/sigmarl
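The action-propagation idea described in the abstract (higher-priority agents act first and communicate their actions to the rest) can be illustrated with a minimal Python sketch. This is not the authors' implementation and does not use the sigmarl code base. The function `propagate_actions`, the `Policy` signature, and `toy_policy` are hypothetical names introduced only for illustration, and the priority scores are given as fixed numbers here, whereas XP-MARL learns them with an auxiliary MARL problem.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical policy signature (an assumption for this sketch):
# (own observation, actions already announced by higher-priority agents) -> action
Policy = Callable[[float, Dict[int, float]], float]


def propagate_actions(
    observations: Sequence[float],
    priority_scores: Sequence[float],
    policies: Sequence[Policy],
) -> Tuple[List[int], Dict[int, float]]:
    """Select actions one agent at a time, highest priority score first.

    In XP-MARL the priority scores would come from learned
    priority-assignment policies; here they are plain inputs.
    """
    # Agent indices ordered from highest to lowest priority score.
    order = sorted(
        range(len(observations)),
        key=lambda i: priority_scores[i],
        reverse=True,
    )
    announced: Dict[int, float] = {}
    for agent in order:
        # Each agent sees only the actions of agents that acted before it.
        action = policies[agent](observations[agent], dict(announced))
        announced[agent] = action
    return order, announced


def toy_policy(obs: float, announced: Dict[int, float]) -> float:
    """Toy rule (assumption): follow the own observation, reduced by the
    mean of the actions already announced by higher-priority agents."""
    if not announced:
        return obs
    return obs - sum(announced.values()) / len(announced)
```

Calling `propagate_actions([1.0, 2.0, 3.0], [0.2, 0.9, 0.5], [toy_policy] * 3)` makes agent 1 act first with no information about the others, agent 2 act second knowing agent 1's action, and agent 0 act last knowing both. In this sketch, lower-priority agents condition on actions that are already fixed rather than guessing at them, which is the sense in which the abstract says the environment becomes more stationary for them.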