XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity
Jianye Xu, Omar Sobhy, Bassam Alrifaee
arXiv - CS - Robotics, 2024-09-18 (arXiv:2409.11852)
Abstract
Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement
Learning (MARL), arising from agents simultaneously learning and altering their
policies. This creates a non-stationary environment from the perspective of
each individual agent, often leading to suboptimal learning outcomes or even
failure to converge. We propose an open-source framework named XP-MARL, which augments
MARL with auxiliary prioritization to address this challenge in cooperative
settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents
and letting higher-priority agents establish their actions first would
stabilize the learning process and thus mitigate non-stationarity and 2)
enabled by our proposed mechanism called action propagation, where
higher-priority agents act first and communicate their actions, providing a
more stationary environment for others. Moreover, instead of using a predefined
or heuristic priority assignment, XP-MARL learns priority-assignment policies
by solving an auxiliary MARL problem, leading to a joint learning scheme. Experiments
in a motion-planning scenario involving Connected and Automated Vehicles (CAVs)
demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and
outperforms a state-of-the-art approach, which improves the baseline by only
12.8%.
Code: github.com/cas-lab-munich/sigmarl
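
The action-propagation idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the XP-MARL codebase: the function `propagate_actions`, the `Policy` signature, the scalar observations and actions, and `toy_policy` are all hypothetical names and simplifications chosen for illustration. The priority scores are passed in as plain numbers here, whereas XP-MARL learns them with a separate priority-assignment policy.

```python
from typing import Callable, Dict, Sequence

# Hypothetical policy signature (an assumption for this sketch):
# (own observation, actions already chosen by higher-priority agents) -> own action.
Policy = Callable[[float, Dict[int, float]], float]


def propagate_actions(
    observations: Sequence[float],
    priority_scores: Sequence[float],
    policies: Sequence[Policy],
) -> Dict[int, float]:
    """Let agents act one after another in descending priority order."""
    # Higher score means higher priority; ties are broken by agent index.
    order = sorted(
        range(len(observations)),
        key=lambda i: (-priority_scores[i], i),
    )
    chosen: Dict[int, float] = {}
    for agent in order:
        # Each agent receives a copy of the actions of the agents that
        # acted before it, i.e. those with higher priority.
        chosen[agent] = policies[agent](observations[agent], dict(chosen))
    return chosen


def toy_policy(obs: float, higher_actions: Dict[int, float]) -> float:
    # Toy rule: own observation plus the sum of the propagated actions.
    return obs + sum(higher_actions.values())


if __name__ == "__main__":
    actions = propagate_actions(
        observations=[1.0, 2.0, 3.0],
        priority_scores=[0.2, 0.9, 0.5],  # agent 1 acts first, then 2, then 0
        policies=[toy_policy] * 3,
    )
    print(actions)
```

In this toy setup the lowest-priority agent conditions on the already fixed actions of the two agents ranked above it. This is the sense in which higher-priority agents are said to provide a more stationary environment for the others.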