XP-MARL: Auxiliary Prioritization in Multi-Agent Reinforcement Learning to Address Non-Stationarity
Jianye Xu, Omar Sobhy, Bassam Alrifaee
arXiv:2409.11852 (arXiv - CS - Robotics), 2024-09-18
Non-stationarity poses a fundamental challenge in Multi-Agent Reinforcement
Learning (MARL), arising from agents simultaneously learning and altering their
policies. This creates a non-stationary environment from the perspective of
each individual agent, often leading to suboptimal learning outcomes or even
failure to converge. We propose an open-source framework named XP-MARL, which augments
MARL with auxiliary prioritization to address this challenge in cooperative
settings. XP-MARL is 1) founded upon our hypothesis that prioritizing agents
and letting higher-priority agents establish their actions first would
stabilize the learning process and thus mitigate non-stationarity and 2)
enabled by our proposed mechanism called action propagation, where
higher-priority agents act first and communicate their actions, providing a
more stationary environment for others. Moreover, instead of using a predefined
or heuristic priority assignment, XP-MARL learns priority-assignment policies
with an auxiliary MARL problem, leading to a joint learning scheme. Experiments
in a motion-planning scenario involving Connected and Automated Vehicles (CAVs)
demonstrate that XP-MARL improves the safety of a baseline model by 84.4% and
outperforms a state-of-the-art approach, which improves the baseline by only
12.8%. Code: github.com/cas-lab-munich/sigmarl
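
The action-propagation idea described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `priority_order`, `propagate_actions`, and `toy_policy` are hypothetical, observations and actions are reduced to single floats, and the priority scores are fixed numbers here, whereas XP-MARL learns them through an auxiliary MARL problem. The sketch only shows the ordering logic: agents act from highest to lowest priority, and each agent receives the actions already chosen by the agents ranked above it.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical type: a policy maps (own observation, actions already chosen
# by higher-priority agents) to an action.
Policy = Callable[[float, Dict[int, float]], float]


def priority_order(scores: Sequence[float]) -> List[int]:
    """Return agent indices sorted from highest to lowest priority score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def propagate_actions(
    observations: Sequence[float],
    scores: Sequence[float],
    policies: Sequence[Policy],
) -> Dict[int, float]:
    """Choose actions sequentially, passing earlier actions to later agents."""
    chosen: Dict[int, float] = {}
    for agent in priority_order(scores):
        # Each agent sees only the actions of agents ranked above it.
        chosen[agent] = policies[agent](observations[agent], dict(chosen))
    return chosen


def toy_policy(obs: float, higher_actions: Dict[int, float]) -> float:
    """Toy stand-in for a learned policy: yield to agents that already acted."""
    return obs - sum(higher_actions.values())
```

Calling `propagate_actions([1.0, 2.0, 3.0], [0.2, 0.9, 0.5], [toy_policy] * 3)` lets agent 1 act first, then agent 2, then agent 0. Because the lower-priority agents condition on actions that are already fixed, the part of their environment formed by higher-priority agents does not shift within a step, which is the stabilizing effect the hypothesis relies on.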