{"title":"基于K-FAC的多智能体强化学习信任域方法","authors":"Jiali Yu, Fengge Wu, Junsuo Zhao","doi":"10.1145/3579654.3579702","DOIUrl":null,"url":null,"abstract":"A challenging problem in multi-agent reinforcement learning (MARL) is to ensure that the policy converges quickly and is effective with limited computing resources. This paper extends the second-order optimization to MARL using Kronecker-factored approximate curvature (K-FAC) to approximate the natural gradient update. And it solves the challenge of training policy networks in MARL which requires a lot of time and computing costs. We propose a Heterogeneous-agent Trust Region algorithm using K-FAC (HAKTR). Further more, we endow HAKTR with monotonic performance improvement based on the multi-agent advantage decomposition theorem. Our algorithm is evaluated on continuous tasks in the MuJoCo environment. The experimental results show that HAKTR can achieve higher rewards with less computing costs compared to the baselines such as HATRPO and HAPPO. Moreover, HAKTR has good scalability regarding the number of agents and can be applied to large-scale networks.","PeriodicalId":146783,"journal":{"name":"Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence","volume":"151 1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-12-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Trust Region Method Using K-FAC in Multi-Agent Reinforcement Learning\",\"authors\":\"Jiali Yu, Fengge Wu, Junsuo Zhao\",\"doi\":\"10.1145/3579654.3579702\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A challenging problem in multi-agent reinforcement learning (MARL) is to ensure that the policy converges quickly and is effective with limited computing resources. This paper extends the second-order optimization to MARL using Kronecker-factored approximate curvature (K-FAC) to approximate the natural gradient update. And it solves the challenge of training policy networks in MARL which requires a lot of time and computing costs. We propose a Heterogeneous-agent Trust Region algorithm using K-FAC (HAKTR). Further more, we endow HAKTR with monotonic performance improvement based on the multi-agent advantage decomposition theorem. Our algorithm is evaluated on continuous tasks in the MuJoCo environment. The experimental results show that HAKTR can achieve higher rewards with less computing costs compared to the baselines such as HATRPO and HAPPO. 
Moreover, HAKTR has good scalability regarding the number of agents and can be applied to large-scale networks.\",\"PeriodicalId\":146783,\"journal\":{\"name\":\"Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence\",\"volume\":\"151 1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-12-23\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3579654.3579702\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2022 5th International Conference on Algorithms, Computing and Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3579654.3579702","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Trust Region Method Using K-FAC in Multi-Agent Reinforcement Learning
A challenging problem in multi-agent reinforcement learning (MARL) is ensuring that the policy converges quickly and remains effective under limited computing resources. This paper extends second-order optimization to MARL, using Kronecker-factored approximate curvature (K-FAC) to approximate the natural gradient update, which addresses the high time and computing cost of training policy networks in MARL. We propose a Heterogeneous-Agent Trust Region algorithm using K-FAC (HAKTR). Furthermore, we endow HAKTR with monotonic performance improvement based on the multi-agent advantage decomposition theorem. Our algorithm is evaluated on continuous tasks in the MuJoCo environment. The experimental results show that HAKTR achieves higher rewards at lower computing cost than baselines such as HATRPO and HAPPO. Moreover, HAKTR scales well with the number of agents and can be applied to large-scale networks.
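The core update the abstract refers to, a K-FAC approximation of the natural gradient combined with a trust-region cap on the step size, can be sketched for a single linear layer of a policy network. The sketch below is a minimal illustration, not the paper's implementation: the function name, the damping term, the KL budget, and the ACKTR-style step-size clipping are assumptions introduced here for exposition.

```python
import numpy as np

def kfac_natural_gradient(grad_W, acts, grad_out,
                          damping=1e-2, kl_budget=1e-2, lr_max=0.25):
    """Hypothetical K-FAC natural-gradient step for one linear layer.

    grad_W   : (out, in)   Euclidean gradient of the policy loss w.r.t. the weights
    acts     : (batch, in) layer inputs a
    grad_out : (batch, out) back-propagated gradients g w.r.t. the pre-activations
    """
    n = acts.shape[0]
    # Kronecker factors of the layer's Fisher block: F ≈ A ⊗ G
    A = acts.T @ acts / n + damping * np.eye(acts.shape[1])
    G = grad_out.T @ grad_out / n + damping * np.eye(grad_out.shape[1])
    # (A ⊗ G)^{-1} vec(∇W) = vec(G^{-1} ∇W A^{-1}), so invert only the small factors
    nat_grad = np.linalg.solve(G, grad_W) @ np.linalg.inv(A)
    # Trust-region rescaling (ACKTR-style assumption): keep the quadratic
    # approximation of the KL divergence, 0.5 * Δ^T F Δ, within kl_budget
    quad = np.sum(nat_grad * (G @ nat_grad @ A))   # Δ^T F Δ under the factorisation
    step_size = min(lr_max, np.sqrt(2.0 * kl_budget / (quad + 1e-8)))
    return step_size * nat_grad
```

With this factorisation the full Fisher block A ⊗ G is never formed explicitly; only the small input-covariance and output-gradient-covariance matrices are inverted, which is what makes a second-order update affordable when many agent policy networks must be trained.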