{"title":"使用强调权重的分布式演员-评论家学习","authors":"M. Stanković, M. Beko, S. Stankovic","doi":"10.1109/CoDIT55151.2022.9804022","DOIUrl":null,"url":null,"abstract":"In this paper a new Actor-Critic algorithm is proposed for distributed off-policy multi-agent reinforcement learning. It is composed of the Emphatic Temporal Difference ETD${\\left(\\lambda \\right)}$ algorithm (at the Critic stage) and a complementary distributed consensus-based algorithm using the exact gradients of a given criterion function (at the Actor stage). It is demonstrated that the algorithm converges weakly to the invariant set of an ordinary differential equation (ODE) characterizing the whole algorithm. Simulation results are presented as an illustration of high efficiency of the proposed algorithm.","PeriodicalId":185510,"journal":{"name":"2022 8th International Conference on Control, Decision and Information Technologies (CoDIT)","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-05-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Distributed Actor-Critic Learning Using Emphatic Weightings\",\"authors\":\"M. Stanković, M. Beko, S. Stankovic\",\"doi\":\"10.1109/CoDIT55151.2022.9804022\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper a new Actor-Critic algorithm is proposed for distributed off-policy multi-agent reinforcement learning. It is composed of the Emphatic Temporal Difference ETD${\\\\left(\\\\lambda \\\\right)}$ algorithm (at the Critic stage) and a complementary distributed consensus-based algorithm using the exact gradients of a given criterion function (at the Actor stage). It is demonstrated that the algorithm converges weakly to the invariant set of an ordinary differential equation (ODE) characterizing the whole algorithm. Simulation results are presented as an illustration of high efficiency of the proposed algorithm.\",\"PeriodicalId\":185510,\"journal\":{\"name\":\"2022 8th International Conference on Control, Decision and Information Technologies (CoDIT)\",\"volume\":\"6 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-05-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2022 8th International Conference on Control, Decision and Information Technologies (CoDIT)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/CoDIT55151.2022.9804022\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2022 8th International Conference on Control, Decision and Information Technologies (CoDIT)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CoDIT55151.2022.9804022","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Distributed Actor-Critic Learning Using Emphatic Weightings
In this paper, a new Actor-Critic algorithm is proposed for distributed off-policy multi-agent reinforcement learning. It combines the Emphatic Temporal Difference ETD$(\lambda)$ algorithm (at the Critic stage) with a complementary distributed consensus-based algorithm using the exact gradients of a given criterion function (at the Actor stage). It is demonstrated that the algorithm converges weakly to the invariant set of an ordinary differential equation (ODE) that characterizes the limiting behavior of the whole algorithm. Simulation results are presented to illustrate the high efficiency of the proposed algorithm.
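To make the Critic stage concrete, below is a minimal sketch of a linear off-policy ETD($\lambda$) update following the standard emphatic-TD recursions (follow-on trace, emphasis, and emphatic eligibility trace), together with a simple consensus-averaging step of the kind used in distributed schemes. All names, the trajectory format, and the mixing-matrix form are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

def etd_lambda_critic(trajectory, n_features, gamma=0.95, lam=0.8, alpha=0.01):
    """Single-agent off-policy ETD(lambda) critic with linear features.

    A sketch assuming a recorded trajectory of tuples
    (phi_t, reward, phi_next, rho_t, interest_t), where rho_t is the
    importance-sampling ratio pi(a|s)/b(a|s) and interest_t weights states.
    """
    w = np.zeros(n_features)   # value weights, v(s) ~= w @ phi(s)
    e = np.zeros(n_features)   # emphatic eligibility trace
    F = 0.0                    # follow-on trace
    rho_prev = 1.0             # importance ratio from the previous step

    for phi, r, phi_next, rho, interest in trajectory:
        # Follow-on trace: F_t = gamma * rho_{t-1} * F_{t-1} + i_t
        F = gamma * rho_prev * F + interest
        # Emphasis: M_t = lambda * i_t + (1 - lambda) * F_t
        M = lam * interest + (1.0 - lam) * F
        # One-step TD error of the current linear value estimate
        delta = r + gamma * (w @ phi_next) - (w @ phi)
        # Emphatic trace and weight update
        e = rho * (gamma * lam * e + M * phi)
        w = w + alpha * delta * e
        rho_prev = rho
    return w

def consensus_mix(weights, A):
    """Distributed step: each agent averages its parameters with its
    neighbours' through a row-stochastic mixing matrix A (an assumed
    form of the consensus scheme; weights has shape (n_agents, n_features))."""
    return A @ weights
```

In a distributed run, each agent would alternate a local `etd_lambda_critic`-style update on its own data with a `consensus_mix` step over the network, so that the agents' critic parameters agree asymptotically while the Actor stage uses the resulting value estimates.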