Distributed Actor-Critic Learning Using Emphatic Weightings
Authors: M. Stanković, M. Beko, S. Stankovic
Published in: 2022 8th International Conference on Control, Decision and Information Technologies (CoDIT), 2022-05-17
DOI: https://doi.org/10.1109/CoDIT55151.2022.9804022
Citations: 2
Abstract
This paper proposes a new Actor-Critic algorithm for distributed off-policy multi-agent reinforcement learning. The Critic stage uses the Emphatic Temporal Difference algorithm ETD($\lambda$); the Actor stage uses a complementary distributed consensus-based algorithm that relies on the exact gradients of a given criterion function. The algorithm is shown to converge weakly to the invariant set of an ordinary differential equation (ODE) that characterizes the whole algorithm. Simulation results illustrate the high efficiency of the proposed algorithm.
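
To make the two building blocks named in the abstract more concrete, the following is a minimal sketch, not the authors' algorithm. It assumes the standard single-agent ETD($\lambda$) recursion with linear function approximation, constant discount `gamma`, constant `lam`, and a constant interest of 1, plus a generic consensus averaging step over a row-stochastic weight matrix. The names `ETDState`, `etd_lambda_step`, and `consensus_step` are hypothetical, and the paper's Actor update with exact gradients is not reproduced here.

```python
from dataclasses import dataclass
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class ETDState:
    theta: List[float]      # critic weights (linear value estimate theta . phi)
    trace: List[float]      # eligibility trace e_{t-1}
    followon: float = 0.0   # follow-on trace F_{t-1}; 0 before the first step
    rho_prev: float = 1.0   # importance ratio rho_{t-1}


def etd_lambda_step(state: ETDState, phi: List[float], phi_next: List[float],
                    reward: float, rho: float, gamma: float = 0.9,
                    lam: float = 0.5, alpha: float = 0.05,
                    interest: float = 1.0) -> ETDState:
    """One ETD(lambda) update (assumed standard form, constant gamma and lambda)."""
    # Follow-on trace: F_t = rho_{t-1} * gamma * F_{t-1} + i(S_t)
    followon = state.rho_prev * gamma * state.followon + interest
    # Emphasis: M_t = lambda * i(S_t) + (1 - lambda) * F_t
    emphasis = lam * interest + (1.0 - lam) * followon
    # Trace: e_t = rho_t * (gamma * lambda * e_{t-1} + M_t * phi_t)
    trace = [rho * (gamma * lam * e + emphasis * p)
             for e, p in zip(state.trace, phi)]
    # TD error: delta_t = R_{t+1} + gamma * theta.phi_{t+1} - theta.phi_t
    delta = reward + gamma * dot(state.theta, phi_next) - dot(state.theta, phi)
    # Weight update: theta_{t+1} = theta_t + alpha * delta_t * e_t
    theta = [w + alpha * delta * e for w, e in zip(state.theta, trace)]
    return ETDState(theta, trace, followon, rho)


def consensus_step(params: List[List[float]],
                   weights: List[List[float]]) -> List[List[float]]:
    """Each agent i replaces its vector by sum_j weights[i][j] * params[j].

    `weights` is assumed row-stochastic (each row sums to 1).
    """
    n, dim = len(params), len(params[0])
    return [[sum(weights[i][j] * params[j][k] for j in range(n))
             for k in range(dim)]
            for i in range(n)]
```

In a multi-agent setting, one plausible arrangement is for each agent to run a local update such as `etd_lambda_step` on its own off-policy data and then mix parameter vectors with its neighbours through `consensus_step`; how the paper combines the two stages is specified only at the level of the abstract above.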