Distributed policy evaluation over multi-agent network with communication delays

Yaoyao Zhou, Gang Chen, Changli Pu, Keyu Wu, Zhenghua Chen

Neurocomputing, Volume 648, Article 130562. DOI: 10.1016/j.neucom.2025.130562. Published: 2025-06-10.
Citations: 0
Abstract
This paper investigates the multi-agent policy evaluation problem for distributed reinforcement learning over a time-varying directed communication structure with communication delays. In a completely distributed setting, agents jointly learn the value of a given policy through private local evaluations and their neighbors' evaluations. First, we propose the Push-Sum Dual Averaging Algorithm (PS-DAA) to solve the distributed policy evaluation problem with communication delays. Even under inevitable communication delays, a more general time-varying directed communication structure, and more realistic state constraints, PS-DAA achieves sublinear convergence. Further, for the case where full update information is unavailable, we extend PS-DAA to the bandit feedback setting, i.e., function values at sampled points are used in place of full gradient information. We prove that, compared with the full-information scheme, bandit-feedback PS-DAA incurs no performance degradation. Finally, we verify the effectiveness of the proposed algorithm through two simulation cases.
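The push-sum mechanism underlying PS-DAA lets agents on a directed graph reach consensus on an average using only column-stochastic (out-degree-based) weights. The following is a minimal sketch of plain push-sum consensus only; the paper's actual PS-DAA update (dual averaging, delayed messages, state constraints, and bandit feedback) is not reproduced here, and the weight matrix and iteration count are illustrative assumptions.

```python
import numpy as np

def push_sum_average(x0, A, num_iters=200):
    """Generic push-sum consensus sketch (not the paper's PS-DAA).

    x0: (n,) initial local values, one per agent.
    A:  (n, n) column-stochastic weight matrix for a strongly
        connected directed graph; A[i, j] is the weight agent j
        assigns to the message it pushes to agent i.
    Returns each agent's estimate of the network-wide average.
    """
    x = x0.astype(float).copy()   # running value sums
    w = np.ones_like(x)           # push-sum correction weights
    for _ in range(num_iters):
        x = A @ x                 # push weighted values along edges
        w = A @ w                 # push weights along the same edges
    return x / w                  # ratio corrects directed-graph bias

# 3-agent directed ring example (columns of A sum to 1)
A = np.array([[0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5]])
x0 = np.array([1.0, 2.0, 6.0])
est = push_sum_average(x0, A)     # every entry approaches mean(x0) = 3.0
```

The ratio `x / w` is the key design point: on a directed graph a single linear iteration converges to a weighted (not uniform) average, and the parallel weight recursion cancels exactly that bias.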
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing, covering neurocomputing theory, practice, and applications.