Multi-agent reinforcement learning approach based on reduced value function approximations
M. Abouheaf, W. Gueaieb
2017 IEEE International Symposium on Robotics and Intelligent Sensors (IRIS), 2017-10-01
DOI: 10.1109/IRIS.2017.8250107 (https://doi.org/10.1109/IRIS.2017.8250107)
Citations: 15
Abstract
This paper introduces a novel online adaptive Reinforcement Learning approach based on Policy Iteration for multi-agent systems interacting on graphs. The approach uses reduced value functions to solve the coupled Bellman and Hamilton-Jacobi-Bellman equations for multi-agent systems, and it does so using only partial knowledge of the agents' dynamics. The convergence of the approach is shown to depend on the properties of the communication graph. The Policy Iteration approach is implemented in real time using neural networks, where reduced value functions are used to lower the computational complexity.
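To make the Policy Iteration loop mentioned above concrete, the following is a minimal sketch of its two alternating steps (policy evaluation and policy improvement) with a quadratic value function V(x) = x'Px. It is not the paper's method: it assumes a single agent, discrete-time linear dynamics x[k+1] = A x[k] + B u[k], a fully known model, and a stabilizing initial gain, whereas the paper treats coupled multi-agent problems on graphs with only partial knowledge of the dynamics and neural-network approximators. All names and numerical values (`A`, `B`, `Q`, `R`, `K0`, `policy_iteration_lqr`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def evaluate_policy(A, B, Q, R, K):
    """Policy evaluation: solve P = Q + K'RK + (A - BK)' P (A - BK) for P."""
    n = A.shape[0]
    Ac = A - B @ K                      # closed-loop matrix under u = -K x
    M = Q + K.T @ R @ K                 # stage cost under the current policy
    # Vectorised Lyapunov equation: (I - kron(Ac', Ac')) vec(P) = vec(M)
    lhs = np.eye(n * n) - np.kron(Ac.T, Ac.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)              # symmetrise against round-off

def improve_policy(A, B, R, P):
    """Policy improvement: K = (R + B'PB)^{-1} B'PA."""
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)

def policy_iteration_lqr(A, B, Q, R, K0, iters=30):
    """Alternate evaluation and improvement, starting from a stabilizing K0."""
    K = K0
    for _ in range(iters):
        P = evaluate_policy(A, B, Q, R, K)
        K = improve_policy(A, B, R, P)
    return P, K

# Illustrative (assumed) system: A is already stable, so K0 = 0 is stabilizing.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.zeros((1, 2))

P, K = policy_iteration_lqr(A, B, Q, R, K0)
```

At convergence `P` should satisfy the discrete-time algebraic Riccati equation for this assumed system, which gives a simple way to check the iteration. In the multi-agent setting described in the abstract, each agent would instead evaluate and improve its own policy using information from its graph neighbours.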