Adaptive Dynamic Programming for Solving Non-Zero-Sum Differential Games
Hongliang Li, Derong Liu, Ding Wang
DOI: 10.3182/20130902-3-CN-3020.00124
Published: 2013-01-01 (listed under: IEEE International Conference on Systems Biology proceedings)
Citations: 2
Abstract
In this paper, a novel adaptive dynamic programming algorithm based on policy iteration is developed to solve, online, the multi-player non-zero-sum differential game for continuous-time nonlinear systems. The algorithm is mathematically equivalent to a quasi-Newton iteration in a Banach space. A neural-network implementation is given in which, for each player, a critic neural network learns that player's value function, and an action neural network sharing the same parameters as the corresponding critic network learns that player's optimal control policy. All critic and action neural networks are updated online, continuously and in real time. A simulation example demonstrates the effectiveness of the developed scheme.
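To illustrate the policy-iteration structure the abstract describes, the sketch below runs it on a deliberately simple, hypothetical scalar linear-quadratic two-player game (the system, cost weights, and basis function are all assumptions for illustration, not the paper's example). For this case each value function is exactly quadratic, V_i(x) = p_i x², so a one-weight "critic" suffices, and each player's policy u_i = -(b_i p_i / r_i) x is derived from the same weight p_i, mirroring the actor sharing the critic's parameters:

```python
import numpy as np

# Hypothetical scalar two-player non-zero-sum game (illustration only):
#   dx/dt = a*x + b1*u1 + b2*u2,   cost_i = integral of q_i*x^2 + r_i*u_i^2
a = -0.5
b = np.array([1.0, 1.0])   # input gains of the two players
q = np.array([1.0, 2.0])   # state-cost weights
r = np.array([1.0, 1.0])   # control-cost weights

# Critic for player i: V_i(x) = p_i * x^2 (basis phi(x) = x^2).
# The actor reuses the critic weight: u_i = k_i * x with k_i = -b_i*p_i/r_i.
k = np.zeros(2)            # start from the zero policy (stabilizing: a < 0)
for _ in range(50):
    a_c = a + b @ k        # closed-loop dynamics under the current policies
    # Policy evaluation: solve the scalar Lyapunov (Bellman) equation
    #   2*p_i*a_c + q_i + r_i*k_i^2 = 0   for each player's critic weight p_i
    p = -(q + r * k**2) / (2.0 * a_c)
    # Policy improvement: each player minimizes its own Hamiltonian
    k_new = -b * p / r
    if np.max(np.abs(k_new - k)) < 1e-12:
        break
    k = k_new

# At convergence the coupled Hamilton-Jacobi residuals should vanish:
u_gain = -b * p / r
a_cl = a + b @ u_gain
residual = q + r * u_gain**2 + 2.0 * p * a_cl
print("critic weights:", p, "HJ residuals:", residual)
```

The alternation of policy evaluation and improvement here is the policy-iteration skeleton; the paper's contribution is performing this online with neural-network approximators and probing signals for general nonlinear dynamics, rather than solving the evaluation step in closed form as this linear-quadratic toy does.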