Data-Driven Optimal Control for Multi-Player Non-Zero-Sum Games with Unknown Dynamics
Liao Zhu, Hongbing Xia, Jiaxu Hou, Ping Guo
2022 4th International Conference on Data-driven Optimization of Complex Systems (DOCS), 2022-10-28
DOI: 10.1109/DOCS55193.2022.9967753 (https://doi.org/10.1109/DOCS55193.2022.9967753)
This paper addresses optimal control of discrete-time nonlinear multi-player non-zero-sum games with unknown dynamics. Based on adaptive dynamic programming, a data-driven adaptive critic control method is developed to obtain the optimal strategies. To solve multi-player non-zero-sum games, a new globalized dual heuristic dynamic programming design that requires no model network is proposed. The coupled Hamilton-Jacobi equations are solved using temporal difference errors formed from the previous and current value functions. Neural networks approximate the value functions (critic networks) and the optimal strategies (action networks). The weight update rules for the critic and action networks are tuned using data observed along the system trajectories. The stability of all neural network weights is analyzed by the Lyapunov approach. Simulation results are included to verify the performance of the proposed optimal control scheme.
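For orientation, the setting summarized above can be sketched in the notation commonly used for discrete-time non-zero-sum games. The abstract gives no equations, so every symbol below ($F$, $U_i$, $V_i$, $\hat V_i$, $e_{i,k}$) is an assumption made for illustration, and the exact forms in the paper may differ.

```latex
% Assumed notation: N players, state x_k, control u_{i,k} of player i,
% unknown dynamics F, utility U_i, value function V_i.
\begin{align}
  x_{k+1} &= F\bigl(x_k, u_{1,k}, \dots, u_{N,k}\bigr), \\
  V_i(x_k) &= \sum_{t=k}^{\infty} U_i\bigl(x_t, u_{1,t}, \dots, u_{N,t}\bigr),
    \qquad i = 1, \dots, N, \\
  % Coupled Hamilton-Jacobi equations: each player minimizes its own cost
  % while the other players apply their optimal strategies u_{-i}^*.
  V_i^*(x_k) &= \min_{u_{i,k}} \Bigl\{ U_i\bigl(x_k, u_{i,k}, u_{-i,k}^*\bigr)
    + V_i^*(x_{k+1}) \Bigr\}, \\
  % Temporal difference error built from the previous and current critic
  % outputs, so only observed data (x_{k-1}, u_{k-1}, x_k) is required.
  e_{i,k} &= \hat V_i(x_{k-1}) - U_i\bigl(x_{k-1}, u_{1,k-1}, \dots, u_{N,k-1}\bigr)
    - \hat V_i(x_k).
\end{align}
```

Because the last line evaluates the critic only at states that have already been observed, no model network is needed to predict $x_{k+1}$, which matches the model-free claim of the abstract. In a globalized dual heuristic dynamic programming design the critic typically approximates the gradient $\partial V_i / \partial x$ as well as $V_i$; the corresponding error term is omitted from this sketch.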