A Q-Learning Algorithm to Solve the Two-Player Zero-Sum Game Problem for Nonlinear Systems

IF 3.9 | Zone 4 (Computer Science) | JCR Q2, AUTOMATION & CONTROL SYSTEMS

International Journal of Adaptive Control and Signal Processing | Pub Date: 2025-01-06 | DOI: 10.1002/acs.3958

Afreen Islam, Anthony Siming Chen, Guido Herrmann

Volume 39, Issue 3, pp. 566-581. Publisher page: https://onlinelibrary.wiley.com/doi/10.1002/acs.3958

Citations: 0

Abstract

This paper addresses the two-player zero-sum game problem, which is a bounded $L_2$-gain robust control problem. Finding an analytical solution to the Hamilton-Jacobi-Isaacs (HJI) equation is a challenging task. Hence, a novel Q-learning algorithm for unknown continuous-time (CT) affine-in-inputs nonlinear systems is proposed to generate an approximate solution to the HJI equation. The solution is valid in a local domain because it relies on a local approximator, namely a neural network (NN) structure. The approach is model-free: it requires no knowledge of the system drift dynamics or of the input and disturbance gains. The algorithm learns online, in real time, from measurements of the state variables. The proposed non-iterative algorithm requires only a single critic NN instead of the commonly used triple-NN approximator structure. A persistence of excitation condition is required to guarantee uniform ultimate boundedness (UUB) and convergence to the optimal solution. Closed-loop stability is proved using Lyapunov analysis, and convergence of the approximate solution to the true saddle-point solution is guaranteed in the UUB sense. The effectiveness of the proposed Q-learning approach is demonstrated via simulations of a linear F-16 aircraft plant and a highly complex nonlinear system.
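For orientation, the problem class named in the abstract can be written out in its textbook form. The block below is a sketch of the standard two-player zero-sum formulation from the robust-control literature, not the paper's own notation: the symbols $f$, $g$, $k$, $Q$, $R$, $\gamma$ and $V^{*}$ are assumptions introduced here, and the paper's definitions may differ.

```latex
% Affine-in-inputs system with control u and disturbance d (assumed notation)
\dot{x} = f(x) + g(x)\,u + k(x)\,d

% Zero-sum value function: u minimises, d maximises
V^{*}(x_0) = \min_{u}\,\max_{d} \int_{0}^{\infty}
  \bigl( Q(x) + u^{\top} R\, u - \gamma^{2}\, d^{\top} d \bigr)\, dt

% Saddle-point policies from the stationarity conditions
u^{*}(x) = -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x), \qquad
d^{*}(x) = \tfrac{1}{2\gamma^{2}}\, k(x)^{\top} \nabla V^{*}(x)

% HJI equation obtained by substituting u^* and d^* into the Hamiltonian
0 = Q(x) + \nabla V^{*\top} f(x)
  - \tfrac{1}{4}\, \nabla V^{*\top} g(x) R^{-1} g(x)^{\top} \nabla V^{*}
  + \tfrac{1}{4\gamma^{2}}\, \nabla V^{*\top} k(x) k(x)^{\top} \nabla V^{*},
\qquad V^{*}(0) = 0

% Bounded L_2-gain condition (for x(0) = 0)
\int_{0}^{\infty} \bigl( Q(x) + u^{\top} R\, u \bigr)\, dt
  \;\le\; \gamma^{2} \int_{0}^{\infty} d^{\top} d \; dt
```

In this form the HJI equation depends explicitly on $f$, $g$ and $k$. The abstract states that the proposed approach needs none of these, which suggests why the solution is learned from measured state data rather than computed from a model.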


Source journal

International Journal of Adaptive Control and Signal Processing (Engineering & Technology - Engineering: Electrical & Electronic)

CiteScore

5.30

Self-citation rate

16.10%

Articles published

163

Review time

5 months

Journal description: The International Journal of Adaptive Control and Signal Processing is concerned with the design, synthesis and application of estimators or controllers where adaptive features are needed to cope with uncertainties. Papers on signal processing should also have some relevance to adaptive systems. The journal's focus is on model-based control design approaches rather than heuristic or rule-based control design methods. All papers are expected to include significant novel material. Both the theory and application of adaptive systems and system identification are areas of interest. Papers on applications can include problems in the implementation of algorithms for real-time signal processing and control. The stability, convergence, robustness and numerical aspects of adaptive algorithms are also suitable topics. The related subjects of controller tuning, filtering, networks and switching theory are also of interest. Principal areas to be addressed include: Auto-Tuning, Self-Tuning and Model Reference Adaptive Controllers; Nonlinear, Robust and Intelligent Adaptive Controllers; Linear and Nonlinear Multivariable System Identification and Estimation; Identification of Linear Parameter Varying, Distributed and Hybrid Systems; Multiple Model Adaptive Control; Adaptive Signal Processing Theory and Algorithms; Adaptation in Multi-Agent Systems; Condition Monitoring Systems; Fault Detection and Isolation Methods; Fault-Tolerant Control (system supervision and diagnosis); Learning Systems and Adaptive Modelling; Real Time Algorithms for Adaptive Signal Processing and Control; Adaptive Signal Processing and Control Applications; Adaptive Cloud Architectures and Networking; Adaptive Mechanisms for Internet of Things; Adaptive Sliding Mode Control.