Title: Optimal Time-Varying Q-Learning Algorithm for Affine Nonlinear Systems With Coupled Players
Authors: Huaguang Zhang; Shuhang Yu; Jiayue Sun; Mei Li
DOI: 10.1109/TSMC.2025.3580988
Journal: IEEE Transactions on Systems Man Cybernetics-Systems, vol. 55, no. 10, pp. 7037-7047
Publication date: 2025-07-10 (Journal Article)
Impact factor: 8.7; JCR: Q1 (Automation & Control Systems)
URL: https://ieeexplore.ieee.org/document/11074761/
Citations: 0
Abstract
To address the finite-horizon coupled two-player mixed $H_{2}/H_{\infty }$ control problem for a continuous-time affine nonlinear system, this research introduces a distinctive Q-function and presents a novel adaptive dynamic programming (ADP) method that operates independently of system-specific information. First, we formulate the time-varying Hamilton–Jacobi–Isaacs (HJI) equations, which are difficult to solve due to their time-dependent and nonlinear nature. Next, a novel offline policy iteration (PI) algorithm is introduced, its convergence is established, and the existence of Nash equilibrium points is proved. Moreover, a novel action-dependent Q-function is constructed to enable entirely model-free learning, representing the first treatment of the mixed $H_{2}/H_{\infty }$ control problem involving coupled players. The Lyapunov direct approach is employed to ensure the stability of the closed-loop uncertain affine nonlinear system under the ADP-based control scheme, guaranteeing uniform ultimate boundedness (UUB). Finally, a numerical simulation is conducted to validate the effectiveness of the proposed ADP-based control approach.
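To give a concrete feel for the offline policy-iteration step the abstract describes, the sketch below applies standard PI to a scalar linear two-player zero-sum game, where the HJI equation collapses to a scalar Riccati equation. This is not the paper's algorithm: the system ($\dot{x} = ax + bu + dw$), the quadratic cost, the attenuation level $\gamma$, and all numeric parameters are illustrative assumptions chosen so the iteration is easy to verify by hand.

```python
# Toy policy iteration for a scalar two-player game (control u vs. disturbance w):
#   dx/dt = a*x + b*u + d*w,   cost = integral of (q*x^2 + r*u^2 - gamma^2*w^2) dt.
# With a quadratic value V(x) = p*x^2, the HJI equation reduces to the scalar
# Riccati equation  2*a*p + q - (b^2/r)*p^2 + (d^2/gamma^2)*p^2 = 0.

def policy_iteration(a=1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0,
                     k0=2.0, l0=0.0, iters=50):
    """Offline PI: evaluate the current gain pair (u = -k*x, w = l*x) via a
    scalar Lyapunov equation, then improve both players' policies.
    k0 must be initially stabilizing (a - b*k0 + d*l0 < 0)."""
    k, l = k0, l0
    for _ in range(iters):
        ac = a - b * k + d * l                               # closed-loop drift
        p = -(q + r * k**2 - gamma**2 * l**2) / (2.0 * ac)   # policy evaluation
        k = b * p / r                                        # minimizer's update
        l = d * p / gamma**2                                 # maximizer's update
    return p, k, l

p, k, l = policy_iteration()
# Residual of the scalar Riccati equation at the returned p (default parameters):
residual = 2.0 * p + 1.0 - p**2 + p**2 / 4.0
```

With these assumed parameters the iteration converges to the stabilizing Riccati root $p = (2 + \sqrt{7})/1.5 \approx 3.097$; the model-free Q-learning scheme in the paper replaces the policy-evaluation step with data-driven estimation so that $a$, $b$, $d$ need not be known.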
Journal Introduction:
The IEEE Transactions on Systems, Man, and Cybernetics: Systems encompasses the fields of systems engineering, covering issue formulation, analysis, and modeling throughout the systems engineering lifecycle phases. It addresses decision-making, issue interpretation, systems management, processes, and various methods such as optimization, modeling, and simulation in the development and deployment of large systems.