Discounted Stable Adaptive Critic Design for Zero-Sum Games With Application Verifications

IF 6.4 · Zone 2 (Computer Science) · Q1 AUTOMATION & CONTROL SYSTEMS

IEEE Transactions on Automation Science and Engineering · Pub Date: 2025-02-07 · DOI: 10.1109/TASE.2025.3539772

Jin Ren; Ding Wang; Menghua Li; Junfei Qiao

Volume 22, pp. 11706-11716 · Journal Article · https://ieeexplore.ieee.org/document/10877926/

Citations: 0

Abstract



This paper establishes an adaptive critic design with a performance guarantee, based on the discounted value iteration algorithm, to solve the optimal regulation problem for discrete-time zero-sum games. Value iteration is implemented to obtain approximate optimal solutions to the Hamilton-Jacobi-Isaacs equation for nonlinear systems and to the game algebraic Riccati equation for linear systems. We then focus on how system stability is affected by the introduction of the discount factor, and on the admissibility of the policy pairs generated during value iteration. An appropriate selection range for the discount factor and criteria for ensuring system stability are established to help obtain a stabilizing optimal policy pair, which not only makes the cost function converge to the optimal value but also guarantees the asymptotic stability of the closed-loop system. Finally, practical examples on a power system and a ball-beam system demonstrate the effectiveness of the presented method.

Note to Practitioners: Because many dynamic systems are subject to uncertainty and interference, zero-sum game problems are ubiquitous, especially in dynamic systems with antagonistic properties. As an important research direction in optimal control, zero-sum games usually involve designing policy pairs that optimize system performance in the presence of adversarial disturbances. Owing to its excellent adaptability, value iteration in adaptive dynamic programming is employed to deal with this kind of problem. Beyond the optimality of the policies, system stability during the control process is equally significant, since stability is the premise of all operations. Therefore, we provide guidance on the optimal regulation of discrete-time zero-sum games with a performance guarantee, which contributes to obtaining a stable optimal policy pair. A theoretical stability analysis is provided and the asymptotic stability of the system is ensured, which improves the performance of the designed controller. Furthermore, simulation experiments for practical applications verify the feasibility and effectiveness of the proposed control design.
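To make the role of the discount factor concrete, the following is a minimal sketch of discounted value iteration for the linear-quadratic special case, where the value function is V(x) = x'Px and the iteration targets a discounted game algebraic Riccati equation. It is not the authors' algorithm or their stability criterion. The dynamics x+ = A x + B u + D w, the stage cost x'Qx + u'Ru - rho^2 w'w, all function names, and the scalar test system are assumptions chosen for illustration, and the stability check is a plain spectral-radius test on the closed loop.

```python
import numpy as np

def saddle_gains(P, A, G, W, gamma):
    """Stacked gain K = [K_u; K_w] for the policy pair u = -K_u x, w = -K_w x."""
    M = W + gamma * G.T @ P @ G
    return gamma * np.linalg.solve(M, G.T @ P @ A)

def discounted_value_iteration(A, B, D, Q, R, rho, gamma, iters=200):
    """Iterate P <- Q + gamma*A'PA - gamma*A'PG*K(P), starting from P = 0."""
    m, q = B.shape[1], D.shape[1]
    G = np.hstack([B, D])                       # joint input matrix [B, D]
    W = np.block([[R, np.zeros((m, q))],        # weights: +R for u, -rho^2 I for w
                  [np.zeros((q, m)), -rho**2 * np.eye(q)]])
    P = np.zeros_like(Q, dtype=float)
    for _ in range(iters):
        K = saddle_gains(P, A, G, W, gamma)
        P = Q + gamma * A.T @ P @ A - gamma * A.T @ P @ G @ K
    return P, saddle_gains(P, A, G, W, gamma), G

def closed_loop_radius(A, G, K):
    """Spectral radius of A - G K; a value below 1 means the loop is stable."""
    return float(np.max(np.abs(np.linalg.eigvals(A - G @ K))))

# Hypothetical scalar system, unstable in open loop (A = 2).
A = np.array([[2.0]])
B = np.array([[1.0]])
D = np.array([[0.5]])
Q = np.array([[1.0]])
R = np.array([[1.0]])
rho = 2.0

results = {}
for gamma in (0.95, 0.2):
    P, K, G = discounted_value_iteration(A, B, D, Q, R, rho, gamma)
    results[gamma] = (float(P[0, 0]), closed_loop_radius(A, G, K))
    print(f"gamma={gamma}: P={results[gamma][0]:.4f}, radius={results[gamma][1]:.4f}")
```

With these assumed numbers, the iteration converges for both discount factors, but the closed-loop spectral radius is below 1 for gamma = 0.95 and above 1 for gamma = 0.2. This illustrates why the admissible range of the discount factor matters: a converged discounted value function does not by itself imply a stabilizing policy pair.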


Source journal

IEEE Transactions on Automation Science and Engineering (Engineering & Technology: Automation & Control Systems)

CiteScore: 12.50
Self-citation rate: 14.30%
Articles published: 404
Review time: 3.0 months

About the journal: The IEEE Transactions on Automation Science and Engineering (T-ASE) publishes fundamental papers on Automation, emphasizing scientific results that advance efficiency, quality, productivity, and reliability. T-ASE encourages interdisciplinary approaches from computer science, control systems, electrical engineering, mathematics, mechanical engineering, operations research, and other fields. T-ASE welcomes results relevant to industries such as agriculture, biotechnology, healthcare, home automation, maintenance, manufacturing, pharmaceuticals, retail, security, service, supply chains, and transportation. T-ASE addresses a research community willing to integrate knowledge across disciplines and industries. For this purpose, each paper includes a Note to Practitioners that summarizes how its results can be applied or how they might be extended to apply in practice.