Inverse Q-Learning Optimal Control for Takagi–Sugeno Fuzzy Systems

Impact Factor: 10.7 · CAS Tier 1 (Computer Science) · JCR Q1, Computer Science, Artificial Intelligence
Wenting Song;Jun Ning;Shaocheng Tong
{"title":"Inverse Q-Learning Optimal Control for Takagi–Sugeno Fuzzy Systems","authors":"Wenting Song;Jun Ning;Shaocheng Tong","doi":"10.1109/TFUZZ.2025.3563361","DOIUrl":null,"url":null,"abstract":"Inverse reinforcement learning optimal control is under the framework of learner–expert, the learner system can learn expert system's trajectory and optimal control policy via a reinforcement learning algorithm and does not need the predefined cost function, so it can solve optimal control problem effectively. This article develops a fuzzy inverse reinforcement learning optimal control scheme with inverse reinforcement learning algorithm for Takagi–Sugeno (T–S) fuzzy systems with disturbances. Since the controlled fuzzy systems (learner systems) desire to learn or imitate expert system's behavior trajectories, a learner–expert structure is established, where the learner only know the expert system's optimal control policy. To reconstruct expert system's cost function, we develop a model-free inverse Q-learning algorithm that consists of two learning stages: an inner Q-learning iteration loop and an outer inverse optimal iteration loop. The inner loop aims to find fuzzy optimal control policy and the worst-case disturbance input via learner system's cost function by employing zero-sum differential game theory. The outer one is to update learner system's state-penalty weight via only observing expert systems' optimal control policy. The model-free algorithm does not require that the controlled system dynamics are known. It is proved that the designed algorithm is convergent and also the developed inverse reinforcement learning optimal control policy can ensure T–S fuzzy learner system to obtain Nash equilibrium solution. Finally, we apply the presented fuzzy inverse Q-learning optimal control method to nonlinear unmanned surface vehicle system and the computer simulation results verified the effectiveness of the developed scheme.","PeriodicalId":13212,"journal":{"name":"IEEE Transactions on Fuzzy Systems","volume":"33 7","pages":"2308-2320"},"PeriodicalIF":10.7000,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Fuzzy Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10972346/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Citations: 0

Abstract

Inverse reinforcement learning optimal control operates within a learner-expert framework: the learner system learns the expert system's trajectory and optimal control policy via a reinforcement learning algorithm without requiring a predefined cost function, and can therefore solve the optimal control problem effectively. This article develops a fuzzy inverse reinforcement learning optimal control scheme for Takagi-Sugeno (T-S) fuzzy systems with disturbances. Since the controlled fuzzy systems (learner systems) aim to learn or imitate the expert system's behavior trajectories, a learner-expert structure is established in which the learner knows only the expert system's optimal control policy. To reconstruct the expert system's cost function, we develop a model-free inverse Q-learning algorithm that consists of two learning stages: an inner Q-learning iteration loop and an outer inverse optimal iteration loop. The inner loop finds the fuzzy optimal control policy and the worst-case disturbance input from the learner system's cost function by employing zero-sum differential game theory. The outer loop updates the learner system's state-penalty weight by observing only the expert system's optimal control policy. Being model-free, the algorithm does not require the controlled system dynamics to be known. It is proved that the designed algorithm converges and that the developed inverse reinforcement learning optimal control policy ensures the T-S fuzzy learner system attains the Nash equilibrium solution. Finally, we apply the presented fuzzy inverse Q-learning optimal control method to a nonlinear unmanned surface vehicle system, and computer simulation results verify the effectiveness of the developed scheme.
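As a rough illustration of the two-stage algorithm described in the abstract, the Python sketch below implements an inner zero-sum game Q-iteration loop and an outer inverse-optimal state-weight update for a single linearized (one-rule) discrete-time system. It is a minimal model-based sketch of the loop structure only, not the authors' method: the paper's algorithm is model-free and blends local models through T-S fuzzy membership functions, and every matrix, gain, and tolerance below is a hypothetical placeholder.

```python
import numpy as np

# Minimal sketch of the two-loop inverse Q-learning idea, written for a
# single linearized (one-rule) discrete-time system
#     x_{k+1} = A x_k + B u_k + D w_k
# with the zero-sum game cost  sum_k ( x'Q x + u'R u - gamma^2 w'w ).
# The paper's algorithm is model-free and fuzzy-blended; this sketch is
# model-based and single-rule purely to show the loop structure.

def inner_q_loop(A, B, D, Q, R, gamma, iters=500, tol=1e-10):
    """Inner loop: value iteration on the zero-sum game Riccati equation.
    Returns the value kernel P, the control gain K (u = -K x), and the
    worst-case disturbance gain L (w = L x)."""
    n, m, q = A.shape[0], B.shape[1], D.shape[1]
    P = np.zeros((n, n))
    for _ in range(iters):
        # Coupled stationarity conditions for the minimizing player u and
        # the maximizing player w, written as one block linear system.
        M = np.block([
            [R + B.T @ P @ B, B.T @ P @ D],
            [D.T @ P @ B,     D.T @ P @ D - gamma**2 * np.eye(q)],
        ])
        rhs = np.vstack([B.T @ P @ A, D.T @ P @ A])
        KL = np.linalg.solve(M, rhs)          # stacked gains [K; -L]
        K, L = KL[:m], -KL[m:]
        # Completed-square value update of the game Bellman equation.
        P_next = Q + A.T @ P @ A - np.hstack([A.T @ P @ B, A.T @ P @ D]) @ KL
        if np.linalg.norm(P_next - P) < tol:
            return P_next, K, L
        P = P_next
    return P, K, L

def outer_inverse_loop(A, B, D, R, gamma, K_expert, Q0, iters=50, tol=1e-8):
    """Outer loop: reconstruct a state-penalty weight Q under which the
    expert's observed control gain K_expert is optimal, by inverting the
    game Bellman equation at the current inner-loop solution."""
    Q = Q0.copy()
    for _ in range(iters):
        P, K, L = inner_q_loop(A, B, D, Q, R, gamma)      # inner stage
        # Closed loop if the learner plays the expert's control policy
        # while the disturbance plays the current worst case.
        A_cl = A - B @ K_expert + D @ L
        Q_next = (P - A_cl.T @ P @ A_cl
                  - K_expert.T @ R @ K_expert + gamma**2 * L.T @ L)
        Q_next = 0.5 * (Q_next + Q_next.T)                # enforce symmetry
        if np.linalg.norm(Q_next - Q) < tol:
            return Q_next, K
        Q = Q_next
    return Q, K
```

When the outer loop converges, the learner's gain K matches K_expert up to the usual non-uniqueness of inverse optimal control (several state-penalty weights can make the same policy optimal), which is consistent with the abstract's goal of reproducing the expert's optimal behavior rather than its exact weights.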
Source Journal
IEEE Transactions on Fuzzy Systems (Engineering: Electrical & Electronic)
CiteScore: 20.50
Self-citation rate: 13.40%
Annual articles: 517
Review time: 3.0 months
Journal Description: The IEEE Transactions on Fuzzy Systems is a scholarly journal that focuses on the theory, design, and application of fuzzy systems. It aims to publish high-quality technical papers that contribute significant technical knowledge and exploratory developments in the field of fuzzy systems. The journal particularly emphasizes engineering systems and scientific applications. In addition to research articles, the Transactions also includes a letters section featuring current information, comments, and rebuttals related to published papers.