{"title":"Takagi-Sugeno模糊系统的逆q学习最优控制","authors":"Wenting Song;Jun Ning;Shaocheng Tong","doi":"10.1109/TFUZZ.2025.3563361","DOIUrl":null,"url":null,"abstract":"Inverse reinforcement learning optimal control is under the framework of learner–expert, the learner system can learn expert system's trajectory and optimal control policy via a reinforcement learning algorithm and does not need the predefined cost function, so it can solve optimal control problem effectively. This article develops a fuzzy inverse reinforcement learning optimal control scheme with inverse reinforcement learning algorithm for Takagi–Sugeno (T–S) fuzzy systems with disturbances. Since the controlled fuzzy systems (learner systems) desire to learn or imitate expert system's behavior trajectories, a learner–expert structure is established, where the learner only know the expert system's optimal control policy. To reconstruct expert system's cost function, we develop a model-free inverse Q-learning algorithm that consists of two learning stages: an inner Q-learning iteration loop and an outer inverse optimal iteration loop. The inner loop aims to find fuzzy optimal control policy and the worst-case disturbance input via learner system's cost function by employing zero-sum differential game theory. The outer one is to update learner system's state-penalty weight via only observing expert systems' optimal control policy. The model-free algorithm does not require that the controlled system dynamics are known. It is proved that the designed algorithm is convergent and also the developed inverse reinforcement learning optimal control policy can ensure T–S fuzzy learner system to obtain Nash equilibrium solution. Finally, we apply the presented fuzzy inverse Q-learning optimal control method to nonlinear unmanned surface vehicle system and the computer simulation results verified the effectiveness of the developed scheme.","PeriodicalId":13212,"journal":{"name":"IEEE Transactions on Fuzzy Systems","volume":"33 7","pages":"2308-2320"},"PeriodicalIF":10.7000,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Inverse Q-Learning Optimal Control for Takagi–Sugeno Fuzzy Systems\",\"authors\":\"Wenting Song;Jun Ning;Shaocheng Tong\",\"doi\":\"10.1109/TFUZZ.2025.3563361\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Inverse reinforcement learning optimal control is under the framework of learner–expert, the learner system can learn expert system's trajectory and optimal control policy via a reinforcement learning algorithm and does not need the predefined cost function, so it can solve optimal control problem effectively. This article develops a fuzzy inverse reinforcement learning optimal control scheme with inverse reinforcement learning algorithm for Takagi–Sugeno (T–S) fuzzy systems with disturbances. Since the controlled fuzzy systems (learner systems) desire to learn or imitate expert system's behavior trajectories, a learner–expert structure is established, where the learner only know the expert system's optimal control policy. To reconstruct expert system's cost function, we develop a model-free inverse Q-learning algorithm that consists of two learning stages: an inner Q-learning iteration loop and an outer inverse optimal iteration loop. 
The inner loop aims to find fuzzy optimal control policy and the worst-case disturbance input via learner system's cost function by employing zero-sum differential game theory. The outer one is to update learner system's state-penalty weight via only observing expert systems' optimal control policy. The model-free algorithm does not require that the controlled system dynamics are known. It is proved that the designed algorithm is convergent and also the developed inverse reinforcement learning optimal control policy can ensure T–S fuzzy learner system to obtain Nash equilibrium solution. Finally, we apply the presented fuzzy inverse Q-learning optimal control method to nonlinear unmanned surface vehicle system and the computer simulation results verified the effectiveness of the developed scheme.\",\"PeriodicalId\":13212,\"journal\":{\"name\":\"IEEE Transactions on Fuzzy Systems\",\"volume\":\"33 7\",\"pages\":\"2308-2320\"},\"PeriodicalIF\":10.7000,\"publicationDate\":\"2025-04-22\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Fuzzy Systems\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10972346/\",\"RegionNum\":1,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Fuzzy Systems","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10972346/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Inverse Q-Learning Optimal Control for Takagi–Sugeno Fuzzy Systems
Inverse reinforcement learning optimal control operates within a learner–expert framework: the learner system learns the expert system's trajectory and optimal control policy through a reinforcement learning algorithm and does not need a predefined cost function, so it can solve the optimal control problem effectively. This article develops a fuzzy inverse reinforcement learning optimal control scheme for Takagi–Sugeno (T–S) fuzzy systems with disturbances. Since the controlled fuzzy systems (learner systems) aim to learn or imitate the expert system's behavior trajectories, a learner–expert structure is established in which the learner knows only the expert system's optimal control policy. To reconstruct the expert system's cost function, we develop a model-free inverse Q-learning algorithm consisting of two learning stages: an inner Q-learning iteration loop and an outer inverse optimal iteration loop. The inner loop finds the fuzzy optimal control policy and the worst-case disturbance input from the learner system's cost function by employing zero-sum differential game theory. The outer loop updates the learner system's state-penalty weight by observing only the expert system's optimal control policy. Being model-free, the algorithm does not require the controlled system dynamics to be known. It is proved that the designed algorithm is convergent and that the developed inverse reinforcement learning optimal control policy ensures the T–S fuzzy learner system attains the Nash equilibrium solution. Finally, we apply the presented fuzzy inverse Q-learning optimal control method to a nonlinear unmanned surface vehicle system, and computer simulation results verify the effectiveness of the developed scheme.
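To make the two-loop structure concrete, below is a minimal Python sketch of an inverse Q-learning scheme of this kind for a single linear subsystem of a T–S model. It is illustrative only: the system matrices, iteration counts, and the fixed-point weight-update rule are assumptions, and known dynamics (A, B, D) are used in place of the paper's model-free, data-driven Q-function estimation.

```python
# Minimal sketch of the learner's two-loop inverse Q-learning structure for a
# single linear subsystem of a T-S model. Everything here is an illustrative
# assumption: the paper's algorithm is model-free, whereas this sketch uses
# known dynamics (A, B, D) so the loop structure stays short and readable.
import numpy as np

def inner_zero_sum_loop(A, B, D, Q, R, gamma, iters=500):
    """Inner loop: value iteration for the discrete-time zero-sum game with
    stage cost x'Qx + u'Ru - gamma^2 w'w; returns the value matrix P, the
    control gain K (u = -Kx), and the worst-case disturbance gain L (w = -Lx)."""
    n, m = A.shape[0], B.shape[1]
    q = D.shape[1]
    P = np.eye(n)
    for _ in range(iters):
        BD = np.hstack([B, D])
        M = np.block([[R + B.T @ P @ B,  B.T @ P @ D],
                      [D.T @ P @ B,      D.T @ P @ D - gamma**2 * np.eye(q)]])
        G = np.linalg.solve(M, BD.T @ P @ A)    # stacked saddle-point gains [K; L]
        P = Q + A.T @ P @ A - A.T @ P @ BD @ G  # game Riccati recursion
    return P, G[:m, :], G[m:, :]

def outer_inverse_loop(A, B, D, R, gamma, K_expert, Q0, outer_iters=200, tol=1e-9):
    """Outer loop: fixed-point correction of the state-penalty weight Q so that
    the learner's optimal policy reproduces the observed expert gain K_expert
    (an assumed simplification of the paper's inverse-optimal update)."""
    Q = Q0
    for _ in range(outer_iters):
        P, K, L = inner_zero_sum_loop(A, B, D, Q, R, gamma)
        # Closed loop under the expert's control and the learner's current
        # worst-case disturbance; the update enforces the expert's Bellman equation.
        Ae = A - B @ K_expert - D @ L
        Q_new = P - K_expert.T @ R @ K_expert + gamma**2 * L.T @ L - Ae.T @ P @ Ae
        if np.linalg.norm(Q_new - Q) < tol:
            break
        Q = Q_new
    return Q, K

if __name__ == "__main__":
    # Toy demo: generate an "expert" gain from a hidden state weight, then
    # recover a weight whose optimal policy matches it. Values are arbitrary.
    A = np.array([[0.9, 0.1], [0.0, 0.8]])
    B = np.array([[0.0], [0.5]])
    D = np.array([[0.1], [0.0]])
    R, gamma = np.eye(1), 5.0
    _, K_expert, _ = inner_zero_sum_loop(A, B, D, np.diag([2.0, 1.0]), R, gamma)
    _, K_learned = outer_inverse_loop(A, B, D, R, gamma, K_expert, np.eye(2))
    print(np.allclose(K_learned, K_expert, atol=1e-4))  # True if the fixed point converges
```

At the outer fixed point, the learner's Bellman equation under the expert's control policy is satisfied, so the recovered weight makes the observed expert behavior optimal; the paper establishes convergence of this kind of iteration for the full model-free fuzzy setting, which this simplified sketch does not guarantee.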
Journal introduction:
The IEEE Transactions on Fuzzy Systems is a scholarly journal that focuses on the theory, design, and application of fuzzy systems. It aims to publish high-quality technical papers that contribute significant technical knowledge and exploratory developments in the field of fuzzy systems. The journal particularly emphasizes engineering systems and scientific applications. In addition to research articles, the Transactions also includes a letters section featuring current information, comments, and rebuttals related to published papers.