A general TD-Q learning control approach for discrete-time Markov jump systems
Authors: Jiwei Wen, Huiwen Xue, Xiaoli Luan, Peng Shi
Journal: ISA Transactions, volume 160, pages 111-121 (impact factor 6.3; JCR Q1, Automation & Control Systems)
Publication date: 2025-03-06
Publication type: Journal Article
DOI: 10.1016/j.isatra.2025.02.032
URL: https://www.sciencedirect.com/science/article/pii/S0019057825001223
Citations: 0
Abstract
This paper develops a novel temporal difference Q (TD-Q) learning approach to the robust control of discrete-time Markov jump systems (MJSs) whose dynamics and transition probabilities (TPs) are both entirely unknown. The model-free TD-Q learning method covers two special cases: Q learning for MJSs with unknown dynamics, and TD learning for MJSs with undetermined TPs. We propose a ternary policy iteration framework that iteratively refines the control policies through a loop of three alternating updates: first, aligning the TD value functions with the current policies; second, updating the Q-function matrix kernels (QFMKs) using these TD value functions; and third, generating greedy policies from the updated QFMKs. We show that, given a sufficient number of episodes, the TD value functions, QFMKs, and control policies converge to their optimal values within this loop. To illustrate the efficiency of the approach, a numerical example compares it with existing learning control methods for MJSs. In addition, a structured population dynamics model for pests is used to validate its practical applicability.
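The three-step loop described in the abstract can be sketched on a toy problem. The following is only an illustration under strong assumptions, not the paper's algorithm: it is model-based (the dynamics `a`, `b` and the transition matrix `Pi` are known), whereas the paper's TD-Q method is model-free; it uses a scalar state and input with a quadratic cost; and every name and number in it is hypothetical. It shows the same alternation of value evaluation, Q-kernel update, and greedy policy improvement.

```python
# Toy, model-based stand-in for the three-step loop (NOT the paper's model-free method).
# Scalar Markov jump system x[t+1] = a[i]*x[t] + b[i]*u[t] in mode i,
# stage cost q*x^2 + r*u^2, mode-dependent policy u = -k[i]*x.
# All names and numbers are hypothetical.

def expected_value(p, Pi, i):
    """Value kernel averaged over the next mode: sum_j Pi[i][j] * p[j]."""
    return sum(Pi[i][j] * p[j] for j in range(len(p)))

def evaluate_policy(a, b, k, Pi, q, r, sweeps=1000):
    """Step 1: fit value kernels p[i] to the current gains by fixed-point sweeps."""
    p = [0.0] * len(a)
    for _ in range(sweeps):
        p = [q + r * k[i] ** 2
             + (a[i] - b[i] * k[i]) ** 2 * expected_value(p, Pi, i)
             for i in range(len(a))]
    return p

def q_kernel(a_i, b_i, e, q, r):
    """Step 2: 2x2 kernel H with Q(x, u) = [x u] H [x u]^T for one mode."""
    return [[q + a_i * e * a_i, a_i * e * b_i],
            [b_i * e * a_i, r + b_i * e * b_i]]

def greedy_gain(h):
    """Step 3: minimise Q(x, u) over u, which gives u = -(H_ux / H_uu) * x."""
    return h[1][0] / h[1][1]

def ternary_policy_iteration(a, b, Pi, q, r, iterations=20):
    modes = range(len(a))
    k = [0.0 for _ in modes]  # zero gain is admissible here: both modes are stable
    for _ in range(iterations):
        p = evaluate_policy(a, b, k, Pi, q, r)
        h = [q_kernel(a[i], b[i], expected_value(p, Pi, i), q, r) for i in modes]
        k = [greedy_gain(h[i]) for i in modes]
    return p, h, k

# Hypothetical two-mode example.
a = [0.9, 0.6]
b = [1.0, 0.5]
Pi = [[0.7, 0.3], [0.4, 0.6]]  # row i holds the transition probabilities out of mode i
q, r = 1.0, 1.0

p, h, k = ternary_policy_iteration(a, b, Pi, q, r)
```

In this sketch the evaluation step starts from a zero gain, which works only because both hypothetical modes are open-loop stable; a learning-based variant would replace the known `a`, `b`, and `Pi` with estimates built from sampled trajectories.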
About the journal:
ISA Transactions serves as a platform for showcasing advancements in measurement and automation, catering to both industrial practitioners and applied researchers. It covers a wide array of topics within measurement, including sensors, signal processing, data analysis, and fault detection, supported by techniques such as artificial intelligence and communication systems. Automation topics encompass control strategies, modelling, system reliability, and maintenance, alongside optimization and human-machine interaction. The journal targets research and development professionals in control systems, process instrumentation, and automation from academia and industry.