{"title":"A general TD-Q learning control approach for discrete-time Markov jump systems.","authors":"Jiwei Wen, Huiwen Xue, Xiaoli Luan, Peng Shi","doi":"10.1016/j.isatra.2025.02.032","DOIUrl":null,"url":null,"abstract":"<p><p>This paper develops a novel temporal difference Q (TD-Q) learning approach, designed to address the robust control challenge in discrete-time Markov jump systems (MJSs) which are characterized by entirely unknown dynamics and transition probabilities (TPs). The model-free TD-Q learning method is uniquely comprehensive, including two special cases: Q learning for MJSs with unknown dynamics, and TD learning for MJSs with undetermined TPs. We propose an innovative ternary policy iteration framework, which iteratively refines the control policies through a dynamic loop of alternating updates. This loop consists of three synergistic processes: firstly, aligning TD value functions with current policies; secondly, enhancing Q-function's matrix kernels (QFMKs) using these TD value functions; and thirdly, generating greedy policies based on the enhanced QFMKs. We demonstrate that, with a sufficient number of episodes, the TD value functions, QFMKs, and control policies converge optimally within this iterative loop. To illustrate efficiency of the developed approach, we introduce a numerical example that highlights its substantial benefits through a thorough comparison with current learning control methods for MJSs. Moreover, a structured population dynamics model for pests is utilized to validate the practical applicability.</p>","PeriodicalId":94059,"journal":{"name":"ISA transactions","volume":" ","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2025-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ISA transactions","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1016/j.isatra.2025.02.032","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
This paper develops a novel temporal difference Q (TD-Q) learning approach designed to address the robust control challenge in discrete-time Markov jump systems (MJSs) characterized by entirely unknown dynamics and transition probabilities (TPs). The model-free TD-Q learning method is uniquely comprehensive, encompassing two special cases: Q learning for MJSs with unknown dynamics, and TD learning for MJSs with undetermined TPs. We propose an innovative ternary policy iteration framework that iteratively refines the control policies through a dynamic loop of alternating updates. This loop consists of three synergistic processes: first, aligning TD value functions with the current policies; second, enhancing the Q-function matrix kernels (QFMKs) using these TD value functions; and third, generating greedy policies based on the enhanced QFMKs. We demonstrate that, given a sufficient number of episodes, the TD value functions, QFMKs, and control policies converge to their optima within this iterative loop. To illustrate the efficiency of the developed approach, we present a numerical example that highlights its substantial benefits through a thorough comparison with existing learning control methods for MJSs. Moreover, a structured population dynamics model for pests is used to validate its practical applicability.
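The abstract does not give the paper's exact ternary update equations, so the following is only a minimal illustrative sketch of the general idea: model-free policy iteration on a Markov jump *linear* system with quadratic cost, where mode-dependent Q-function matrix kernels are fitted by least squares from sampled transitions (no knowledge of the mode dynamics or TPs) and greedy mode-dependent gains are then extracted from those kernels. The simulated system, the feature construction, and helper names such as `phi` and `unpack` are hypothetical placeholders, not the authors' algorithm.

```python
# Sketch: TD/Q-style policy iteration for a Markov jump linear system (assumed form),
# with unknown (A_i, B_i) and unknown mode transition probabilities.
import numpy as np

rng = np.random.default_rng(0)

# Ground truth used only by the simulator (hidden from the learner): 2 modes, 2 states.
A = [np.array([[0.95, 0.1], [0.0, 0.9]]), np.array([[0.85, 0.2], [0.1, 0.8]])]
B = [np.array([[0.0], [1.0]]), np.array([[0.1], [0.8]])]
TP = np.array([[0.8, 0.2], [0.3, 0.7]])      # mode transition probabilities (unknown to learner)
Qc, Rc = np.eye(2), np.eye(1)                # stage cost x'Qc x + u'Rc u
n, m, modes = 2, 1, 2

def step(x, u, i):
    """Simulate one step; the learner only observes (x, u, i, cost, x_next, i_next)."""
    x_next = A[i] @ x + B[i] @ u
    i_next = rng.choice(modes, p=TP[i])
    cost = float(x @ Qc @ x + u @ Rc @ u)
    return x_next, i_next, cost

def phi(x, u):
    """Quadratic features of z = [x; u]: upper-triangular entries of z z'."""
    z = np.concatenate([x, u])
    return np.outer(z, z)[np.triu_indices(n + m)]

def unpack(theta):
    """Rebuild the symmetric kernel H from theta (off-diagonals were fit as 2*H_ij)."""
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    return (H + H.T) / 2.0

# Mode-dependent linear policy u = -K_i x, initialised at the (stabilizing) zero gains.
K = [np.zeros((m, n)) for _ in range(modes)]

for it in range(8):                           # outer policy-iteration loop
    # Policy evaluation: fit one Q-function matrix kernel per mode by least squares
    # on the sampled Bellman relation phi(x,u)'theta_i ~= cost + phi(x',u')'theta_i'.
    theta = [np.zeros(phi(np.zeros(n), np.zeros(m)).size) for _ in range(modes)]
    for sweep in range(30):                   # iterate the TD targets toward a fixed point
        Phi = [[] for _ in range(modes)]
        y = [[] for _ in range(modes)]
        x, i = rng.standard_normal(n), 0
        for k in range(400):
            u = -K[i] @ x + 0.3 * rng.standard_normal(m)   # exploration noise on the input
            x_next, i_next, cost = step(x, u, i)
            u_next = -K[i_next] @ x_next                    # target action follows the current policy
            target = cost + phi(x_next, u_next) @ theta[i_next]
            Phi[i].append(phi(x, u))
            y[i].append(target)
            x, i = x_next, i_next
        theta = [np.linalg.lstsq(np.array(Phi[j]), np.array(y[j]), rcond=None)[0]
                 for j in range(modes)]
    # Policy improvement: greedy mode-dependent gains from each fitted kernel H_i.
    for j in range(modes):
        H = unpack(theta[j])
        Hux, Huu = H[n:, :n], H[n:, n:]
        K[j] = np.linalg.solve(Huu, Hux)

print("learned mode-dependent gains:", [np.round(Kj, 3) for Kj in K])
```

In this sketch the evaluation step plays the role of aligning value estimates with the current policy, the least-squares fit of the per-mode kernels stands in for enhancing the QFMKs, and the gain extraction is the greedy-policy step; the paper's actual ternary framework and its convergence guarantees should be taken from the full text rather than from this approximation.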