Non-stationary value iteration for adaptive average control of piecewise deterministic Markov processes
Authors: O.L.V. Costa, F. Dufour, A. Genadot
Journal: Nonlinear Analysis: Hybrid Systems, volume 58, Article 101622 (published 2025-07-29)
DOI: 10.1016/j.nahs.2025.101622
URL: https://www.sciencedirect.com/science/article/pii/S1751570X25000482
Impact factor: 3.7; JCR: Q2 (Automation & Control Systems)
Citations: 0
Abstract
The main goal of this paper is to present a non-stationary value iteration scheme for the adaptive average control of Piecewise Deterministic Markov Processes (PDMPs), introduced by M.H.A. Davis (1984, 1993) as a family of continuous-time Markov processes punctuated by random jumps, with inter-jump movement driven by a deterministic flow. It is assumed in this paper that there are no boundary jumps. We study the adaptive average optimal control problem of PDMPs, considering that the jump intensity $\lambda$, the post-jump transition kernel $Q$, and the cost $C$ depend on an unknown parameter $\beta^*$. For a sequence of strongly consistent estimators $\{\beta^*_n\}$ of $\beta^*$ (that is, $\beta^*_n$ converges to $\beta^*$ almost surely), a non-stationary value iteration (depending on the current estimate $\beta^*_n$) is shown to be optimal for the long-run average control problem. We assume a total variation norm condition on the parameters $\lambda$ and $Q$ of the process (which generalizes the minorization condition considered in Costa, Dufour and Genadot (2024)), resulting in a span-contraction operator. The paper concludes with a numerical example.
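To give a feel for what a value iteration driven by a changing parameter estimate looks like, here is a minimal sketch in Python. It is not the scheme of the paper: it assumes a toy two-state, two-action discrete-time Markov decision process in place of a PDMP, and the functions `prob_to_state_1` and `cost`, the true value `BETA_STAR = 0.4`, and the deterministic estimate sequence are all hypothetical choices made for illustration. Each step applies the optimality operator built from the current estimate and then re-centres the value function on state 0, a standard relative value iteration normalization.

```python
# Illustrative stand-in only: a two-state, two-action discrete-time MDP,
# NOT a PDMP and NOT the model or algorithm of the paper.

STATES = (0, 1)
ACTIONS = (0, 1)


def prob_to_state_1(state, action, beta):
    """Hypothetical kernel: probability that the next state is 1."""
    # Action 0 depends on the unknown parameter; action 1 is fully known.
    return beta if action == 0 else 0.5


def cost(state, action, beta):
    """Hypothetical one-step cost depending on the parameter."""
    return state * (1.0 + beta) + 0.2 * action


def bellman(h, beta):
    """Apply the one-step optimality operator for the parameter value beta."""
    values, policy = [], []
    for s in STATES:
        candidates = []
        for a in ACTIONS:
            p = prob_to_state_1(s, a, beta)
            q = cost(s, a, beta) + (1.0 - p) * h[0] + p * h[1]
            candidates.append((q, a))
        best_value, best_action = min(candidates)
        values.append(best_value)
        policy.append(best_action)
    return values, policy


def adaptive_value_iteration(estimates):
    """Relative value iteration whose operator changes with each estimate."""
    h = [0.0, 0.0]
    gain, policy = 0.0, [0, 0]
    for beta_n in estimates:
        th, policy = bellman(h, beta_n)
        gain = th[0] - h[0]          # current average-cost estimate
        h = [v - th[0] for v in th]  # re-centre so that h[0] == 0
    return gain, h, policy


BETA_STAR = 0.4  # assumed true parameter, unknown to the controller
# Deterministic stand-in for a strongly consistent estimator sequence.
estimates = [BETA_STAR + 0.3 / (n + 1) for n in range(2000)]

gain, h, policy = adaptive_value_iteration(estimates)
reference_gain, _, reference_policy = adaptive_value_iteration([BETA_STAR] * 2000)
print(gain, reference_gain, policy, reference_policy)
```

In this toy model the iteration run on the converging estimates ends close to the gain and greedy policy obtained when the true parameter is known from the start. The convergence argument in the paper rests on its total variation condition and the resulting span contraction, which this sketch does not attempt to reproduce.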
About the journal:
Nonlinear Analysis: Hybrid Systems welcomes all important research and expository papers in any discipline. Papers that are principally concerned with the theory of hybrid systems should contain significant results indicating relevant applications. Papers that emphasize applications should consist of important real-world models and illuminating techniques. Papers that interrelate various aspects of hybrid systems will be most welcome.