Non-stationary value iteration for adaptive average control of piecewise deterministic Markov processes

IF 3.7 | Region 2 (Computer Science) | JCR Q2, AUTOMATION & CONTROL SYSTEMS

Nonlinear Analysis-Hybrid Systems Pub Date : 2025-07-29 DOI:10.1016/j.nahs.2025.101622

O.L.V. Costa , F. Dufour , A. Genadot

Volume 58, Article 101622 | Journal Article | https://www.sciencedirect.com/science/article/pii/S1751570X25000482

Citations: 0

Abstract

The main goal of this paper is to present a non-stationary value iteration scheme for the adaptive average control of Piecewise Deterministic Markov Processes (PDMPs), introduced by M.H.A. Davis in Davis (1984, 1993) as a family of continuous-time Markov processes punctuated by random jumps and with inter-jump movement driven by a deterministic flow. It is assumed in this paper that there are no boundary jumps. We study the adaptive average optimal control problem of PDMPs, considering that the jump intensity $\lambda$, the post-jump transition kernel $Q$, as well as the cost $C$ depend on an unknown parameter $\beta^{*}$. For a sequence of strongly consistent estimators $\{\beta_{n}^{*}\}$ of $\beta^{*}$ (that is, $\beta_{n}^{*}$ converges to $\beta^{*}$ almost surely), a non-stationary value iteration (depending on the current estimate $\beta_{n}^{*}$) is shown to be optimal for the long-run average control problem. We assume a total variation norm condition on the parameters $\lambda$ and $Q$ of the process (which generalizes the minorization condition considered in Costa, Dufour and Genadot (2024)), resulting in a span-contraction operator. The paper concludes with a numerical example.
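The idea of a value iteration that changes its operator at every step can be illustrated on a much simpler object than a PDMP. The sketch below is not the scheme of the paper: it is a discrete-time, two-state, two-action analogue in which the transition matrix and the cost depend on a scalar parameter. The model in `model`, the value `beta_star = 0.3`, and the deterministic sequence standing in for strongly consistent estimates are all assumptions made for illustration. Every transition row has positive entries, a minorization-type condition under which the average-cost Bellman operator contracts in the span seminorm.

```python
import numpy as np


def model(beta):
    """Hypothetical 2-state, 2-action model parameterised by beta in (0, 1).

    Returns P with shape (A, S, S) and cost with shape (S, A).
    """
    P = np.array([
        [[1.0 - beta, beta], [0.1, 0.9]],  # action 0: wait
        [[0.9, 0.1], [0.9, 0.1]],          # action 1: repair
    ])
    cost = np.array([[0.0, 1.0], [2.0 + beta, 1.0]])  # cost[s, a]
    return P, cost


def bellman(h, P, cost):
    """One average-cost Bellman backup: returns (T h, greedy policy)."""
    q = cost + np.einsum("asj,j->sa", P, h)  # q[s, a]
    return q.min(axis=1), q.argmin(axis=1)


def relative_vi(beta_seq):
    """Relative value iteration where step n uses the model built from beta_seq[n].

    With a constant sequence this is ordinary (stationary) relative value
    iteration; with a varying sequence the operator changes at every step.
    """
    h = np.zeros(2)
    gain, policy = 0.0, None
    for beta in beta_seq:
        P, cost = model(beta)
        Th, policy = bellman(h, P, cost)
        gain = Th[0]    # h[0] == 0, so (T h)[0] estimates the average cost
        h = Th - Th[0]  # renormalise so that h[0] stays 0
    return gain, h, policy


beta_star = 0.3  # assumed "true" parameter
n_iter = 2000
# Deterministic stand-in for estimates that converge to beta_star.
estimates = [beta_star + 0.3 / (n + 1) for n in range(n_iter)]

gain_adaptive, h_adaptive, policy_adaptive = relative_vi(estimates)
gain_true, h_true, policy_true = relative_vi([beta_star] * n_iter)

print("adaptive gain:", gain_adaptive, "policy:", policy_adaptive)
print("true gain:    ", gain_true, "policy:", policy_true)
```

In this toy setting the iteration driven by the estimates ends up with the same greedy policy as the iteration that knows `beta_star`, and its gain estimate differs only by an amount of the order of the remaining estimation error. Replacing the deterministic sequence with estimates computed from observed transitions would be the natural next step, at the price of a random rather than reproducible run.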


Source journal

Nonlinear Analysis-Hybrid Systems AUTOMATION & CONTROL SYSTEMS-MATHEMATICS, APPLIED

CiteScore

8.30

Self-citation rate

9.50%

Articles published

Review time

>12 weeks

About the journal: Nonlinear Analysis: Hybrid Systems welcomes all important research and expository papers in any discipline. Papers that are principally concerned with the theory of hybrid systems should contain significant results indicating relevant applications. Papers that emphasize applications should consist of important real world models and illuminating techniques. Papers that interrelate various aspects of hybrid systems will be most welcome.