Non-stationary value iteration for adaptive average control of piecewise deterministic Markov processes

Impact factor 3.7; Zone 2 (Computer Science); JCR Q2 (Automation & Control Systems)
O.L.V. Costa, F. Dufour, A. Genadot
Nonlinear Analysis-Hybrid Systems, vol. 58, Article 101622. Published 2025-07-29. DOI: 10.1016/j.nahs.2025.101622
https://www.sciencedirect.com/science/article/pii/S1751570X25000482
Citations: 0

Abstract

The main goal of this paper is to present a non-stationary value iteration scheme for the adaptive average control of Piecewise Deterministic Markov Processes (PDMPs), introduced by M.H.A. Davis in Davis (1984, 1993) as a family of continuous-time Markov processes punctuated by random jumps and with inter-jump movement driven by a deterministic flow. It is assumed in this paper that there are no boundary jumps. We study the adaptive average optimal control problem of PDMPs, considering that the jump intensity $\lambda$, the post-jump transition kernel $Q$, and the cost $C$ depend on an unknown parameter $\beta^*$. For a sequence of strongly consistent estimators $\{\beta^*_n\}$ of $\beta^*$ (that is, $\beta^*_n$ converges to $\beta^*$ almost surely), a non-stationary value iteration (depending on the current estimate $\beta^*_n$) is shown to be optimal for the long-run average control problem. We assume a total variation norm condition on the parameters $\lambda$ and $Q$ of the process (which generalizes the minorization condition considered in Costa, Dufour and Genadot (2024)), resulting in a span-contraction operator. The paper concludes with a numerical example.
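To make the idea of a value iteration driven by the current estimate concrete, the following is a minimal sketch on a toy discrete-time, two-state, two-action average-cost model. It is not the paper's PDMP scheme: the kernel `jump_prob`, the cost `cost`, the value `BETA_STAR = 0.6`, and the Bernoulli sample-mean estimator are all hypothetical choices made for illustration. The toy kernel always puts mass at least 0.4 on state 0, a simple minorization-type condition under which the relative value iteration contracts in span. At step n the iteration applies the operator built from the n-th estimate instead of the true parameter.

```python
import random

BETA_STAR = 0.6          # true parameter, unknown to the controller (toy value)
STATES, ACTIONS = (0, 1), (0, 1)

def jump_prob(x, a, beta):
    """Probability of moving to state 1 from x under action a (toy kernel)."""
    return beta * (0.5 + 0.5 * x) * (1.0 - 0.5 * a)

def cost(x, a, beta):
    """One-step cost: state 1 costs beta, action 1 costs a fixed effort 0.1."""
    return beta * x + 0.1 * a

def bellman(h, beta):
    """Apply the one-step operator T_beta to h; return (T h, greedy policy)."""
    values, policy = [], []
    for x in STATES:
        q = [cost(x, a, beta)
             + (1.0 - jump_prob(x, a, beta)) * h[0]
             + jump_prob(x, a, beta) * h[1]
             for a in ACTIONS]
        best = min(ACTIONS, key=lambda a: q[a])
        values.append(q[best])
        policy.append(best)
    return values, policy

def relative_value_iteration(betas):
    """Step n applies T_{betas[n]}; h is re-centred so that h[0] == 0."""
    h, gain, policy = [0.0, 0.0], 0.0, [0, 0]
    for beta in betas:
        th, policy = bellman(h, beta)
        gain = th[0]                   # average-cost estimate, since h[0] == 0
        h = [v - th[0] for v in th]    # subtract the reference-state value
    return gain, h, policy

def running_means(n_steps, seed=0):
    """Strongly consistent estimates of BETA_STAR: Bernoulli sample means."""
    rng = random.Random(seed)
    total, estimates = 0, []
    for n in range(1, n_steps + 1):
        total += rng.random() < BETA_STAR
        estimates.append(total / n)
    return estimates

N = 5000
# Stationary iteration with the true parameter, used as a reference.
gain_true, h_true, policy_true = relative_value_iteration([BETA_STAR] * N)
# Non-stationary iteration: the operator changes with each new estimate.
gain_adapt, h_adapt, policy_adapt = relative_value_iteration(running_means(N))
print(f"known beta*:     gain = {gain_true:.4f}")
print(f"estimated betas: gain = {gain_adapt:.4f}")
```

In this sketch the estimates are generated independently of the controlled trajectory, which is a simplification; in an adaptive control setting they would typically be computed from the observed process. Comparing the two printed gains gives a rough sense of how the non-stationary iteration tracks the stationary one as the estimates settle.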
Source journal
Nonlinear Analysis-Hybrid Systems (Automation & Control Systems; Mathematics, Applied)
CiteScore: 8.30
Self-citation rate: 9.50%
Articles published: 65
Review time: >12 weeks
Journal description: Nonlinear Analysis: Hybrid Systems welcomes all important research and expository papers in any discipline. Papers that are principally concerned with the theory of hybrid systems should contain significant results indicating relevant applications. Papers that emphasize applications should consist of important real world models and illuminating techniques. Papers that interrelate various aspects of hybrid systems will be most welcome.