Mirror descent for stochastic control problems with measure-valued controls

Impact factor 1.2 · Zone 2 (Mathematics) · JCR Q3 (Statistics & Probability)
Bekzhan Kerimkulov, David Šiška, Łukasz Szpruch, Yufei Zhang
Journal: Stochastic Processes and their Applications, vol. 190, Article 104765
Published: 2025-08-28
DOI: 10.1016/j.spa.2025.104765
URL: https://www.sciencedirect.com/science/article/pii/S0304414925002091
Open access: no
Citations: 0

Abstract

This paper studies the convergence of the mirror descent algorithm for finite horizon stochastic control problems with measure-valued control processes. The control objective involves a convex regularisation function, denoted as h, with regularisation strength determined by the weight τ ≥ 0. The setting covers regularised relaxed control problems. Under suitable conditions, we establish the relative smoothness and convexity of the control objective with respect to the Bregman divergence of h, and prove linear convergence of the algorithm for τ = 0 and exponential convergence for τ > 0. The results apply to common regularisers including relative entropy, χ²-divergence, and entropic Wasserstein costs. This validates recent reinforcement learning heuristics that adding regularisation accelerates the convergence of gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations in the bounded mean oscillation norm.
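
As a rough illustration of the mechanism the abstract describes, the sketch below runs mirror descent with the relative-entropy Bregman divergence on a toy static problem: minimising J(p) = ⟨cost, p⟩ + τ·KL(p | ref) over probability vectors on a finite set. This is not the paper's algorithm, which acts on measure-valued control processes and relies on backward stochastic differential equations; the names `cost`, `ref`, `tau`, `step` and all numerical values are assumptions made for the example. In this toy case the minimiser is the Gibbs measure p* ∝ ref·exp(−cost/τ), and the log-density error shrinks by the factor (1 − step·τ) per iteration, which is the exponential convergence one expects when τ > 0.

```python
import math


def normalise_from_logits(logits):
    """Turn unnormalised log-weights into a probability vector (stable softmax)."""
    m = max(logits)
    weights = [math.exp(l - m) for l in logits]
    total = sum(weights)
    return [w / total for w in weights]


def mirror_descent_step(p, cost, ref, tau, step):
    """One mirror descent update with the KL divergence as Bregman divergence."""
    # First variation of J(p) = <cost, p> + tau * KL(p | ref), up to an additive
    # constant that disappears after normalisation.
    grad = [c + tau * math.log(pi / ri) for c, pi, ri in zip(cost, p, ref)]
    # Multiplicative update p_new ∝ p * exp(-step * grad), done in log space.
    logits = [math.log(pi) - step * g for pi, g in zip(p, grad)]
    return normalise_from_logits(logits)


def gibbs(cost, ref, tau):
    """Closed-form minimiser p* ∝ ref * exp(-cost / tau), valid for tau > 0."""
    return normalise_from_logits(
        [math.log(r) - c / tau for c, r in zip(cost, ref)]
    )


def kl(p, q):
    """Relative entropy KL(p | q) for strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical toy data: four actions, uniform reference measure.
cost = [1.0, 0.5, 2.0, 0.0]
ref = [0.25, 0.25, 0.25, 0.25]
tau, step = 0.5, 0.5  # step * tau = 0.25, so the contraction factor is 0.75

p = list(ref)
target = gibbs(cost, ref, tau)
errors = []
for _ in range(30):
    errors.append(kl(target, p))  # distance to the minimiser before each update
    p = mirror_descent_step(p, cost, ref, tau, step)
```

Plotting `errors` on a logarithmic scale should show a straight line, the signature of a geometric rate. Setting `tau` to zero removes the contraction in this toy problem, which is loosely in line with the slower rate the abstract states for τ = 0.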
Source journal
Stochastic Processes and their Applications (Mathematics: Statistics & Probability)
CiteScore: 2.90
Self-citation rate: 7.10%
Articles published: 180
Review time: 23.6 weeks
About the journal: Stochastic Processes and their Applications publishes papers on the theory and applications of stochastic processes. It is concerned with concepts and techniques, and is oriented towards a broad spectrum of mathematical, scientific and engineering interests. Characterization, structural properties, inference and control of stochastic processes are covered. The journal is exacting and scholarly in its standards. Every effort is made to promote innovation, vitality, and communication between disciplines. All papers are refereed.