Mirror descent for stochastic control problems with measure-valued controls

IF 1.2 | Zone 2 (Mathematics) | JCR Q3 Statistics & Probability

Stochastic Processes and their Applications Pub Date : 2025-08-28 DOI:10.1016/j.spa.2025.104765

Bekzhan Kerimkulov , David Šiška , Łukasz Szpruch , Yufei Zhang

Volume 190, Article 104765. https://www.sciencedirect.com/science/article/pii/S0304414925002091

Citations: 0

Abstract

This paper studies the convergence of the mirror descent algorithm for finite horizon stochastic control problems with measure-valued control processes. The control objective involves a convex regularisation function, denoted as $h$, with regularisation strength determined by the weight $\tau \geq 0$. The setting covers regularised relaxed control problems. Under suitable conditions, we establish the relative smoothness and convexity of the control objective with respect to the Bregman divergence of $h$, and prove linear convergence of the algorithm for $\tau = 0$ and exponential convergence for $\tau > 0$. The results apply to common regularisers including relative entropy, $\chi^{2}$-divergence, and entropic Wasserstein costs. This validates recent reinforcement learning heuristics that adding regularisation accelerates the convergence of gradient methods. The proof exploits careful regularity estimates of backward stochastic differential equations in the bounded mean oscillation norm.
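To make the role of $\tau$ concrete, here is a minimal sketch of mirror descent with a relative-entropy regulariser. It is not the paper's setting: it assumes a static problem on a finite action set (no dynamics, no BSDEs), with objective $\langle c, p\rangle + \tau\,\mathrm{KL}(p\,|\,\rho)$ over probability vectors $p$ and $\tau > 0$. The names `cost`, `ref`, `lam`, `gibbs`, `mirror_step` and `run` are illustrative assumptions, not notation from the paper.

```python
import math

def gibbs(cost, ref, tau):
    """Minimiser of <cost, p> + tau * KL(p | ref) over probability vectors (tau > 0)."""
    w = [r * math.exp(-c / tau) for c, r in zip(cost, ref)]
    z = sum(w)
    return [x / z for x in w]

def kl(p, q):
    """Relative entropy KL(p | q) for a strictly positive q."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def objective(p, cost, ref, tau):
    """Toy regularised objective: linear cost plus tau times relative entropy."""
    return sum(c * pi for c, pi in zip(cost, p)) + tau * kl(p, ref)

def mirror_step(p, cost, ref, tau, lam):
    """One mirror descent step whose Bregman divergence is KL, with weight lam."""
    # First variation of the objective at p, up to an additive constant.
    grad = [c + tau * math.log(pi / ri) for c, pi, ri in zip(cost, p, ref)]
    # Multiplicative update in the log domain; subtract the max for stability.
    logw = [math.log(pi) - g / lam for pi, g in zip(p, grad)]
    m = max(logw)
    w = [math.exp(x - m) for x in logw]
    z = sum(w)
    return [x / z for x in w]

def run(cost, ref, tau, lam, steps):
    """Iterate from p = ref and record the optimality gap before each step."""
    best = objective(gibbs(cost, ref, tau), cost, ref, tau)
    p = list(ref)
    gaps = []
    for _ in range(steps):
        gaps.append(objective(p, cost, ref, tau) - best)
        p = mirror_step(p, cost, ref, tau, lam)
    return p, gaps
```

In this toy problem the update reads $\log p_{n+1} = (1 - \tau/\lambda)\log p_n - c/\lambda + (\tau/\lambda)\log\rho + \text{const}$, so for $\lambda \geq \tau$ the log-density contracts towards the Gibbs minimiser $p^* \propto \rho\, e^{-c/\tau}$ by a factor $1 - \tau/\lambda$ per step. Calling, for example, `run([1.0, 0.2, 0.7], [1/3, 1/3, 1/3], tau=0.5, lam=1.0, steps=40)` returns the final iterate together with the sequence of optimality gaps.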


Source journal

Stochastic Processes and their Applications (Mathematics: Statistics & Probability)

CiteScore: 2.90

Self-citation rate: 7.10%

Articles published: 180

Review time: 23.6 weeks

Journal description: Stochastic Processes and their Applications publishes papers on the theory and applications of stochastic processes. It is concerned with concepts and techniques, and is oriented towards a broad spectrum of mathematical, scientific and engineering interests. Characterization, structural properties, inference and control of stochastic processes are covered. The journal is exacting and scholarly in its standards. Every effort is made to promote innovation, vitality, and communication between disciplines. All papers are refereed.