A Neural Network Approach for Stochastic Optimal Control

Xingjian Li, Deepanshu Verma, Lars Ruthotto
DOI: 10.1137/23m155832x (https://doi.org/10.1137/23m155832x)
Publication date: 2024-09-03

Abstract

SIAM Journal on Scientific Computing, Volume 46, Issue 5, Page C535-C556, October 2024.
We present a neural network approach for approximating the value function of high-dimensional stochastic control problems. Our training process simultaneously updates our value function estimate and identifies the part of the state space likely to be visited by optimal trajectories. Our approach leverages insights from optimal control theory and the fundamental relation between semilinear parabolic partial differential equations and forward-backward stochastic differential equations. To focus the sampling on relevant states during neural network training, we use the stochastic Pontryagin maximum principle (PMP) to obtain the optimal controls for the current value function estimate. By design, our approach coincides with the method of characteristics for the nonviscous Hamilton–Jacobi–Bellman (HJB) equation arising in deterministic control problems. Our training loss consists of a weighted sum of the objective functional of the control problem and penalty terms that enforce the HJB equations along the sampled trajectories. Importantly, training is unsupervised in that it does not require solutions of the control problem. Our numerical experiments highlight our scheme’s ability to identify the relevant parts of the state space and produce meaningful value estimates. Using a two-dimensional model problem, we demonstrate the importance of the stochastic PMP to inform the sampling and compare it to a finite element approach. With a nonlinear control-affine quadcopter example, we illustrate that our approach can handle complicated dynamics. For a 100-dimensional benchmark problem, we demonstrate that our approach improves accuracy and time-to-solution, and, via a modification, we show the wider applicability of our scheme.
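For orientation, the problem class described in the abstract can be sketched in generic textbook notation. The symbols below (running cost L, terminal cost G, drift f, diffusion \sigma, value function \Phi) and the sign convention are assumptions chosen for illustration and are not taken from the paper itself. The last line states the forward-backward SDE relation that holds along trajectories driven by the feedback control u^*; setting \sigma = 0 recovers the first-order (nonviscous) HJB equation of the deterministic case.

```latex
\begin{aligned}
&\Phi(t,x) \;=\; \min_{u}\; \mathbb{E}\Big[\int_t^T L(s, X_s, u_s)\,ds + G(X_T)\Big]
\quad\text{s.t.}\quad dX_s = f(s,X_s,u_s)\,ds + \sigma(s,X_s)\,dW_s,\;\; X_t = x,\\[4pt]
&\partial_t \Phi + \min_{u}\big\{ L(t,x,u) + \nabla\Phi\cdot f(t,x,u) \big\}
 + \tfrac12 \operatorname{tr}\!\big(\sigma\sigma^\top \nabla^2\Phi\big) = 0,
\qquad \Phi(T,x) = G(x),\\[4pt]
&u^*(t,x) \in \arg\min_{u}\big\{ L(t,x,u) + \nabla\Phi(t,x)\cdot f(t,x,u) \big\},\\[4pt]
&Y_s = \Phi(s,X_s),\quad Z_s = \sigma^\top \nabla\Phi(s,X_s):\qquad
dY_s = -L(s,X_s,u^*_s)\,ds + Z_s\cdot dW_s,\quad Y_T = G(X_T).
\end{aligned}
```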
Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and data available” as recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/EmoryMLIP/NeuralSOC and in the supplementary material (NeuralSOC-main.zip [29.9 MB]).
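The loss structure described in the abstract (control objective plus weighted HJB penalties, evaluated along trajectories generated by the current value function estimate) can be illustrated with a minimal NumPy sketch. This is not the authors' NeuralSOC code. It assumes a toy problem with dynamics dX = u dt + sigma dW, running cost 0.5|u|^2, and terminal cost 0.5|x|^2, and it replaces the neural network by a one-parameter quadratic ansatz so that all derivatives are analytic. The function names, the penalty weights beta_hjb and beta_term, and the use of absolute-value penalties are hypothetical choices made for illustration.

```python
import numpy as np

def phi_and_derivatives(theta, t, x, T, sigma):
    """Quadratic value-function ansatz (stand-in for a network); theta = 1 is exact
    for the assumed toy problem. Returns Phi, dPhi/dt, grad Phi, Laplacian of Phi."""
    d = x.shape[1]
    a = 1.0 / (1.0 + T - t)
    sq = np.sum(x**2, axis=1)
    phi = 0.5 * theta * a * sq + 0.5 * sigma**2 * d * np.log(1.0 + T - t)
    phi_t = 0.5 * theta * a**2 * sq - 0.5 * sigma**2 * d * a
    grad = theta * a * x
    lap = np.full(x.shape[0], theta * a * d)
    return phi, phi_t, grad, lap

def training_loss(theta, x0, T=1.0, sigma=0.5, n_steps=50,
                  beta_hjb=1.0, beta_term=1.0, seed=0):
    """Objective + weighted HJB penalties along Euler-Maruyama trajectories."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.array(x0, dtype=float)
    running = np.zeros(x.shape[0])
    hjb_pen = np.zeros(x.shape[0])
    for k in range(n_steps):
        t = k * dt
        _, phi_t, grad, lap = phi_and_derivatives(theta, t, x, T, sigma)
        # Feedback control from the current estimate: argmin_u 0.5|u|^2 + grad.u
        u = -grad
        # HJB residual for this toy problem: Phi_t - 0.5|grad|^2 + 0.5 sigma^2 Lap
        residual = phi_t - 0.5 * np.sum(grad**2, axis=1) + 0.5 * sigma**2 * lap
        running += 0.5 * np.sum(u**2, axis=1) * dt
        hjb_pen += np.abs(residual) * dt
        # Sample the next state under the current feedback control
        x = x + u * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    terminal = 0.5 * np.sum(x**2, axis=1)
    phi_T = phi_and_derivatives(theta, T, x, T, sigma)[0]
    objective = np.mean(running + terminal)
    pen_hjb = np.mean(hjb_pen)
    pen_term = np.mean(np.abs(phi_T - terminal))  # terminal condition Phi(T,x) = G(x)
    total = objective + beta_hjb * pen_hjb + beta_term * pen_term
    return total, objective, pen_hjb, pen_term

x0 = np.random.default_rng(1).standard_normal((256, 2))
for theta in (0.5, 1.0, 1.5):
    total, obj, p_hjb, p_term = training_loss(theta, x0)
    print(f"theta={theta}: total={total:.4f} objective={obj:.4f} "
          f"hjb={p_hjb:.2e} terminal={p_term:.2e}")
```

In this sketch both penalty terms vanish (up to rounding) at theta = 1, where the ansatz solves the toy HJB equation, and grow as theta moves away from 1. No reference solution enters the loss, which mirrors the unsupervised training described in the abstract. In a neural network setting the derivatives would come from automatic differentiation rather than closed-form expressions.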