A Neural Network Approach for Stochastic Optimal Control

Impact factor 2.6 · Zone 2 (Mathematics) · JCR Q1 (Mathematics, Applied)

SIAM Journal on Scientific Computing. Pub Date: 2024-09-03. DOI: 10.1137/23m155832x

Xingjian Li, Deepanshu Verma, Lars Ruthotto

Abstract

SIAM Journal on Scientific Computing, Volume 46, Issue 5, Page C535-C556, October 2024.
We present a neural network approach for approximating the value function of high-dimensional stochastic control problems. Our training process simultaneously updates our value function estimate and identifies the part of the state space likely to be visited by optimal trajectories. Our approach leverages insights from optimal control theory and the fundamental relation between semilinear parabolic partial differential equations and forward-backward stochastic differential equations. To focus the sampling on relevant states during neural network training, we use the stochastic Pontryagin maximum principle (PMP) to obtain the optimal controls for the current value function estimate. By design, our approach coincides with the method of characteristics for the nonviscous Hamilton–Jacobi–Bellman (HJB) equation arising in deterministic control problems. Our training loss consists of a weighted sum of the objective functional of the control problem and penalty terms that enforce the HJB equations along the sampled trajectories. Importantly, training is unsupervised in that it does not require solutions of the control problem. Our numerical experiments highlight our scheme’s ability to identify the relevant parts of the state space and produce meaningful value estimates. Using a two-dimensional model problem, we demonstrate the importance of the stochastic PMP to inform the sampling and compare it to a finite element approach. With a nonlinear control-affine quadcopter example, we illustrate that our approach can handle complicated dynamics. For a 100-dimensional benchmark problem, we demonstrate that our approach improves accuracy and time-to-solution, and, via a modification, we show the wider applicability of our scheme.
Reproducibility of computational results. This paper has been awarded the “SIAM Reproducibility Badge: Code and data available” as recognition that the authors have followed reproducibility principles valued by SISC and the scientific computing community. Code and data that allow readers to reproduce the results in this paper are available at https://github.com/EmoryMLIP/NeuralSOC and in the supplementary material (NeuralSOC-main.zip [29.9MB]).
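
The training loss described in the abstract (objective functional plus weighted HJB penalties, evaluated along trajectories driven by the feedback control of the current value estimate) can be illustrated with a minimal sketch. Everything specific below is an assumption made for illustration and is not taken from the paper or its repository: the one-dimensional toy problem (dynamics dX = u dt + σ dW, running cost u²/2, terminal cost x²/2), the function names, the penalty weights, and the use of finite differences in place of a neural network with automatic differentiation. For this toy problem the HJB equation reads ∂tΦ − (∂xΦ)²/2 + (σ²/2) ∂xxΦ = 0 with Φ(T, x) = x²/2, and the feedback control is u = −∂xΦ.

```python
import math
import random

T, SIGMA, N_STEPS = 1.0, 0.4, 50  # hypothetical horizon, noise level, time steps


def terminal_cost(x):
    return 0.5 * x * x


def exact_value(t, x):
    """Closed-form value function of the toy problem (used only as a check)."""
    s = 1.0 + T - t
    return 0.5 * x * x / s + 0.5 * SIGMA**2 * math.log(s)


def derivatives(phi, t, x, h=1e-3):
    """Central finite differences, standing in for automatic differentiation."""
    phi_t = (phi(t + h, x) - phi(t - h, x)) / (2 * h)
    phi_x = (phi(t, x + h) - phi(t, x - h)) / (2 * h)
    phi_xx = (phi(t, x + h) - 2 * phi(t, x) + phi(t, x - h)) / (h * h)
    return phi_t, phi_x, phi_xx


def training_loss(phi, n_paths=64, beta_hjb=1.0, beta_term=1.0, seed=0):
    """Return (total loss, objective, HJB penalty, terminal penalty), path-averaged."""
    rng = random.Random(seed)
    dt = T / N_STEPS
    objective = hjb_penalty = term_penalty = 0.0
    for _ in range(n_paths):
        x = rng.gauss(0.0, 1.0)  # assumed initial-state distribution
        for k in range(N_STEPS):
            t = k * dt
            phi_t, phi_x, phi_xx = derivatives(phi, t, x)
            u = -phi_x  # feedback control implied by the current value estimate
            residual = phi_t - 0.5 * phi_x**2 + 0.5 * SIGMA**2 * phi_xx
            objective += 0.5 * u * u * dt        # running cost
            hjb_penalty += abs(residual) * dt    # HJB residual along the path
            # Euler-Maruyama step of the controlled SDE
            x += u * dt + SIGMA * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        objective += terminal_cost(x)
        term_penalty += abs(phi(T, x) - terminal_cost(x))  # final-time condition
    total = objective + beta_hjb * hjb_penalty + beta_term * term_penalty
    return tuple(v / n_paths for v in (total, objective, hjb_penalty, term_penalty))


def crude_guess(t, x):
    """Time-independent guess: matches the terminal condition, violates the HJB equation."""
    return 0.5 * x * x


if __name__ == "__main__":
    print("exact value :", training_loss(exact_value))
    print("crude guess :", training_loss(crude_guess))
```

In this sketch the exact value function drives both penalty terms to (numerically) zero, while the crude guess incurs a clearly positive HJB penalty even though it satisfies the terminal condition. Because the sample paths are generated with the control implied by the current estimate, no precomputed solutions of the control problem are needed, which mirrors the unsupervised setting the abstract describes.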

Source journal

SIAM Journal on Scientific Computing (Mathematics: Applied Mathematics)

CiteScore: 5.50
Self-citation rate: 3.20%
Articles published: 209
Review time: 1 month

About the journal: The purpose of SIAM Journal on Scientific Computing (SISC) is to advance computational methods for solving scientific and engineering problems. SISC papers are classified into three categories:
1. Methods and Algorithms for Scientific Computing: Papers in this category may include theoretical analysis, provided that the relevance to applications in science and engineering is demonstrated. They should contain meaningful computational results and theoretical results or strong heuristics supporting the performance of new algorithms.
2. Computational Methods in Science and Engineering: Papers in this section will typically describe novel methodologies for solving a specific problem in computational science or engineering. They should contain enough information about the application to orient other computational scientists but should omit details of interest mainly to the applications specialist.
3. Software and High-Performance Computing: Papers in this category should concern the novel design and development of computational methods and high-quality software, parallel algorithms, high-performance computing issues, new architectures, data analysis, or visualization. The primary focus should be on computational methods that have potentially large impact for an important class of scientific or engineering problems.