Continuous learning of the value function utilizing deep reinforcement learning to be used as the objective in model predictive control

IF 3.9 | CAS Zone 2, Engineering Technology | JCR Q2, Computer Science, Interdisciplinary Applications
Daniel Beahr, Elijah Hedrick, Debangsu Bhattacharyya
{"title":"Continuous learning of the value function utilizing deep reinforcement learning to be used as the objective in model predictive control","authors":"Daniel Beahr,&nbsp;Elijah Hedrick,&nbsp;Debangsu Bhattacharyya","doi":"10.1016/j.compchemeng.2025.109262","DOIUrl":null,"url":null,"abstract":"<div><div>Reinforcement learning (RL) and model predictive control (MPC) possess an inherent synergy in the manner in which they function. The work presented here investigates integrating RL with existing MPC in order to provide a constrained policy for the RL while also creating an adaptable objective for the MPC. Selection of MPC for combination with RL is not arbitrary. Two specific aspects of MPC are advantageous for such a combination: the use of a value function and the use of a model. The use of a model in MPC is useful since, by solving for the optimal trajectory, a projected view of the expected reward is gained. While this information can be inaccurate based on the current value function, it can allow for accelerated learning. By combining this with a correction for state transitions, an MPC formulation is derived that obeys the constraints set forth, but can adapt to changing dynamics and correct for plant model mismatch without a required discrete update, an advantage over standard MPC formulations. We propose two algorithms for the proposed value-function model predictive controller (VFMPC): one denoted as VFMPC(0) where the one step return is utilized to learn the cost function, and the other denoted as VFMPC(n), where the optimal trajectory is used to learn the n-step return subject to the dynamics of the process model. An artificial network (ANN) model is introduced into VFMPC(n) to improve the controller performance under slowly changing dynamics and plant-model mismatch, called VFMPC(<span><math><msub><mrow><mi>N</mi></mrow><mrow><mi>P</mi></mrow></msub></math></span>). The developed algorithms are applied to two applications.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"201 ","pages":"Article 109262"},"PeriodicalIF":3.9000,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135425002662","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Reinforcement learning (RL) and model predictive control (MPC) possess an inherent synergy in the manner in which they function. The work presented here investigates integrating RL with existing MPC in order to provide a constrained policy for the RL while also creating an adaptable objective for the MPC. The selection of MPC for combination with RL is not arbitrary: two specific aspects of MPC are advantageous for such a combination, the use of a value function and the use of a model. The model in MPC is useful since, by solving for the optimal trajectory, a projected view of the expected reward is gained. While this information can be inaccurate depending on the current value function, it can allow for accelerated learning. By combining this with a correction for state transitions, an MPC formulation is derived that obeys the constraints set forth but can adapt to changing dynamics and correct for plant-model mismatch without requiring a discrete update, an advantage over standard MPC formulations. We propose two algorithms for the value-function model predictive controller (VFMPC): one, denoted VFMPC(0), where the one-step return is utilized to learn the cost function, and the other, denoted VFMPC(n), where the optimal trajectory is used to learn the n-step return subject to the dynamics of the process model. An artificial neural network (ANN) model is introduced into VFMPC(n) to improve controller performance under slowly changing dynamics and plant-model mismatch, yielding a variant called VFMPC(N_P). The developed algorithms are demonstrated on two applications.
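The abstract only outlines the two algorithms; the mechanics it refers to follow standard RL definitions. As a rough, hypothetical illustration of the VFMPC(0) idea only (the names, the tabular value store, the reward-maximizing convention, and the enumeration-based stand-in for the MPC optimization below are assumptions, not the paper's implementation), a one-step return can be used to update a value estimate online, and that estimate can then serve as the tail objective of a model-based lookahead:

```python
import numpy as np

GAMMA = 0.99   # discount factor (assumed value)
ALPHA = 0.1    # value-function learning rate (assumed value)

def td0_update(V, s, r, s_next):
    """One-step (TD(0)) update: move V(s) toward the one-step return r + gamma*V(s')."""
    target = r + GAMMA * V.get(s_next, 0.0)
    V[s] = V.get(s, 0.0) + ALPHA * (target - V.get(s, 0.0))

def mpc_action(V, s, model, candidate_actions, horizon=5):
    """Enumerate candidate inputs, roll the process model forward over the
    horizon (held constant for simplicity), and score each rollout by the
    discounted reward plus the learned value of the terminal state.
    This enumeration is a crude stand-in for the MPC optimization."""
    best_a, best_score = None, -np.inf
    for a in candidate_actions:
        score, state = 0.0, s
        for k in range(horizon):
            state, r = model(state, a)                 # process-model prediction
            score += (GAMMA ** k) * r
        score += (GAMMA ** horizon) * V.get(state, 0.0)  # learned tail value
        if score > best_score:
            best_a, best_score = a, score
    return best_a

if __name__ == "__main__":
    # Toy demo with an integer-valued state and a hypothetical linear model.
    def toy_model(state, a):
        nxt = state + a
        return nxt, -abs(nxt)      # reward: stay near the origin

    V, s = {}, 3
    for _ in range(20):
        a = mpc_action(V, s, toy_model, candidate_actions=[-1, 0, 1], horizon=3)
        s_next, r = toy_model(s, a)   # "plant" response (here identical to the model)
        td0_update(V, s, r, s_next)   # value learned online, so the MPC objective adapts
        s = s_next
```

Because the value estimate is refined after every step, the lookahead objective adapts continuously rather than through a discrete model update, which is the synergy the abstract describes; the paper's VFMPC(n) and VFMPC(N_P) variants extend this with n-step returns along the optimal trajectory and an ANN value model.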
Source Journal

Computers & Chemical Engineering (Engineering Technology - Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles per year: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.