Continuous learning of the value function utilizing deep reinforcement learning to be used as the objective in model predictive control

IF 3.9 | CAS Zone 2, Engineering Technology | JCR Q2, Computer Science, Interdisciplinary Applications
Daniel Beahr, Elijah Hedrick, Debangsu Bhattacharyya
{"title":"Continuous learning of the value function utilizing deep reinforcement learning to be used as the objective in model predictive control","authors":"Daniel Beahr,&nbsp;Elijah Hedrick,&nbsp;Debangsu Bhattacharyya","doi":"10.1016/j.compchemeng.2025.109262","DOIUrl":null,"url":null,"abstract":"<div><div>Reinforcement learning (RL) and model predictive control (MPC) possess an inherent synergy in the manner in which they function. The work presented here investigates integrating RL with existing MPC in order to provide a constrained policy for the RL while also creating an adaptable objective for the MPC. Selection of MPC for combination with RL is not arbitrary. Two specific aspects of MPC are advantageous for such a combination: the use of a value function and the use of a model. The use of a model in MPC is useful since, by solving for the optimal trajectory, a projected view of the expected reward is gained. While this information can be inaccurate based on the current value function, it can allow for accelerated learning. By combining this with a correction for state transitions, an MPC formulation is derived that obeys the constraints set forth, but can adapt to changing dynamics and correct for plant model mismatch without a required discrete update, an advantage over standard MPC formulations. We propose two algorithms for the proposed value-function model predictive controller (VFMPC): one denoted as VFMPC(0) where the one step return is utilized to learn the cost function, and the other denoted as VFMPC(n), where the optimal trajectory is used to learn the n-step return subject to the dynamics of the process model. An artificial network (ANN) model is introduced into VFMPC(n) to improve the controller performance under slowly changing dynamics and plant-model mismatch, called VFMPC(<span><math><msub><mrow><mi>N</mi></mrow><mrow><mi>P</mi></mrow></msub></math></span>). The developed algorithms are applied to two applications.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"201 ","pages":"Article 109262"},"PeriodicalIF":3.9000,"publicationDate":"2025-07-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135425002662","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

Reinforcement learning (RL) and model predictive control (MPC) possess an inherent synergy in the manner in which they function. The work presented here investigates integrating RL with existing MPC in order to provide a constrained policy for the RL while also creating an adaptable objective for the MPC. The selection of MPC for combination with RL is not arbitrary: two specific aspects of MPC are advantageous for such a combination, the use of a value function and the use of a model. The model in MPC is useful since, by solving for the optimal trajectory, a projected view of the expected reward is gained. While this information can be inaccurate depending on the current value function, it can allow for accelerated learning. By combining this with a correction for state transitions, an MPC formulation is derived that obeys the constraints set forth but can adapt to changing dynamics and correct for plant-model mismatch without requiring a discrete update, an advantage over standard MPC formulations. We propose two algorithms for the value-function model predictive controller (VFMPC): one, denoted VFMPC(0), where the one-step return is utilized to learn the cost function, and the other, denoted VFMPC(n), where the optimal trajectory is used to learn the n-step return subject to the dynamics of the process model. An artificial neural network (ANN) model is introduced into VFMPC(n) to improve controller performance under slowly changing dynamics and plant-model mismatch, yielding a variant called VFMPC(N_P). The developed algorithms are demonstrated on two applications.
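The abstract only outlines the two algorithms; the mechanics it refers to follow standard RL definitions. As a rough, hypothetical illustration of the VFMPC(0) idea only (the names, the tabular value store, the reward-maximizing convention, and the enumeration-based stand-in for the MPC optimization below are assumptions, not the paper's implementation), a one-step return can be used to update a value estimate online, and that estimate can then serve as the tail objective of a model-based lookahead:

```python
import numpy as np

GAMMA = 0.99   # discount factor (assumed value)
ALPHA = 0.1    # value-function learning rate (assumed value)

def td0_update(V, s, r, s_next):
    """One-step (TD(0)) update: move V(s) toward the one-step return r + gamma*V(s')."""
    target = r + GAMMA * V.get(s_next, 0.0)
    V[s] = V.get(s, 0.0) + ALPHA * (target - V.get(s, 0.0))

def mpc_action(V, s, model, candidate_actions, horizon=5):
    """Enumerate candidate inputs, roll the process model forward over the
    horizon (held constant for simplicity), and score each rollout by the
    discounted reward plus the learned value of the terminal state.
    This enumeration is a crude stand-in for the MPC optimization."""
    best_a, best_score = None, -np.inf
    for a in candidate_actions:
        score, state = 0.0, s
        for k in range(horizon):
            state, r = model(state, a)                 # process-model prediction
            score += (GAMMA ** k) * r
        score += (GAMMA ** horizon) * V.get(state, 0.0)  # learned tail value
        if score > best_score:
            best_a, best_score = a, score
    return best_a

if __name__ == "__main__":
    # Toy demo with an integer-valued state and a hypothetical linear model.
    def toy_model(state, a):
        nxt = state + a
        return nxt, -abs(nxt)      # reward: stay near the origin

    V, s = {}, 3
    for _ in range(20):
        a = mpc_action(V, s, toy_model, candidate_actions=[-1, 0, 1], horizon=3)
        s_next, r = toy_model(s, a)   # "plant" response (here identical to the model)
        td0_update(V, s, r, s_next)   # value learned online, so the MPC objective adapts
        s = s_next
```

Because the value estimate is refined after every step, the lookahead objective adapts continuously rather than through a discrete model update, which is the synergy the abstract describes; the paper's VFMPC(n) and VFMPC(N_P) variants extend this with n-step returns along the optimal trajectory and an ANN value model.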
Source Journal

Computers & Chemical Engineering (Engineering Technology - Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Articles per year: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.