Continuous learning of the value function utilizing deep reinforcement learning to be used as the objective in model predictive control
Daniel Beahr, Elijah Hedrick, Debangsu Bhattacharyya
Computers & Chemical Engineering, Volume 201, Article 109262. Published 2025-07-05. DOI: 10.1016/j.compchemeng.2025.109262
Abstract
Reinforcement learning (RL) and model predictive control (MPC) possess an inherent synergy in the manner in which they function. The work presented here investigates integrating RL with existing MPC to provide a constrained policy for the RL while also creating an adaptable objective for the MPC. The selection of MPC for combination with RL is not arbitrary; two specific aspects of MPC are advantageous for such a combination: the use of a value function and the use of a model. The model in MPC is useful since, by solving for the optimal trajectory, a projected view of the expected reward is obtained. While this projection can be inaccurate under the current value function, it can accelerate learning. By combining it with a correction for state transitions, an MPC formulation is derived that obeys the constraints set forth but can adapt to changing dynamics and correct for plant-model mismatch without requiring a discrete update, an advantage over standard MPC formulations. We propose two algorithms for the proposed value-function model predictive controller (VFMPC): one, denoted VFMPC(0), in which the one-step return is used to learn the cost function, and the other, denoted VFMPC(n), in which the optimal trajectory is used to learn the n-step return subject to the dynamics of the process model. An artificial neural network (ANN) model is introduced into VFMPC(n) to improve controller performance under slowly changing dynamics and plant-model mismatch; this variant is called VFMPC(N_P). The developed algorithms are demonstrated on two applications.
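To make the distinction between the two learning targets concrete, the sketch below illustrates, in generic terms, how a one-step return (as in VFMPC(0)) and an n-step return accumulated along an MPC-predicted trajectory (as in VFMPC(n)) could be formed. All names (gamma, stage_costs, v_next, v_terminal) are illustrative assumptions, not the authors' implementation; the actual controller formulation is given in the full paper.

```python
# Hedged sketch of the two value-learning targets described in the abstract.
# Assumes a discounted-cost setting with discount factor gamma (assumed value).

gamma = 0.99

def one_step_target(cost_t, v_next):
    """VFMPC(0)-style target: the one-step return, i.e. the observed stage
    cost plus the bootstrapped value of the next state."""
    return cost_t + gamma * v_next

def n_step_target(stage_costs, v_terminal):
    """VFMPC(n)-style target: the n-step return accumulated along the MPC's
    predicted optimal trajectory, bootstrapped from the value estimate at
    the end of the prediction horizon."""
    g = v_terminal
    for c in reversed(stage_costs):  # fold costs backwards through the horizon
        g = c + gamma * g
    return g

# Example: a 3-step predicted trajectory with stage costs from the process model
costs = [1.2, 0.8, 0.5]
print(one_step_target(costs[0], v_next=4.0))   # 1.2 + 0.99 * 4.0
print(n_step_target(costs, v_terminal=3.0))    # discounted cost sum + 0.99**3 * 3.0
```

In the abstract's terms, the n-step target uses the process model's projected trajectory, so it propagates information about expected future cost faster than the one-step target, at the price of inheriting any plant-model mismatch in that projection.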
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.