A practical reinforcement learning control design for nonlinear systems with input and output constraints

IF 3.9 | CAS Tier 2 (Engineering & Technology) | JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Hesam Hassanpour, Brandon Corbett, Prashant Mhaskar
{"title":"具有输入输出约束的非线性系统的实用强化学习控制设计","authors":"Hesam Hassanpour ,&nbsp;Brandon Corbett ,&nbsp;Prashant Mhaskar","doi":"10.1016/j.compchemeng.2025.109248","DOIUrl":null,"url":null,"abstract":"<div><div>In this work, a practically implementable reinforcement learning (RL)-based controller is designed to handle process input and output constraints. In a typical RL problem, an RL agent is employed to learn an optimal control policy through interactions with the environment. This is unimplementable in practical situations due to the excessive exploration needed by the RL-based controller and exacerbated by the possible violation of the input and output constraints. We previously proposed an implementable RL controller that can circumvent random exploration needs by leveraging existing model predictive control (MPC) to pre-train/warm start the RL agent. The pre-trained agent is subsequently employed in real-time to engage with the process to improve its performance by gaining more knowledge about the nonlinear behavior of the system. This work generalizes our previous method to handle constraints on the outputs and the rate of change of the inputs by modifying the reward function. The effectiveness of the proposed algorithm is illustrated through simulations conducted for control of a pH neutralization process. The findings indicate that the proposed RL method enhances closed-loop performance in comparison to the nominal MPC while satisfying all input and output constraints.</div></div>","PeriodicalId":286,"journal":{"name":"Computers & Chemical Engineering","volume":"201 ","pages":"Article 109248"},"PeriodicalIF":3.9000,"publicationDate":"2025-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A practical reinforcement learning control design for nonlinear systems with input and output constraints\",\"authors\":\"Hesam Hassanpour ,&nbsp;Brandon Corbett ,&nbsp;Prashant Mhaskar\",\"doi\":\"10.1016/j.compchemeng.2025.109248\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>In this work, a practically implementable reinforcement learning (RL)-based controller is designed to handle process input and output constraints. In a typical RL problem, an RL agent is employed to learn an optimal control policy through interactions with the environment. This is unimplementable in practical situations due to the excessive exploration needed by the RL-based controller and exacerbated by the possible violation of the input and output constraints. We previously proposed an implementable RL controller that can circumvent random exploration needs by leveraging existing model predictive control (MPC) to pre-train/warm start the RL agent. The pre-trained agent is subsequently employed in real-time to engage with the process to improve its performance by gaining more knowledge about the nonlinear behavior of the system. This work generalizes our previous method to handle constraints on the outputs and the rate of change of the inputs by modifying the reward function. The effectiveness of the proposed algorithm is illustrated through simulations conducted for control of a pH neutralization process. 
The findings indicate that the proposed RL method enhances closed-loop performance in comparison to the nominal MPC while satisfying all input and output constraints.</div></div>\",\"PeriodicalId\":286,\"journal\":{\"name\":\"Computers & Chemical Engineering\",\"volume\":\"201 \",\"pages\":\"Article 109248\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-07-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Computers & Chemical Engineering\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0098135425002522\",\"RegionNum\":2,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Computers & Chemical Engineering","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0098135425002522","RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS","Score":null,"Total":0}
Citations: 0

Abstract

In this work, a practically implementable reinforcement learning (RL)-based controller is designed to handle process input and output constraints. In a typical RL problem, an RL agent learns an optimal control policy through interactions with the environment. This is unimplementable in practice because of the excessive exploration an RL-based controller requires, a difficulty exacerbated by possible violations of the input and output constraints during learning. We previously proposed an implementable RL controller that circumvents the need for random exploration by leveraging an existing model predictive control (MPC) formulation to pre-train/warm-start the RL agent. The pre-trained agent is subsequently deployed in real time to interact with the process and improve its performance by gaining more knowledge of the system's nonlinear behavior. This work generalizes our previous method to handle constraints on the outputs and on the rate of change of the inputs by modifying the reward function. The effectiveness of the proposed algorithm is illustrated through simulations of a pH neutralization process. The findings indicate that the proposed RL method enhances closed-loop performance relative to the nominal MPC while satisfying all input and output constraints.
Source journal
Computers & Chemical Engineering (Engineering & Technology - Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Annual articles: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.