Safe, visualizable reinforcement learning for process control with a warm-started actor network based on PI-control

IF 3.3, Zone 2 (Computer Science), Q2 AUTOMATION & CONTROL SYSTEMS

Journal of Process Control, Pub Date: 2024-11-16, DOI: 10.1016/j.jprocont.2024.103340

Edward H. Bras, Tobias M. Louw, Steven M. Bradshaw

Volume 144, Article 103340. Full text: https://www.sciencedirect.com/science/article/pii/S095915242400180X

Citations: 0

Abstract

The adoption of reinforcement learning (RL) in chemical process industries is currently hindered by the use of black-box models that cannot be easily visualized or interpreted, as well as by the challenge of balancing safe control with exploration. Clearly illustrating the similarities between classical control theory and RL theory, as well as demonstrating the possibility of maintaining process safety under RL-based control, will go a long way towards bridging the gap between academic research and industry practice. In this work, a simple approach is introduced for the dynamic online adaptation, through RL, of a non-linear control policy initialised using PI control. The familiar PI controller is represented as a plane in the state-action space, where the states comprise the error and integral error, and the action is the control input. The plane was recreated using a neural network, and this recreated plane served as a readily visualizable initial “warm-started” policy for the RL agent. The actor-critic algorithm was applied to adapt the policy non-linearly during interaction with the controlled process, thereby leveraging the flexibility of the neural network to improve performance. Inherently safe control during training is ensured by introducing a soft active region component in the actor neural network. Finally, the use of cold connections is proposed, whereby the state space can be augmented at any stage of training (e.g., through the incorporation of measurements to facilitate feedforward control) while fully preserving the agent’s training progress to date. By ensuring controller safety, the proposed methods are applicable to the dynamic adaptation of any process where stable PI control is feasible at nominal initial conditions.
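
The two structural ideas in the abstract, a PI controller viewed as a plane over (error, integral error) and a state space that can be widened without disturbing the learned policy, can be illustrated with a small NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the tuning values `KC` and `TAU_I`, the two-unit ReLU network built analytically from the identity relu(z) - relu(-z) = z (the paper recreates the plane with a neural network but the abstract does not say how), and the reading of a "cold connection" as a new input whose weights start at zero. The soft active region is not attempted, since the abstract gives no detail on it.

```python
import numpy as np

# Hypothetical PI tuning, chosen only for illustration (not from the paper).
KC = 2.0      # proportional gain
TAU_I = 5.0   # integral time

def pi_action(error, integral_error):
    """PI law u = Kc*e + (Kc/tau_I)*ie, i.e. a plane over (e, ie)."""
    return KC * error + (KC / TAU_I) * integral_error

def make_warm_started_actor():
    """Build a 2-unit ReLU network whose output equals the PI plane.

    Uses relu(z) - relu(-z) = z, so no training is needed.
    This analytic construction is an assumption of this sketch.
    """
    plane = np.array([KC, KC / TAU_I])
    w_hidden = np.stack([plane, -plane])   # shape (2 hidden units, 2 inputs)
    w_out = np.array([1.0, -1.0])          # shape (2 hidden units,)
    return w_hidden, w_out

def actor(states, w_hidden, w_out):
    """Forward pass: states has shape (n, n_inputs); returns shape (n,)."""
    hidden = np.maximum(states @ w_hidden.T, 0.0)
    return hidden @ w_out

def add_cold_input(w_hidden):
    """Append one input column with zero weights (assumed 'cold connection')."""
    zeros = np.zeros((w_hidden.shape[0], 1))
    return np.hstack([w_hidden, zeros])

rng = np.random.default_rng(0)
states = rng.uniform(-1.0, 1.0, size=(100, 2))   # columns: e, integral of e
w_hidden, w_out = make_warm_started_actor()

# Largest gap between the warm-started actor and the PI plane.
warm_error = np.max(np.abs(
    actor(states, w_hidden, w_out) - pi_action(states[:, 0], states[:, 1])
))

# Augment the state with a hypothetical feedforward measurement.
disturbance = rng.uniform(-1.0, 1.0, size=(100, 1))
augmented = np.hstack([states, disturbance])

# Largest change in the action caused by adding the zero-weight input.
cold_error = np.max(np.abs(
    actor(augmented, add_cold_input(w_hidden), w_out)
    - actor(states, w_hidden, w_out)
))

print(warm_error, cold_error)
```

Under these assumptions the warm-started actor reproduces the PI plane up to floating-point rounding, and adding the zero-weight input leaves every action unchanged. A subsequent actor-critic update would then be free to bend the plane and to give the new input a non-zero influence.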

Source journal

Journal of Process Control (Engineering & Technology, Engineering: Chemical)

CiteScore: 7.00

Self-citation rate: 11.90%

Articles published: 159

Review time: 74 days

About the journal: This international journal covers the application of control theory, operations research, computer science and engineering principles to the solution of process control problems. In addition to the traditional chemical processing and manufacturing applications, the scope of process control problems involves a wide range of applications that includes energy processes, nano-technology, systems biology, bio-medical engineering, pharmaceutical processing technology, energy storage and conversion, smart grid, and data analytics, among others. Papers on the theory in these areas will also be accepted provided the theoretical contribution is aimed at the application and the development of process control techniques.

Topics covered include:
• Control applications
• Process monitoring
• Plant-wide control
• Process control systems
• Control techniques and algorithms
• Process modelling and simulation
• Design methods

Advanced design methods exclude well established and widely studied traditional design techniques such as PID tuning and its many variants. Applications in fields such as control of automotive engines, machinery and robotics are not deemed suitable unless a clear motivation for the relevance to process control is provided.