{"title":"基于轨迹导向探索的工业过程控制离线到在线强化学习框架","authors":"Jiyang Chen, Na Luo","doi":"10.1016/j.jprocont.2025.103535","DOIUrl":null,"url":null,"abstract":"<div><div>Reinforcement learning (RL) in industrial process control faces critical challenges, including limited data availability, unsafe exploration, and the high cost of high-fidelity simulators. These issues limit the practical adoption of RL in process control systems. To address these limitations, this paper presents a comprehensive framework that combines offline pre-training with online finetuning. Specifically, the framework first employs offline RL method to learn conservative policies from historical data, preventing overestimation of unseen actions. It then transitions to fine-tuning using online RL method with a mixed replay buffer that gradually shifts from offline to online data. To further enhance safety during online exploration, this work introduces a trajectory-guided strategy that leverages timestamped sub-optimal expert demonstrations. Rather than replacing agent actions entirely, the proposed method computes a weighted combination of agent and expert actions based on a decaying intervention rate. Both components are designed as modular additions that can be integrated into existing actor-critic algorithms without structural modifications. Case studies on penicillin fermentation and simulated moving bed (SMB) processes demonstrate that the proposed framework outperforms baseline algorithms in terms of learning efficiency, stability, computation costs, and operational safety.</div></div>","PeriodicalId":50079,"journal":{"name":"Journal of Process Control","volume":"154 ","pages":"Article 103535"},"PeriodicalIF":3.9000,"publicationDate":"2025-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An offline-to-online reinforcement learning framework with trajectory-guided exploration for industrial process control\",\"authors\":\"Jiyang Chen, Na Luo\",\"doi\":\"10.1016/j.jprocont.2025.103535\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Reinforcement learning (RL) in industrial process control faces critical challenges, including limited data availability, unsafe exploration, and the high cost of high-fidelity simulators. These issues limit the practical adoption of RL in process control systems. To address these limitations, this paper presents a comprehensive framework that combines offline pre-training with online finetuning. Specifically, the framework first employs offline RL method to learn conservative policies from historical data, preventing overestimation of unseen actions. It then transitions to fine-tuning using online RL method with a mixed replay buffer that gradually shifts from offline to online data. To further enhance safety during online exploration, this work introduces a trajectory-guided strategy that leverages timestamped sub-optimal expert demonstrations. Rather than replacing agent actions entirely, the proposed method computes a weighted combination of agent and expert actions based on a decaying intervention rate. Both components are designed as modular additions that can be integrated into existing actor-critic algorithms without structural modifications. 
Case studies on penicillin fermentation and simulated moving bed (SMB) processes demonstrate that the proposed framework outperforms baseline algorithms in terms of learning efficiency, stability, computation costs, and operational safety.</div></div>\",\"PeriodicalId\":50079,\"journal\":{\"name\":\"Journal of Process Control\",\"volume\":\"154 \",\"pages\":\"Article 103535\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-08-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Process Control\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0959152425001635\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Process Control","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0959152425001635","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
An offline-to-online reinforcement learning framework with trajectory-guided exploration for industrial process control
Reinforcement learning (RL) for industrial process control faces critical challenges, including limited data availability, unsafe exploration, and the high cost of high-fidelity simulators. These issues limit the practical adoption of RL in process control systems. To address these limitations, this paper presents a comprehensive framework that combines offline pre-training with online fine-tuning. Specifically, the framework first employs an offline RL method to learn conservative policies from historical data, preventing overestimation of unseen actions. It then transitions to fine-tuning with an online RL method and a mixed replay buffer that gradually shifts from offline to online data. To further enhance safety during online exploration, this work introduces a trajectory-guided strategy that leverages timestamped sub-optimal expert demonstrations. Rather than replacing agent actions entirely, the proposed method computes a weighted combination of agent and expert actions based on a decaying intervention rate. Both components are designed as modular additions that can be integrated into existing actor-critic algorithms without structural modifications. Case studies on penicillin fermentation and simulated moving bed (SMB) processes demonstrate that the proposed framework outperforms baseline algorithms in terms of learning efficiency, stability, computational cost, and operational safety.
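Because both modular components admit a compact description, a minimal Python sketch may help fix ideas. The names (MixedReplayBuffer, guided_action), the exponential decay schedules, and all default values below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class MixedReplayBuffer:
    """Mixed replay buffer: each batch blends offline and online transitions,
    with the offline share decaying as online experience accumulates.
    The decay schedule and default values are assumptions for illustration."""

    def __init__(self, offline_data, initial_offline_ratio=0.9, decay=0.999):
        self.offline_data = list(offline_data)   # historical (s, a, r, s_next, done) tuples
        self.online_data = []
        self.offline_ratio = initial_offline_ratio
        self.decay = decay

    def add(self, transition):
        """Store a freshly collected transition and shift sampling toward online data."""
        self.online_data.append(transition)
        self.offline_ratio *= self.decay

    def sample(self, batch_size):
        """Draw a batch whose offline/online mix follows the current ratio."""
        n_off = int(round(batch_size * self.offline_ratio))
        n_on = batch_size - n_off
        idx_off = np.random.randint(len(self.offline_data), size=n_off)
        batch = [self.offline_data[i] for i in idx_off]
        if self.online_data and n_on > 0:
            idx_on = np.random.randint(len(self.online_data), size=n_on)
            batch += [self.online_data[i] for i in idx_on]
        return batch


def guided_action(agent_action, expert_action, step, beta0=1.0, decay_rate=1e-4):
    """Trajectory-guided exploration: blend the agent's action with the
    timestamped expert action using a decaying intervention rate beta."""
    beta = beta0 * np.exp(-decay_rate * step)    # intervention rate decays over training
    return beta * np.asarray(expert_action) + (1.0 - beta) * np.asarray(agent_action)
```

In a fine-tuning loop under these assumptions, the environment would be stepped with guided_action(...) while each new transition is pushed into the buffer via add(), so both exploration and the replay mix drift from expert/offline data toward the agent's own behaviour as training progresses.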
Journal introduction:
This international journal covers the application of control theory, operations research, computer science, and engineering principles to the solution of process control problems. In addition to traditional chemical processing and manufacturing applications, the scope of process control problems spans a wide range of applications, including energy processes, nanotechnology, systems biology, biomedical engineering, pharmaceutical processing technology, energy storage and conversion, smart grid, and data analytics, among others.
Papers on the theory in these areas will also be accepted provided the theoretical contribution is aimed at the application and the development of process control techniques.
Topics covered include:
• Control applications
• Process monitoring
• Plant-wide control
• Process control systems
• Control techniques and algorithms
• Process modelling and simulation
• Design methods
Advanced design methods exclude well-established and widely studied traditional design techniques such as PID tuning and its many variants. Applications in fields such as control of automotive engines, machinery, and robotics are not deemed suitable unless a clear motivation for their relevance to process control is provided.