{"title":"基于轨迹导向探索的工业过程控制离线到在线强化学习框架","authors":"Jiyang Chen, Na Luo","doi":"10.1016/j.jprocont.2025.103535","DOIUrl":null,"url":null,"abstract":"<div><div>Reinforcement learning (RL) in industrial process control faces critical challenges, including limited data availability, unsafe exploration, and the high cost of high-fidelity simulators. These issues limit the practical adoption of RL in process control systems. To address these limitations, this paper presents a comprehensive framework that combines offline pre-training with online finetuning. Specifically, the framework first employs offline RL method to learn conservative policies from historical data, preventing overestimation of unseen actions. It then transitions to fine-tuning using online RL method with a mixed replay buffer that gradually shifts from offline to online data. To further enhance safety during online exploration, this work introduces a trajectory-guided strategy that leverages timestamped sub-optimal expert demonstrations. Rather than replacing agent actions entirely, the proposed method computes a weighted combination of agent and expert actions based on a decaying intervention rate. Both components are designed as modular additions that can be integrated into existing actor-critic algorithms without structural modifications. Case studies on penicillin fermentation and simulated moving bed (SMB) processes demonstrate that the proposed framework outperforms baseline algorithms in terms of learning efficiency, stability, computation costs, and operational safety.</div></div>","PeriodicalId":50079,"journal":{"name":"Journal of Process Control","volume":"154 ","pages":"Article 103535"},"PeriodicalIF":3.9000,"publicationDate":"2025-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"An offline-to-online reinforcement learning framework with trajectory-guided exploration for industrial process control\",\"authors\":\"Jiyang Chen, Na Luo\",\"doi\":\"10.1016/j.jprocont.2025.103535\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Reinforcement learning (RL) in industrial process control faces critical challenges, including limited data availability, unsafe exploration, and the high cost of high-fidelity simulators. These issues limit the practical adoption of RL in process control systems. To address these limitations, this paper presents a comprehensive framework that combines offline pre-training with online finetuning. Specifically, the framework first employs offline RL method to learn conservative policies from historical data, preventing overestimation of unseen actions. It then transitions to fine-tuning using online RL method with a mixed replay buffer that gradually shifts from offline to online data. To further enhance safety during online exploration, this work introduces a trajectory-guided strategy that leverages timestamped sub-optimal expert demonstrations. Rather than replacing agent actions entirely, the proposed method computes a weighted combination of agent and expert actions based on a decaying intervention rate. Both components are designed as modular additions that can be integrated into existing actor-critic algorithms without structural modifications. 
Case studies on penicillin fermentation and simulated moving bed (SMB) processes demonstrate that the proposed framework outperforms baseline algorithms in terms of learning efficiency, stability, computation costs, and operational safety.</div></div>\",\"PeriodicalId\":50079,\"journal\":{\"name\":\"Journal of Process Control\",\"volume\":\"154 \",\"pages\":\"Article 103535\"},\"PeriodicalIF\":3.9000,\"publicationDate\":\"2025-08-30\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Process Control\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0959152425001635\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"AUTOMATION & CONTROL SYSTEMS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Process Control","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0959152425001635","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"AUTOMATION & CONTROL SYSTEMS","Score":null,"Total":0}
An offline-to-online reinforcement learning framework with trajectory-guided exploration for industrial process control
Reinforcement learning (RL) for industrial process control faces critical challenges, including limited data availability, unsafe exploration, and the high cost of high-fidelity simulators. These issues limit the practical adoption of RL in process control systems. To address these limitations, this paper presents a comprehensive framework that combines offline pre-training with online fine-tuning. Specifically, the framework first employs an offline RL method to learn conservative policies from historical data, preventing overestimation of unseen actions. It then transitions to fine-tuning with an online RL method and a mixed replay buffer that gradually shifts from offline to online data. To further enhance safety during online exploration, this work introduces a trajectory-guided strategy that leverages timestamped sub-optimal expert demonstrations. Rather than replacing agent actions entirely, the proposed method computes a weighted combination of agent and expert actions based on a decaying intervention rate. Both components are designed as modular additions that can be integrated into existing actor-critic algorithms without structural modifications. Case studies on penicillin fermentation and simulated moving bed (SMB) processes demonstrate that the proposed framework outperforms baseline algorithms in terms of learning efficiency, stability, computational cost, and operational safety.
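Because both modular components admit a compact description, a minimal Python sketch may help fix ideas. The names (MixedReplayBuffer, guided_action), the exponential decay schedules, and all default values below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class MixedReplayBuffer:
    """Mixed replay buffer: each batch blends offline and online transitions,
    with the offline share decaying as online experience accumulates.
    The decay schedule and default values are assumptions for illustration."""

    def __init__(self, offline_data, initial_offline_ratio=0.9, decay=0.999):
        self.offline_data = list(offline_data)   # historical (s, a, r, s_next, done) tuples
        self.online_data = []
        self.offline_ratio = initial_offline_ratio
        self.decay = decay

    def add(self, transition):
        """Store a freshly collected transition and shift sampling toward online data."""
        self.online_data.append(transition)
        self.offline_ratio *= self.decay

    def sample(self, batch_size):
        """Draw a batch whose offline/online mix follows the current ratio."""
        n_off = int(round(batch_size * self.offline_ratio))
        n_on = batch_size - n_off
        idx_off = np.random.randint(len(self.offline_data), size=n_off)
        batch = [self.offline_data[i] for i in idx_off]
        if self.online_data and n_on > 0:
            idx_on = np.random.randint(len(self.online_data), size=n_on)
            batch += [self.online_data[i] for i in idx_on]
        return batch


def guided_action(agent_action, expert_action, step, beta0=1.0, decay_rate=1e-4):
    """Trajectory-guided exploration: blend the agent's action with the
    timestamped expert action using a decaying intervention rate beta."""
    beta = beta0 * np.exp(-decay_rate * step)    # intervention rate decays over training
    return beta * np.asarray(expert_action) + (1.0 - beta) * np.asarray(agent_action)
```

In a fine-tuning loop under these assumptions, the environment would be stepped with guided_action(...) while each new transition is pushed into the buffer via add(), so both exploration and the replay mix drift from expert/offline data toward the agent's own behaviour as training progresses.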
Journal introduction:
This international journal covers the application of control theory, operations research, computer science, and engineering principles to the solution of process control problems. In addition to traditional chemical processing and manufacturing applications, the scope of process control problems spans a wide range of applications, including energy processes, nanotechnology, systems biology, biomedical engineering, pharmaceutical processing technology, energy storage and conversion, smart grid, and data analytics, among others.
Papers on the theory in these areas will also be accepted provided the theoretical contribution is aimed at the application and the development of process control techniques.
Topics covered include:
• Control applications
• Process monitoring
• Plant-wide control
• Process control systems
• Control techniques and algorithms
• Process modelling and simulation
• Design methods
Advanced design methods exclude well-established and widely studied traditional design techniques such as PID tuning and its many variants. Applications in fields such as control of automotive engines, machinery, and robotics are not deemed suitable unless a clear motivation for their relevance to process control is provided.