A novel sim2real reinforcement learning algorithm for process control

Journal: Reliability Engineering & System Safety (Impact Factor 9.4, JCR Q1, Engineering, Industrial; CAS Tier 1, Engineering & Technology)
Authors: Huiping Liang, Junyao Xie, Biao Huang, Yonggang Li, Bei Sun, Chunhua Yang
DOI: 10.1016/j.ress.2024.110639
Volume 254, Article 110639
Published: 2024-11-15
URL: https://www.sciencedirect.com/science/article/pii/S0951832024007105
Citations: 0

Abstract

While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models. Consequently, RL-based controllers relying on simulation models can easily suffer from model-plant mismatch. Alternatively, offline data can be used for pre-training of RL to mitigate safety risks, but this requires well-represented historical datasets. This is demanding because industrial processes mostly run under a regulatory mode with basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and mitigate the model-plant mismatch. Then, a fixed-horizon return is designed to replace the traditional infinite-step return and provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, building on proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance by 1.96% in MSE and 21.64% in R on average on a roasting process simulation, verifying the effectiveness of the proposed method.
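The abstract names two concrete mechanisms: a state adaptor that maps simulated states toward real plant states, and a fixed-horizon return that replaces the bootstrapped infinite-step return as the critic's regression target. The paper's implementation is not reproduced on this page, so the sketch below is purely illustrative: the residual network structure, horizon length, and tensor shapes are assumptions for exposition, not the authors' SA-PPO code.

```python
# Illustrative sketch only (not the authors' implementation). Shows the two
# ideas named in the abstract: (1) a state adaptor correcting simulated states
# toward real plant states, and (2) a fixed-horizon discounted return used as
# a critic regression label instead of a bootstrapped infinite-step return.
import torch
import torch.nn as nn


class StateAdaptor(nn.Module):
    """Small residual network mapping simulated states toward real states (assumed form)."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, sim_state: torch.Tensor) -> torch.Tensor:
        # Adapted state = simulated state + learned correction term.
        return sim_state + self.net(sim_state)


def fixed_horizon_return(rewards: torch.Tensor, gamma: float = 0.99, horizon: int = 10) -> torch.Tensor:
    """Discounted reward sum over a fixed window of `horizon` steps.

    rewards: shape (T,) for one trajectory. Entry t of the result is
    sum_{k=0}^{horizon-1} gamma^k * r_{t+k}, truncated at the trajectory end.
    Unlike a bootstrapped infinite-step return, this target does not depend on
    the critic's own value estimates, which is the stability argument in the abstract.
    """
    T = rewards.shape[0]
    returns = torch.zeros(T)
    for t in range(T):
        g = 0.0
        for k in range(horizon):
            if t + k >= T:
                break
            g += (gamma ** k) * rewards[t + k].item()
        returns[t] = g
    return returns


if __name__ == "__main__":
    sa = StateAdaptor(state_dim=4)
    adapted = sa(torch.randn(8, 4))                   # batch of simulated states
    targets = fixed_horizon_return(torch.randn(20))   # critic regression labels
    print(adapted.shape, targets.shape)
```

In the actual SA-PPO method, such fixed-horizon labels would supervise the PPO critic while the adaptor aligns simulated and real trajectories; see the paper (DOI above) for the exact formulation.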
Source Journal: Reliability Engineering & System Safety (Management Science - Engineering, Industrial)
CiteScore: 15.20
Self-citation rate: 39.50%
Articles per year: 621
Review time: 67 days
Journal overview: Elsevier publishes Reliability Engineering & System Safety in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division. The international journal is devoted to developing and applying methods to enhance the safety and reliability of complex technological systems, such as nuclear power plants, chemical plants, hazardous waste facilities, space systems, offshore and maritime systems, transportation systems, constructed infrastructure, and manufacturing plants. The journal normally publishes only articles that involve the analysis of substantive problems related to the reliability of complex systems or present techniques and/or theoretical results that have a discernible relationship to the solution of such problems. An important aim is to balance academic material and practical applications.