A novel sim2real reinforcement learning algorithm for process control

Journal: Reliability Engineering & System Safety (Impact Factor 9.4, JCR Q1, Engineering, Industrial; CAS Tier 1, Engineering & Technology)
Authors: Huiping Liang, Junyao Xie, Biao Huang, Yonggang Li, Bei Sun, Chunhua Yang
DOI: 10.1016/j.ress.2024.110639
Volume 254, Article 110639
Published: 2024-11-15
URL: https://www.sciencedirect.com/science/article/pii/S0951832024007105
Citations: 0

Abstract

While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models. Consequently, RL-based controllers relying on simulation models can easily suffer from model-plant mismatch. Alternatively, offline data can be used for pre-training of RL to mitigate safety risks, but this requires well-represented historical datasets. This is demanding because industrial processes mostly run under a regulatory mode with basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states and mitigate the model-plant mismatch. Then, a fixed-horizon return is designed to replace the traditional infinite-step return and provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, building on proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance by 1.96% in MSE and 21.64% in R on average on a roasting process simulation, verifying the effectiveness of the proposed method.
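The abstract names two concrete mechanisms: a state adaptor that maps simulated states toward real plant states, and a fixed-horizon return that replaces the bootstrapped infinite-step return as the critic's regression target. The paper's implementation is not reproduced on this page, so the sketch below is purely illustrative: the residual network structure, horizon length, and tensor shapes are assumptions for exposition, not the authors' SA-PPO code.

```python
# Illustrative sketch only (not the authors' implementation). Shows the two
# ideas named in the abstract: (1) a state adaptor correcting simulated states
# toward real plant states, and (2) a fixed-horizon discounted return used as
# a critic regression label instead of a bootstrapped infinite-step return.
import torch
import torch.nn as nn


class StateAdaptor(nn.Module):
    """Small residual network mapping simulated states toward real states (assumed form)."""

    def __init__(self, state_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, sim_state: torch.Tensor) -> torch.Tensor:
        # Adapted state = simulated state + learned correction term.
        return sim_state + self.net(sim_state)


def fixed_horizon_return(rewards: torch.Tensor, gamma: float = 0.99, horizon: int = 10) -> torch.Tensor:
    """Discounted reward sum over a fixed window of `horizon` steps.

    rewards: shape (T,) for one trajectory. Entry t of the result is
    sum_{k=0}^{horizon-1} gamma^k * r_{t+k}, truncated at the trajectory end.
    Unlike a bootstrapped infinite-step return, this target does not depend on
    the critic's own value estimates, which is the stability argument in the abstract.
    """
    T = rewards.shape[0]
    returns = torch.zeros(T)
    for t in range(T):
        g = 0.0
        for k in range(horizon):
            if t + k >= T:
                break
            g += (gamma ** k) * rewards[t + k].item()
        returns[t] = g
    return returns


if __name__ == "__main__":
    sa = StateAdaptor(state_dim=4)
    adapted = sa(torch.randn(8, 4))                   # batch of simulated states
    targets = fixed_horizon_return(torch.randn(20))   # critic regression labels
    print(adapted.shape, targets.shape)
```

In the actual SA-PPO method, such fixed-horizon labels would supervise the PPO critic while the adaptor aligns simulated and real trajectories; see the paper (DOI above) for the exact formulation.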
Source Journal: Reliability Engineering & System Safety (Management Science - Engineering, Industrial)
CiteScore: 15.20
Self-citation rate: 39.50%
Articles per year: 621
Review time: 67 days
Journal overview: Elsevier publishes Reliability Engineering & System Safety in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division. The international journal is devoted to developing and applying methods to enhance the safety and reliability of complex technological systems, such as nuclear power plants, chemical plants, hazardous waste facilities, space systems, offshore and maritime systems, transportation systems, constructed infrastructure, and manufacturing plants. The journal normally publishes only articles that involve the analysis of substantive problems related to the reliability of complex systems or present techniques and/or theoretical results that have a discernible relationship to the solution of such problems. An important aim is to balance academic material and practical applications.