Huiping Liang , Junyao Xie , Biao Huang , Yonggang Li , Bei Sun , Chunhua Yang
{"title":"用于过程控制的新型模拟真实强化学习算法","authors":"Huiping Liang , Junyao Xie , Biao Huang , Yonggang Li , Bei Sun , Chunhua Yang","doi":"10.1016/j.ress.2024.110639","DOIUrl":null,"url":null,"abstract":"<div><div>While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models. Consequently, RL-based controllers relying on simulation models can easily suffer from model-plant mismatch. On the one hand, utilizing offline data for pre-training of RL can also mitigate safety risks. However, it requires well-represented historical datasets. This is demanding because industrial processes mostly run under a regulatory mode with basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states to mitigate the model-plant mismatch. Then, a fix-horizon return is designed to replace traditional infinite-step return to provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, applying proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance in MSE by 1.96% and in R by 21.64% on average for roasting process simulation. This verifies the effectiveness of the proposed method.</div></div>","PeriodicalId":54500,"journal":{"name":"Reliability Engineering & System Safety","volume":"254 ","pages":"Article 110639"},"PeriodicalIF":9.4000,"publicationDate":"2024-11-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"A novel sim2real reinforcement learning algorithm for process control\",\"authors\":\"Huiping Liang , Junyao Xie , Biao Huang , Yonggang Li , Bei Sun , Chunhua Yang\",\"doi\":\"10.1016/j.ress.2024.110639\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models. Consequently, RL-based controllers relying on simulation models can easily suffer from model-plant mismatch. On the one hand, utilizing offline data for pre-training of RL can also mitigate safety risks. However, it requires well-represented historical datasets. This is demanding because industrial processes mostly run under a regulatory mode with basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states to mitigate the model-plant mismatch. Then, a fix-horizon return is designed to replace traditional infinite-step return to provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, applying proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance in MSE by 1.96% and in R by 21.64% on average for roasting process simulation. This verifies the effectiveness of the proposed method.</div></div>\",\"PeriodicalId\":54500,\"journal\":{\"name\":\"Reliability Engineering & System Safety\",\"volume\":\"254 \",\"pages\":\"Article 110639\"},\"PeriodicalIF\":9.4000,\"publicationDate\":\"2024-11-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Reliability Engineering & System Safety\",\"FirstCategoryId\":\"5\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0951832024007105\",\"RegionNum\":1,\"RegionCategory\":\"工程技术\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"ENGINEERING, INDUSTRIAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Reliability Engineering & System Safety","FirstCategoryId":"5","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0951832024007105","RegionNum":1,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, INDUSTRIAL","Score":null,"Total":0}
A novel sim2real reinforcement learning algorithm for process control
While reinforcement learning (RL) has potential in advanced process control and optimization, its direct interaction with real industrial processes can pose safety concerns. Model-based pre-training of RL may alleviate such risks. However, the intricate nature of industrial processes complicates the establishment of entirely accurate simulation models. Consequently, RL-based controllers relying on simulation models can easily suffer from model-plant mismatch. On the one hand, utilizing offline data for pre-training of RL can also mitigate safety risks. However, it requires well-represented historical datasets. This is demanding because industrial processes mostly run under a regulatory mode with basic controllers. To handle these issues, this paper proposes a novel sim2real reinforcement learning algorithm. First, a state adaptor (SA) is proposed to align simulated states with real states to mitigate the model-plant mismatch. Then, a fix-horizon return is designed to replace traditional infinite-step return to provide genuine labels for the critic network, enhancing learning efficiency and stability. Finally, applying proximal policy optimization (PPO), the SA-PPO method is introduced to implement the proposed sim2real algorithm. Experimental results show that SA-PPO improves performance in MSE by 1.96% and in R by 21.64% on average for roasting process simulation. This verifies the effectiveness of the proposed method.
期刊介绍:
Elsevier publishes Reliability Engineering & System Safety in association with the European Safety and Reliability Association and the Safety Engineering and Risk Analysis Division. The international journal is devoted to developing and applying methods to enhance the safety and reliability of complex technological systems, like nuclear power plants, chemical plants, hazardous waste facilities, space systems, offshore and maritime systems, transportation systems, constructed infrastructure, and manufacturing plants. The journal normally publishes only articles that involve the analysis of substantive problems related to the reliability of complex systems or present techniques and/or theoretical results that have a discernable relationship to the solution of such problems. An important aim is to balance academic material and practical applications.