Safe coordinated optimization of the thickening-dewatering process via reinforcement learning with real-time human guidance

Ranmeng Lin, Runda Jia, Fengyang Jiang, Jun Zheng, Dakuo He, Kang Li, Fuli Wang

Neurocomputing, Volume 652, Article 131022. Published 2025-07-19. DOI: 10.1016/j.neucom.2025.131022
https://www.sciencedirect.com/science/article/pii/S0925231225016947
Due to its trial-and-error learning mechanism and limited intelligence, current reinforcement learning (RL) faces significant safety risks when applied to complex industrial scenarios. To improve its deployability in high-risk environments, this paper studies the thickening-dewatering process, a key and energy-intensive subprocess in mineral processing, and proposes a safe RL coordinated optimization framework that leverages real-time human guidance mechanisms. The framework consists of two human-in-the-loop models: first, a human supervision model based on soft sensing, which predicts the safety of the agent's actions at each step and identifies potential risks in advance; second, a human demonstration model based on imitation learning, which automatically generates safe alternative actions consistent with human expertise when unsafe actions are detected. The safe actions, evaluated and filtered by these models, are then used to interact with the environment, ensuring the safety of the RL process. Furthermore, the paper derives an upper bound on the discounted failure probability of the algorithm, theoretically validating the safety enhancement provided by the human guidance mechanism. Experimental results demonstrate that, while achieving a 100% training safety rate, the proposed algorithm reduces energy consumption by 15.62% compared to an existing optimization algorithm, showing significant potential for practical application and broader deployment.
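The control flow implied by the abstract can be sketched as a per-step action filter: a supervision model scores the agent's proposed action, and a demonstration model substitutes a safe alternative when the score is too low. The sketch below is a minimal illustration of that loop; all class names, the toy dynamics, and the safety threshold are hypothetical stand-ins, not the paper's soft-sensing or imitation-learning models.

```python
# Minimal sketch of the human-guidance action filter described in the
# abstract. SupervisionModel, DemonstrationModel, run_safe_step, the toy
# plant, and the 0.8 threshold are all assumed names for illustration.
import numpy as np

rng = np.random.default_rng(0)

class SupervisionModel:
    """Stand-in for the soft-sensing human supervision model: scores how
    safe a proposed action is in the current state (1.0 = safest)."""
    def safety_score(self, state, action):
        # Toy heuristic: actuator moves far from a conservative setpoint
        # are treated as increasingly risky.
        return float(np.exp(-np.abs(action - 0.5 * state)))

class DemonstrationModel:
    """Stand-in for the imitation-learned human demonstration model:
    proposes a safe alternative when the agent's action is rejected."""
    def safe_action(self, state):
        # Toy policy mimicking a conservative operator rule.
        return 0.5 * state

def run_safe_step(state, proposed_action, supervisor, demonstrator,
                  threshold=0.8):
    """Evaluate the agent's proposed action; substitute a demonstrated
    action if it is predicted unsafe. Returns the executed action."""
    if supervisor.safety_score(state, proposed_action) >= threshold:
        return proposed_action               # accepted: execute as proposed
    return demonstrator.safe_action(state)   # rejected: safe alternative

if __name__ == "__main__":
    supervisor, demonstrator = SupervisionModel(), DemonstrationModel()
    state = 1.0
    for step in range(5):
        proposed = rng.normal(loc=0.5 * state, scale=0.6)  # exploratory RL action
        executed = run_safe_step(state, proposed, supervisor, demonstrator)
        print(f"step {step}: proposed={proposed:+.3f} executed={executed:+.3f}")
        state = 0.9 * state + 0.1 * executed  # toy plant dynamics
```

Only the filtered action ever reaches the environment, which is how the framework keeps every training-time interaction safe rather than penalizing unsafe behavior after the fact.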
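The abstract reports an upper bound on the discounted failure probability but does not reproduce it. For orientation, the quantity itself is conventionally defined in the safe-RL literature as follows; this is an assumed, illustrative definition, not necessarily the paper's notation or its bound:

```latex
% Discounted failure probability of policy \pi from initial state s.
% \mathcal{S}_{fail} is the set of failure states and \gamma \in [0,1)
% the discount factor. Standard safe-RL definition, assumed here for
% illustration -- the paper's own notation is not given in the abstract.
\Phi^{\pi}(s) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\,
\mathbf{1}\{ s_t \in \mathcal{S}_{\mathrm{fail}} \} \,\middle|\, s_0 = s \right]
```

Bounding a quantity of this form from above is what lets the authors argue, theoretically, that the human guidance mechanism reduces the chance of ever entering a failure state during learning.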
Journal Introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.