A robust deep reinforcement learning approach for the control of crystallization processes
José Rodrigues Torraca Neto, Bruno Didier Olivier Capron, Argimiro Resende Secchi
Computers & Chemical Engineering, Volume 199, Article 109114 (2025). DOI: 10.1016/j.compchemeng.2025.109114
https://www.sciencedirect.com/science/article/pii/S0098135425001188
Citations: 0
Abstract
This work investigates the application of reinforcement learning (RL) for crystallization process control, focusing on robustness against parametric uncertainty and measurement noise. A curriculum learning approach with progressive uncertainty scaling and soft constraint enforcement was developed to enhance RL agent adaptability and performance. Four actor–critic RL algorithms—Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), Soft Actor–Critic (SAC), and Proximal Policy Optimization (PPO)—were trained and evaluated using baseline, domain randomization, and curriculum learning strategies. The performance of each algorithm was assessed based on key control metrics, including setpoint tracking, control smoothness, and constraint satisfaction, with Nonlinear Model Predictive Control (NMPC) serving as an oracle benchmark. The results show that PPO consistently outperformed the other algorithms, achieving the lowest mean absolute percentage error (MAPE) for critical process parameters (2.20%) and the lowest violation probability (0.67%) under curriculum learning. This strategy also reduced control variability, with PPO achieving a control variability index (CVI) of 0.008, indicating smooth control actions. While DDPG and TD3 exhibited competitive performance, SAC suffered from high fluctuations and the lowest rewards across all training strategies, highlighting its limitations in stability-critical applications. These findings underscore the effectiveness of curriculum learning with soft constraints in enhancing RL performance for industrial process control, establishing PPO as a reliable solution for robust crystallization control.
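As a rough illustration of the curriculum idea described in the abstract—progressively widening parametric uncertainty and measurement noise as training proceeds—a minimal sketch follows. The parameter names, nominal values, linear ramp, and 10% relative uncertainty are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def uncertainty_scale(episode, total_episodes, max_scale=1.0):
    """Linearly ramp the uncertainty level from 0 to max_scale over training
    (one possible curriculum schedule; the paper's actual schedule may differ)."""
    return max_scale * min(1.0, episode / total_episodes)


def sample_plant_parameters(nominal, scale, rel_uncertainty=0.1, rng=None):
    """Domain-randomize plant parameters around their nominal values, with the
    perturbation width controlled by the current curriculum scale."""
    rng = rng or np.random.default_rng()
    return {name: value * (1.0 + scale * rel_uncertainty * rng.uniform(-1.0, 1.0))
            for name, value in nominal.items()}


# Hypothetical crystallization kinetic parameters, for illustration only.
nominal_params = {"growth_rate_constant": 1.2e-4, "nucleation_rate_constant": 3.5e2}

for episode in range(0, 1000, 250):
    scale = uncertainty_scale(episode, total_episodes=1000)
    params = sample_plant_parameters(nominal_params, scale)
    # ...reset the simulated crystallizer with `params`, scale the measurement
    # noise by `scale`, and run one training episode of the RL agent.
    print(episode, scale, params)
```

Starting with small perturbations and widening them over training is one way to pursue robustness to parametric uncertainty without destabilizing early learning; soft constraint enforcement would additionally penalize, rather than forbid, constraint violations in the reward.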
About the journal:
Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.