A robust deep reinforcement learning approach for the control of crystallization processes

IF 3.9 | Zone 2, Engineering & Technology | JCR Q2, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Computers & Chemical Engineering | Pub Date: 2025-04-04 | DOI: 10.1016/j.compchemeng.2025.109114

José Rodrigues Torraca Neto , Bruno Didier Olivier Capron , Argimiro Resende Secchi

Computers & Chemical Engineering, Volume 199, Article 109114. https://www.sciencedirect.com/science/article/pii/S0098135425001188

Citations: 0

Abstract

This work investigates the application of reinforcement learning (RL) to crystallization process control, focusing on robustness against parametric uncertainty and measurement noise. A curriculum learning approach with progressive uncertainty scaling and soft constraint enforcement was developed to improve the adaptability and performance of RL agents. Four actor–critic RL algorithms, namely Deep Deterministic Policy Gradient (DDPG), Twin Delayed Deep Deterministic Policy Gradient (TD3), Soft Actor–Critic (SAC), and Proximal Policy Optimization (PPO), were trained and evaluated under three strategies: baseline training, domain randomization, and curriculum learning. Each algorithm was assessed on key control metrics, including setpoint tracking, control smoothness, and constraint satisfaction, with Nonlinear Model Predictive Control (NMPC) serving as an oracle benchmark. The results show that PPO consistently outperformed the other algorithms, achieving, under curriculum learning, the lowest mean absolute percentage error (MAPE) for critical process parameters (2.20%) and the lowest violation probability (0.67%). Curriculum learning also reduced control variability, with PPO reaching a control variability index (CVI) of 0.008, indicating smooth control actions. While DDPG and TD3 performed competitively, SAC showed large fluctuations and obtained the lowest rewards across all training strategies, highlighting its limitations in stability-critical applications. The findings underscore the effectiveness of curriculum learning with soft constraints in enhancing RL performance for industrial process control and establish PPO as a reliable solution for robust crystallization control.
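The abstract names the main ingredients of the training setup (a curriculum that scales parametric uncertainty and measurement noise up in stages, a soft-constraint reward, and the MAPE and violation-probability metrics) without specifying them. The Python sketch that follows is a minimal illustration under stated assumptions, not the authors' implementation: the four-stage schedule, the uniform multiplicative parameter perturbation, the Gaussian noise model, the linear penalty weight, and the parameter names `k_growth` and `k_nucleation` are all hypothetical. The control variability index (CVI) is not defined in the abstract, so it is not reproduced here.

```python
import random

# Hypothetical sketch: the schedule, noise model, penalty weight, and parameter
# names are assumptions for illustration, not taken from the paper.


def uncertainty_scale(episode: int, total_episodes: int, n_stages: int = 4) -> float:
    """Stepwise curriculum: the uncertainty scale rises from 1/n_stages
    to 1.0 in equal stages as training progresses."""
    stage = min(n_stages - 1, episode * n_stages // total_episodes)
    return (stage + 1) / n_stages


def sample_parameters(nominal: dict, scale: float, max_rel_dev: float,
                      rng: random.Random) -> dict:
    """Domain randomization within the current stage: each nominal parameter is
    perturbed by a uniform relative deviation of at most scale * max_rel_dev."""
    return {name: value * (1.0 + rng.uniform(-1.0, 1.0) * scale * max_rel_dev)
            for name, value in nominal.items()}


def noisy_measurement(y: float, scale: float, rel_noise: float,
                      rng: random.Random) -> float:
    """Gaussian measurement noise whose standard deviation also grows with the stage."""
    return y + rng.gauss(0.0, scale * rel_noise * abs(y))


def soft_constrained_reward(y: float, setpoint: float, y_min: float, y_max: float,
                            penalty_weight: float = 10.0) -> float:
    """Relative tracking error plus a soft constraint: leaving [y_min, y_max]
    is penalized in proportion to the violation instead of ending the episode."""
    tracking = -abs(y - setpoint) / abs(setpoint)
    violation = max(0.0, y_min - y) + max(0.0, y - y_max)
    return tracking - penalty_weight * violation


def mape(y_true, y_pred) -> float:
    """Mean absolute percentage error in percent (y_true values must be nonzero)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def violation_probability(values, lower: float, upper: float) -> float:
    """Percentage of samples lying outside [lower, upper]."""
    return 100.0 * sum(not (lower <= v <= upper) for v in values) / len(values)


if __name__ == "__main__":
    rng = random.Random(0)
    nominal = {"k_growth": 1.0, "k_nucleation": 0.5}  # hypothetical parameters
    for episode in (0, 250, 500, 999):
        s = uncertainty_scale(episode, total_episodes=1000)
        print(episode, s, sample_parameters(nominal, s, max_rel_dev=0.2, rng=rng))
    print(mape([100.0, 200.0], [98.0, 204.0]))               # about 2.0
    print(violation_probability([0.9, 1.1, 1.6], 0.8, 1.5))  # one of three samples outside
```

Making the penalty proportional to the size of the violation keeps the reward informative near the constraint boundary, which is one common reading of "soft constraint enforcement"; a hard-constraint variant would instead terminate the episode or clip the action.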

Source journal

Computers & Chemical Engineering (Engineering & Technology: Engineering, Chemical)

CiteScore: 8.70

Self-citation rate: 14.00%

Articles published: 374

Review time: 70 days

Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.