{"title":"生物过程控制的AutoRL框架:优化奖励函数、结构和超参数","authors":"D.A. Goulart , R.D. Pereira , F.V. Silva","doi":"10.1016/j.dche.2025.100261","DOIUrl":null,"url":null,"abstract":"<div><div>This study proposes a structured AutoRL framework for the development of deep reinforcement learning (DRL) controllers in chemical process systems. Focusing on the control of a 3<span><math><mo>×</mo></math></span> 3 MIMO yeast fermentation bioreactor, the methodology jointly optimizes three key internal components of the DRL agent: the reward function, the neural network architecture, and the hyperparameters of the algorithm. A parameterizable logistic reward formulation is introduced to encode control objectives, such as steady-state accuracy, minimalization of actuation effort, and control smoothness, into a flexible and tunable structure. A dual loop optimization strategy combines grid search and Bayesian optimization to systematically explore and refine the agent’s design space. The resulting controller achieved average steady-state errors of 0.009 °C for reactor temperature and 0.19 g/L for ethanol concentration, while maintaining smooth and stable behavior under diverse operational scenarios. By formalizing reward design and integrating it with hyperparameter and architecture optimization, this work delivers a AutoRL methodology for DRL-based control, reducing reliance on expert heuristics and enhancing reproducibility in complex bioprocess applications.</div></div>","PeriodicalId":72815,"journal":{"name":"Digital Chemical Engineering","volume":"16 ","pages":"Article 100261"},"PeriodicalIF":4.1000,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"AutoRL framework for bioprocess control: Optimizing reward function, architecture, and hyperparameters\",\"authors\":\"D.A. Goulart , R.D. Pereira , F.V. Silva\",\"doi\":\"10.1016/j.dche.2025.100261\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This study proposes a structured AutoRL framework for the development of deep reinforcement learning (DRL) controllers in chemical process systems. Focusing on the control of a 3<span><math><mo>×</mo></math></span> 3 MIMO yeast fermentation bioreactor, the methodology jointly optimizes three key internal components of the DRL agent: the reward function, the neural network architecture, and the hyperparameters of the algorithm. A parameterizable logistic reward formulation is introduced to encode control objectives, such as steady-state accuracy, minimalization of actuation effort, and control smoothness, into a flexible and tunable structure. A dual loop optimization strategy combines grid search and Bayesian optimization to systematically explore and refine the agent’s design space. The resulting controller achieved average steady-state errors of 0.009 °C for reactor temperature and 0.19 g/L for ethanol concentration, while maintaining smooth and stable behavior under diverse operational scenarios. 
By formalizing reward design and integrating it with hyperparameter and architecture optimization, this work delivers a AutoRL methodology for DRL-based control, reducing reliance on expert heuristics and enhancing reproducibility in complex bioprocess applications.</div></div>\",\"PeriodicalId\":72815,\"journal\":{\"name\":\"Digital Chemical Engineering\",\"volume\":\"16 \",\"pages\":\"Article 100261\"},\"PeriodicalIF\":4.1000,\"publicationDate\":\"2025-08-14\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Digital Chemical Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S2772508125000456\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, CHEMICAL\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital Chemical Engineering","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772508125000456","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, CHEMICAL","Score":null,"Total":0}
AutoRL framework for bioprocess control: Optimizing reward function, architecture, and hyperparameters
This study proposes a structured AutoRL framework for the development of deep reinforcement learning (DRL) controllers in chemical process systems. Focusing on the control of a 3×3 MIMO yeast fermentation bioreactor, the methodology jointly optimizes three key internal components of the DRL agent: the reward function, the neural network architecture, and the hyperparameters of the algorithm. A parameterizable logistic reward formulation is introduced to encode control objectives, such as steady-state accuracy, minimization of actuation effort, and control smoothness, into a flexible and tunable structure. A dual-loop optimization strategy combines grid search and Bayesian optimization to systematically explore and refine the agent’s design space. The resulting controller achieved average steady-state errors of 0.009 °C for reactor temperature and 0.19 g/L for ethanol concentration, while maintaining smooth and stable behavior under diverse operational scenarios. By formalizing reward design and integrating it with hyperparameter and architecture optimization, this work delivers an AutoRL methodology for DRL-based control, reducing reliance on expert heuristics and enhancing reproducibility in complex bioprocess applications.
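The abstract does not give the exact form of the parameterizable logistic reward, so the sketch below is only an illustration of how such a formulation could be structured: each control objective (tracking accuracy, actuation effort, control smoothness) is mapped through a tunable logistic term and the terms are combined with adjustable weights, exposing all scales, steepnesses, and weights to an outer optimizer. Every function name, parameter, and numeric value here is invented for the example and is not taken from the paper.

```python
import numpy as np

def logistic_term(x, scale, steepness):
    """Smooth score in (0, 1): close to 1 when |x| << scale and close to 0
    when |x| >> scale; `scale` sets the tolerance band, `steepness` how
    sharply the score drops outside it."""
    return 1.0 / (1.0 + np.exp(steepness * (np.abs(x) - scale)))

def composite_reward(errors, u, u_prev, params):
    """Weighted sum of logistic terms for tracking accuracy, actuation
    effort, and control smoothness; all parameters are tunable so the
    whole reward structure can be searched by an outer optimization loop."""
    r_track = logistic_term(errors, params["track_scale"], params["track_k"]).mean()
    r_effort = logistic_term(u, params["effort_scale"], params["effort_k"]).mean()
    r_smooth = logistic_term(u - u_prev, params["smooth_scale"], params["smooth_k"]).mean()
    w_track, w_effort, w_smooth = params["weights"]
    return w_track * r_track + w_effort * r_effort + w_smooth * r_smooth

# Example step for a 3x3 MIMO loop with illustrative (made-up) values.
params = {
    "track_scale": 0.05, "track_k": 40.0,
    "effort_scale": 0.5, "effort_k": 10.0,
    "smooth_scale": 0.05, "smooth_k": 40.0,
    "weights": (0.6, 0.2, 0.2),
}
errors = np.array([0.01, -0.02, 0.15])          # setpoint errors for the 3 outputs
u = np.array([0.40, 0.10, 0.30])                # current normalized actions
u_prev = np.array([0.38, 0.10, 0.31])           # previous actions (for smoothness)
print(composite_reward(errors, u, u_prev, params))
```

In a dual-loop setup of the kind the abstract describes, a reward dictionary like `params` above could sit alongside the network architecture and algorithm hyperparameters in the agent's design space, with grid search covering the discrete choices and Bayesian optimization refining the continuous ones; the exact split used by the authors is not specified here.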