{"title":"AutoRL framework for bioprocess control: Optimizing reward function, architecture, and hyperparameters","authors":"D.A. Goulart , R.D. Pereira , F.V. Silva","doi":"10.1016/j.dche.2025.100261","DOIUrl":null,"url":null,"abstract":"<div><div>This study proposes a structured AutoRL framework for the development of deep reinforcement learning (DRL) controllers in chemical process systems. Focusing on the control of a 3<span><math><mo>×</mo></math></span> 3 MIMO yeast fermentation bioreactor, the methodology jointly optimizes three key internal components of the DRL agent: the reward function, the neural network architecture, and the hyperparameters of the algorithm. A parameterizable logistic reward formulation is introduced to encode control objectives, such as steady-state accuracy, minimalization of actuation effort, and control smoothness, into a flexible and tunable structure. A dual loop optimization strategy combines grid search and Bayesian optimization to systematically explore and refine the agent’s design space. The resulting controller achieved average steady-state errors of 0.009 °C for reactor temperature and 0.19 g/L for ethanol concentration, while maintaining smooth and stable behavior under diverse operational scenarios. By formalizing reward design and integrating it with hyperparameter and architecture optimization, this work delivers a AutoRL methodology for DRL-based control, reducing reliance on expert heuristics and enhancing reproducibility in complex bioprocess applications.</div></div>","PeriodicalId":72815,"journal":{"name":"Digital Chemical Engineering","volume":"16 ","pages":"Article 100261"},"PeriodicalIF":4.1000,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Digital Chemical Engineering","FirstCategoryId":"1085","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S2772508125000456","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, CHEMICAL","Score":null,"Total":0}
Citations: 0
Abstract
This study proposes a structured AutoRL framework for the development of deep reinforcement learning (DRL) controllers in chemical process systems. Focusing on the control of a 3×3 MIMO yeast fermentation bioreactor, the methodology jointly optimizes three key internal components of the DRL agent: the reward function, the neural network architecture, and the hyperparameters of the algorithm. A parameterizable logistic reward formulation is introduced to encode control objectives, such as steady-state accuracy, minimization of actuation effort, and control smoothness, into a flexible and tunable structure. A dual-loop optimization strategy combines grid search and Bayesian optimization to systematically explore and refine the agent's design space. The resulting controller achieved average steady-state errors of 0.009 °C for reactor temperature and 0.19 g/L for ethanol concentration, while maintaining smooth and stable behavior under diverse operational scenarios. By formalizing reward design and integrating it with hyperparameter and architecture optimization, this work delivers an AutoRL methodology for DRL-based control, reducing reliance on expert heuristics and enhancing reproducibility in complex bioprocess applications.
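To make the idea of a "parameterizable logistic reward" concrete, the sketch below shows one plausible way such a reward could be structured: each control objective (tracking accuracy, actuation effort, control smoothness) is mapped through a logistic function with its own scale and steepness, and the terms are combined with tunable weights. This is an illustrative assumption, not the paper's exact formulation; the function names, weights, and shape parameters are hypothetical and would themselves be candidates for the grid-search/Bayesian tuning loop described in the abstract.

```python
import numpy as np

def logistic_term(error, scale, steepness):
    """Map an absolute deviation to (0, 1]: close to 1 while |error| stays
    below `scale`, decaying smoothly (logistically) as it grows past it."""
    return 1.0 / (1.0 + np.exp(steepness * (abs(error) - scale)))

def reward(y, y_sp, u, u_prev,
           weights=(1.0, 0.3, 0.3),
           scales=(0.05, 0.5, 0.1),
           steepness=(50.0, 5.0, 20.0)):
    """Hypothetical parameterizable logistic reward for a MIMO controller.

    y, y_sp   : current outputs and their setpoints (e.g., temperature, ethanol)
    u, u_prev : current and previous control actions
    weights, scales, steepness : tunable parameters exposed to the outer
                                 optimization loop (grid search / Bayesian).
    """
    # Steady-state accuracy: reward small tracking errors on every output.
    tracking = np.mean([logistic_term(yi - ri, scales[0], steepness[0])
                        for yi, ri in zip(y, y_sp)])
    # Actuation effort: reward actions that stay close to zero (or a nominal value).
    effort = np.mean([logistic_term(ui, scales[1], steepness[1]) for ui in u])
    # Control smoothness: reward small changes between consecutive actions.
    smoothness = np.mean([logistic_term(ui - upi, scales[2], steepness[2])
                          for ui, upi in zip(u, u_prev)])
    w_track, w_effort, w_smooth = weights
    return w_track * tracking + w_effort * effort + w_smooth * smoothness
```

Because every objective is expressed through the same bounded logistic shape, the trade-off between accuracy, effort, and smoothness reduces to a small set of continuous parameters, which is what makes the reward amenable to systematic optimization alongside the network architecture and algorithm hyperparameters.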