Digital Chemical Engineering, Volume 8, September 2023, Article 100108
DOI: 10.1016/j.dche.2023.100108
Authors: Nikita Gupta, Shikhar Anand, Tanuja Joshi, Deepak Kumar, Manojkumar Ramteke, Hariprasad Kodamana
Process control of mAb production using multi-actor proximal policy optimization
Monoclonal antibodies (mAbs) are biopharmaceutical products that improve human immunity. In this work, we propose a multi-actor proximal policy optimization (PPO)-based reinforcement learning (RL) approach for the control of mAb production, where the manipulated variable is the flow rate and the controlled variable is the mAb concentration. Based on root mean square error (RMSE) values and convergence performance, the multi-actor PPO outperforms the other RL algorithms considered, and it predicts a 40% reduction in the number of days required to reach the desired concentration. Moreover, its performance improves as the number of actors increases: the agent performs best with three actors, beyond which performance deteriorates. These results are verified in three case studies: (i) nominal conditions, (ii) noise in raw materials and measurements, and (iii) stochastic disturbance in temperature combined with measurement noise. The results indicate that the proposed approach outperforms the deep deterministic policy gradient (DDPG), twin delayed deep deterministic policy gradient (TD3), and single-actor proximal policy optimization (PPO) algorithms for control of the bioreactor system.
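The paper's implementation is not reproduced here, but the two ingredients named in the abstract can be illustrated compactly: the PPO clipped surrogate term, and pooling of rollouts from several actors into one update batch. The sketch below uses a toy bioreactor surrogate in which the action is a feed flow rate and the reward penalizes deviation of a concentration from a setpoint; the `step` dynamics, gains, and setpoint are hypothetical stand-ins, not the paper's process model.

```python
import random

# --- Toy bioreactor surrogate (hypothetical, NOT the paper's model) ---
# State: current concentration; action: flow rate.
# Reward: negative squared tracking error to a setpoint.
def step(conc, flow, setpoint=2.0, gain=0.1, decay=0.02):
    conc = conc + gain * flow - decay * conc   # toy first-order dynamics
    reward = -(conc - setpoint) ** 2
    return conc, reward

def rollout(policy, horizon=50, seed=None):
    """Run one actor's stochastic policy for `horizon` steps."""
    rng = random.Random(seed)
    conc, traj = 0.0, []
    for _ in range(horizon):
        flow = max(0.0, policy(conc) + rng.gauss(0.0, 0.1))  # exploration noise
        conc, r = step(conc, flow)
        traj.append((conc, flow, r))
    return traj

def clipped_surrogate(ratio, advantage, eps=0.2):
    """PPO clipped objective term for one sample: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    return min(ratio * advantage,
               max(min(ratio, 1.0 + eps), 1.0 - eps) * advantage)

def multi_actor_batch(policies, horizon=50):
    """Pool trajectories from several actors into a single update batch."""
    batch = []
    for i, pi in enumerate(policies):
        batch.extend(rollout(pi, horizon, seed=i))
    return batch
```

In the multi-actor variant described in the abstract, several actors explore in parallel and their pooled batch drives a single clipped-surrogate policy update; the abstract reports that three actors gave the best trade-off, with performance deteriorating beyond that.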