Optimizing recombinant antibody fragment production: A comparison of artificial intelligence and statistical modeling
Authors: Majid Basafa, Atieh Hashemi, Aidin Behravan
Journal: Biotechnology and Applied Biochemistry, vol. 71, no. 5, pp. 1094-1104 (Journal Article)
Published: 2024-05-19
DOI: 10.1002/bab.2600
URL: https://onlinelibrary.wiley.com/doi/10.1002/bab.2600
Impact factor: 3.2; JCR: Q2 (Biochemistry & Molecular Biology)
Citation count: 0
Abstract
Maximizing recombinant protein yield requires optimizing the production medium. This can be done with a variety of methods, including the conventional “one-factor-at-a-time” approach and more recent statistical and mathematical methods such as artificial neural networks (ANN) and genetic algorithms. Each approach has its own advantages and disadvantages, and a technique with known limitations may still be used where it gives the best results. Here, one categorical variable and four numerical parameters (post-induction time, inducer concentration, post-induction temperature, and pre-induction cell density) were optimized using the 232 experimental assays of a central composite design. The direct and indirect effects of these factors on the yield of an anti-epithelial cell adhesion molecule extracellular domain antibody fragment were examined using statistical methods. The analysis of variance indicates that the response surface methodology (RSM) model is effective in predicting the amount of single-chain fragment variable produced (p-value = 0.0001, R² = 0.905). For the ANN model, evaluation using the normalized root mean square error (NRMSE) and R² shows a good fit (R² = 0.942) and accurate predictions (NRMSE = 0.145). On a subset of 30 data points randomly selected from the complete dataset, the ANN model had a higher R² (0.968) than the RSM model (0.932) and a lower NRMSE (0.048 vs. 0.064), indicating stronger predictive ability. The ideal culture condition was induction at a cell density of 0.7 with an isopropyl β-D-1-thiogalactopyranoside concentration of 0.6 mM for 32 h at 30°C in BW25113, which gave a protein yield of 259.51 mg/L. Under these optimum conditions, the yield predicted by the ANN model (259.83 mg/L) was closer to the experimental value (259.51 mg/L) than the yield predicted by the RSM model (276.13 mg/L). This outcome indicates that the ANN model outperforms the RSM model in prediction accuracy.
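The comparison at the optimum can be checked with simple arithmetic. The Python sketch below computes the relative error of each model's predicted yield against the experimental yield, using only the three values reported in the abstract. It also includes illustrative `r_squared` and `nrmse` helpers. These are assumptions: the abstract does not state how NRMSE was normalized, so the sketch divides the RMSE by the mean of the observed values, which may differ from the authors' definition. All function names are hypothetical.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse(observed, predicted):
    """RMSE divided by the mean of the observed values.

    Assumption: the abstract does not say which normalization was used
    (mean, range, or standard deviation); the mean is used here.
    """
    mse = sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    return math.sqrt(mse) / (sum(observed) / len(observed))


def relative_error_percent(observed, predicted):
    """Absolute deviation of a prediction from the observed value, in percent."""
    return abs(predicted - observed) / observed * 100.0


# Yields (mg/L) at the optimum conditions, as reported in the abstract.
EXPERIMENTAL = 259.51
ANN_PREDICTED = 259.83
RSM_PREDICTED = 276.13

ann_err = relative_error_percent(EXPERIMENTAL, ANN_PREDICTED)
rsm_err = relative_error_percent(EXPERIMENTAL, RSM_PREDICTED)

print(f"ANN relative error: {ann_err:.2f}%")  # 0.12%
print(f"RSM relative error: {rsm_err:.2f}%")  # 6.40%
```

With the reported values, the ANN prediction deviates from the measured yield by about 0.12%, and the RSM prediction by about 6.40%.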
About the journal:
Published since 1979, Biotechnology and Applied Biochemistry is dedicated to the rapid publication of high quality, significant research at the interface between life sciences and their technological exploitation.
The Editors will consider papers for publication based on their novelty and impact as well as their contribution to the advancement of medical biotechnology and industrial biotechnology, covering cutting-edge research in synthetic biology, systems biology, metabolic engineering, bioengineering, biomaterials, biosensing, and nano-biotechnology.