Machine learning algorithms for in-line monitoring during yeast fermentations based on Raman spectroscopy
Authors: Debiao Wu, Yaying Xu, Feng Xu, Minghao Shao, Mingzhi Huang
Journal: Vibrational Spectroscopy, volume 132, Article 103672
Publication date: 2024-03-12
DOI: 10.1016/j.vibspec.2024.103672
URL: https://www.sciencedirect.com/science/article/pii/S0924203124000250
Impact factor: 2.7 (JCR Q2, Chemistry, Analytical)
Citations: 0
Abstract
Given the intricacies and nonlinearity inherent to industrial fermentation systems, process analytical technology offers considerable benefits for the direct, real-time monitoring, control, and assessment of synthetic processes. In this study, we introduce an in-line monitoring approach that uses Raman spectroscopy to follow ethanol production by Saccharomyces cerevisiae. First, we applied machine learning feature selection techniques to reduce the dimensionality of the Raman spectral data. Feature selection reduced model training time by over 90% and improved the root mean square error (RMSE) of the glycerol and cell concentration predictions by 14.20% and 17.10%, respectively. Next, we retrained the models using 15 machine learning algorithms, with hyperparameters optimized through grid search. After hyperparameter tuning, the RMSE for ethanol, glycerol, glucose, and biomass improved by 9.73%, 4.33%, 22.22%, and 13.79%, respectively. Finally, BaggingRegressor, Support Vector Regression, BayesianRidge, and VotingRegressor were identified as suitable models for predicting glucose, ethanol, glycerol, and cell concentrations, respectively. On the testing datasets, the coefficient of determination (R²) ranged from 0.89 to 0.97, and RMSE values ranged from 0.06 to 2.59 g/L. The study highlights the effectiveness of machine learning in Raman spectroscopy data analysis for improved industrial fermentation monitoring, enhancing efficiency and offering novel modeling insights.
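The abstract reports results as RMSE, R², and percentage improvements after grid search, without spelling out the arithmetic. The sketch below illustrates one plausible reading, under loud assumptions: the data are synthetic (not Raman spectra), the k-nearest-neighbour regressor is a hypothetical stand-in rather than one of the paper's models, the grid is scored on a single validation split rather than whatever resampling scheme the authors used, and the percentages are interpreted as relative RMSE reductions.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def relative_improvement(rmse_before, rmse_after):
    """Percentage by which RMSE dropped (assumed meaning of the reported gains)."""
    return 100.0 * (rmse_before - rmse_after) / rmse_before


def knn_predict(fit_x, fit_y, x, k):
    """Stand-in model: average the targets of the k nearest training points."""
    nearest = sorted(range(len(fit_x)), key=lambda i: abs(fit_x[i] - x))[:k]
    return sum(fit_y[i] for i in nearest) / k


def evaluate(k, fit_x, fit_y, eval_x, eval_y):
    """Return (RMSE, R^2) of the k-NN stand-in on an evaluation set."""
    preds = [knn_predict(fit_x, fit_y, x, k) for x in eval_x]
    return rmse(eval_y, preds), r2(eval_y, preds)


# Synthetic data: one feature and a noisy linear target (purely illustrative).
rng = random.Random(0)
xs = [rng.uniform(0.0, 10.0) for _ in range(150)]
ys = [2.0 * x + rng.gauss(0.0, 1.5) for x in xs]

# Split into training, validation (for the grid search), and test sets.
train_x, train_y = xs[:90], ys[:90]
val_x, val_y = xs[90:120], ys[90:120]
test_x, test_y = xs[120:], ys[120:]

# Grid search: try every candidate k and keep the lowest validation RMSE.
k_grid = [1, 3, 5, 9, 15]
default_k = 1  # hypothetical untuned baseline
val_scores = {k: evaluate(k, train_x, train_y, val_x, val_y)[0] for k in k_grid}
best_k = min(val_scores, key=val_scores.get)

# Compare the baseline and the tuned model on the held-out test set.
rmse_default, r2_default = evaluate(default_k, train_x, train_y, test_x, test_y)
rmse_tuned, r2_tuned = evaluate(best_k, train_x, train_y, test_x, test_y)

print(f"best k from grid search: {best_k}")
print(f"test RMSE default={rmse_default:.3f}, tuned={rmse_tuned:.3f}")
print(f"test R^2  default={r2_default:.3f}, tuned={r2_tuned:.3f}")
print(f"relative RMSE change: {relative_improvement(rmse_default, rmse_tuned):.2f}%")
```

Selecting the hyperparameter on a validation split and reporting metrics on a separate test split keeps the reported RMSE and R² from being flattered by the tuning itself. The same pattern would apply to any of the regressors named in the abstract.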
About the journal:
Vibrational Spectroscopy provides a vehicle for the publication of original research that focuses on vibrational spectroscopy. The journal covers infrared, near-infrared and Raman spectroscopies and publishes papers dealing with developments in applications, theory, techniques and instrumentation.
The topics covered by the journal include:
Sampling techniques,
Vibrational spectroscopy coupled with separation techniques,
Instrumentation (Fourier transform, conventional and laser based),
Data manipulation,
Spectra-structure correlation and group frequencies.
The application areas covered include:
Analytical chemistry,
Bio-organic and bio-inorganic chemistry,
Organic chemistry,
Inorganic chemistry,
Catalysis,
Environmental science,
Industrial chemistry,
Materials science,
Physical chemistry,
Polymer science,
Process control,
Specialized problem solving.