Machine Learning Modelling for Predicting the Efficacy of Ionic Liquid-Aided Biomass Pretreatment

IF 3.1 | CAS Zone 3 (Engineering & Technology) | JCR Q3, Energy & Fuels
Biswanath Mahanty, Munmun Gharami, Dibyajyoti Haldar
BioEnergy Research, 17(3), 1569–1583. Journal article, published 26 March 2024.
DOI: 10.1007/s12155-024-10747-2 (https://link.springer.com/article/10.1007/s12155-024-10747-2)
Citations: 0

Abstract

The influence of ionic liquid (IL) characteristics, lignocellulosic biomass (LCB) properties, and process conditions on LCB pretreatment is not well understood. In this study, 129 experimental data points on the pretreatment of LCB (grass, agricultural, and forest residues) with imidazolium, triethylamine, and choline-amino acid ILs were compiled to develop machine learning (ML) models for cellulose, hemicellulose, lignin, and solid recovery. After data imputation, the two most widely adopted ML models, a bilayer artificial neural network (ANN) and random forest (RF) regression, were developed. With Bayesian hyperparameter (HP) optimisation, the full-featured ANN fitted the training data very well (R²: 0.936–0.994), although its cross-validation performance (R²CV) was only moderate, ranging from 0.547 to 0.761. The HP-optimised RF models achieved R² between 0.824 and 0.939 in regression and between 0.383 and 0.831 in cross-validation. Temperature and pretreatment time were the most important predictors for all responses except hemicellulose recovery. Combining Bayesian predictor selection with HP optimisation raised the lower bound of R²CV for both the ANN (0.555–0.825) and the RF models (0.474–0.824). Because predictive performance varied with the target response, a larger, more homogeneous dataset may be needed. The predictive modelling framework for LCB pretreatment developed in this study can be extended to similar biochemical process systems.
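The gap between near-perfect training R² (0.936–0.994 for the ANN) and much lower R²CV (0.547–0.761) is the typical signature of a flexible model partly fitting noise in a small dataset. The following minimal sketch, using only the Python standard library, shows how the two scores are computed and why they diverge. It rests on several assumptions: the regressor is a k-nearest-neighbours stand-in, not the ANN or RF used in the study; the data are synthetic (only n = 129 mirrors the study); and R²CV is computed on pooled out-of-fold predictions, which is one common convention, since the abstract does not say how R²CV was aggregated.

```python
import random
from statistics import mean


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def knn_predict(X_train, y_train, X_query, k):
    """Stand-in regressor (NOT the paper's ANN or RF): average the targets of
    the k nearest training points by squared Euclidean distance."""
    preds = []
    for x in X_query:
        ranked = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xt)), yt)
            for xt, yt in zip(X_train, y_train)
        )
        preds.append(mean(yt for _, yt in ranked[:k]))
    return preds


def train_r2(X, y, k):
    """R^2 when the model is scored on the same data it was fitted to."""
    return r2_score(y, knn_predict(X, y, X, k))


def cv_r2(X, y, k, n_folds=5, seed=0):
    """k-fold cross-validated R^2 on pooled out-of-fold predictions:
    every point is predicted by a model that never saw it."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    oof = [0.0] * len(X)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        preds = knn_predict([X[i] for i in train_idx], [y[i] for i in train_idx],
                            [X[i] for i in test_idx], k)
        for i, p in zip(test_idx, preds):
            oof[i] = p
    return r2_score(y, oof)


def make_synthetic_data(n=129, seed=42):
    """Hypothetical data: recovery (%) vs temperature and time, plus noise.
    Only the sample size (129) mirrors the study; all values are invented."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        temp = rng.uniform(90, 160)    # hypothetical temperature, degC
        time_h = rng.uniform(1, 24)    # hypothetical pretreatment time, h
        y.append(50 + 0.2 * temp + 0.5 * time_h + rng.gauss(0, 2))
        # Min-max scale so both predictors contribute to the distance.
        X.append(((temp - 90) / 70, (time_h - 1) / 23))
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for k in (1, 5):
        print(f"k={k}: training R2 = {train_r2(X, y, k):.3f}, "
              f"5-fold R2CV = {cv_r2(X, y, k):.3f}")
```

With k = 1 every training point is its own nearest neighbour, so training R² is exactly 1 while R²CV is noticeably lower; raising k trades some training fit for smoother predictions that usually generalise better. The same training-versus-cross-validation comparison applies to any regressor, including the ANN and RF models described above.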


Source journal: BioEnergy Research
Subject categories: Energy & Fuels; Environmental Sciences
CiteScore: 6.70
Self-citation rate: 8.30%
Articles published: 174
Review time: 3 months
Journal description: BioEnergy Research fills a void in the rapidly growing area of feedstock biology research related to biomass, biofuels, and bioenergy. The journal publishes a wide range of articles, including peer-reviewed scientific research, reviews, perspectives and commentary, industry news, and government policy updates. Its coverage brings together a uniquely broad combination of disciplines with a common focus on feedstock biology and science related to biomass, biofeedstock, and bioenergy production.