Data-driven modeling of syngas yield from biomass pyrolysis using hybrid tree-based machine learning algorithms

IF 6.2 · Zone 1 (Agricultural and Forestry Sciences) · JCR Q1 (AGRICULTURAL ENGINEERING)
DU Jia-Hao, LUO Sheng-Li, Samim Sherzod
Journal: Industrial Crops and Products
Publication date: 2025-09-30
Publication type: Journal Article
DOI: 10.1016/j.indcrop.2025.121993 (https://doi.org/10.1016/j.indcrop.2025.121993)
Citations: 0

Abstract

Achieving a high syngas yield from biomass pyrolysis in practice is complicated by the structural and compositional diversity of feedstocks and by the many reactions that occur simultaneously during the process. To address this problem, we introduce a robust data-driven methodology that uses hybrid tree-based algorithms, namely Gradient Boosting Machine (GBM), Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), Extra Trees (ET), Extreme Gradient Boosting (XGBoost), and Decision Tree (DT), to predict syngas yield from biomass pyrolysis. A total of 204 data points, with a comprehensive set of features relevant to syngas yield (biomass properties, operating conditions, and catalyst characteristics), are gathered from published sources. Model hyperparameters are optimized with the Tree-structured Parzen Estimator (TPE), coupled with k-fold cross-validation to reduce overfitting and improve generalization. The results indicate that Decision Tree and Random Forest are the best-performing models according to the evaluation metrics and graphical plots. Decision Tree is also the top-performing hybrid model in terms of runtime. This work presents a new and effective use of tree-based machine learning models, strengthened by comparative evaluation and systematic optimization, to model syngas yield with high accuracy and interpretability. It helps close a significant research gap: the lack of reliable predictive frameworks for this complex biomass characteristic. By incorporating outlier detection, sensitivity analysis, and optimization-based hyperparameter tuning, the study proposes a comprehensive and transferable workflow applicable to various biomass datasets.
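
The abstract does not include code, so the following is only a minimal sketch of the general idea of scoring a tree model's hyperparameter with k-fold cross-validation. Everything in it is an assumption made for illustration: the data are synthetic (not the 204 published data points), the two feature names are hypothetical, the model is a small hand-written regression tree rather than any of the libraries named above, and a plain exhaustive search over tree depth stands in for TPE.

```python
import random
from statistics import mean


def make_synthetic(n=204, seed=0):
    """Synthetic stand-in data; NOT the dataset described in the abstract."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        temp = rng.uniform(300.0, 900.0)  # hypothetical temperature feature
        ash = rng.uniform(0.0, 20.0)      # hypothetical ash-content feature
        X.append((temp, ash))
        y.append(0.05 * temp - 0.8 * ash + rng.gauss(0.0, 2.0))
    return X, y


def fit_tree(X, y, depth, min_leaf=5):
    """Greedy regression tree: pick the split with the lowest squared error."""
    n = len(y)
    if depth == 0 or n < 2 * min_leaf:
        return mean(y)  # leaf: predict the mean target
    total, total2 = sum(y), sum(v * v for v in y)
    best = None  # (squared error, feature index, threshold)
    for j in range(len(X[0])):
        order = sorted(range(n), key=lambda i: X[i][j])
        left, left2 = 0.0, 0.0
        for k in range(1, n):
            v = y[order[k - 1]]
            left += v
            left2 += v * v
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if k < min_leaf or n - k < min_leaf or lo == hi:
                continue
            right, right2 = total - left, total2 - left2
            score = (left2 - left ** 2 / k) + (right2 - right ** 2 / (n - k))
            if best is None or score < best[0]:
                best = (score, j, lo)
    if best is None:
        return mean(y)
    _, j, thr = best
    li = [i for i in range(n) if X[i][j] <= thr]
    ri = [i for i in range(n) if X[i][j] > thr]
    return (j, thr,
            fit_tree([X[i] for i in li], [y[i] for i in li], depth - 1, min_leaf),
            fit_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, min_leaf))


def predict(tree, x):
    """Walk from the root to a leaf; internal nodes are tuples."""
    while isinstance(tree, tuple):
        j, thr, left, right = tree
        tree = left if x[j] <= thr else right
    return tree


def r2_score(y_true, y_pred):
    m = mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - m) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k, seed=0):
    """Shuffle the row indices once and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_score(X, y, depth, k=5):
    """Mean held-out R^2 over k folds for a given maximum tree depth."""
    scores = []
    for fold in kfold_indices(len(y), k):
        held = set(fold)
        train = [i for i in range(len(y)) if i not in held]
        tree = fit_tree([X[i] for i in train], [y[i] for i in train], depth)
        preds = [predict(tree, X[i]) for i in fold]
        scores.append(r2_score([y[i] for i in fold], preds))
    return mean(scores)


X, y = make_synthetic()
# Exhaustive search over depth; a TPE sampler would propose these values adaptively.
results = {d: cv_score(X, y, d) for d in range(1, 7)}
best_depth = max(results, key=results.get)
print(best_depth, round(results[best_depth], 3))
```

Scoring each candidate on held-out folds rather than on the training rows is what discourages overly deep trees here. In a fuller setup, the same `cv_score`-style objective could be handed to a TPE implementation and repeated for each of the seven model families.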


Source journal
Industrial Crops and Products (Agricultural and Forestry Sciences: Agricultural Engineering)
CiteScore: 9.50
Self-citation rate: 8.50%
Articles published: 1518
Review time: 43 days
About the journal: Industrial Crops and Products is an international journal publishing academic and industrial research on industrial (defined as non-food/non-feed) crops and products. Papers cover both crop-oriented research and research on bio-based materials from crops. They should be of interest to an international audience, be hypothesis driven, and, where comparisons are made, include statistical analysis.