Data-driven modeling of syngas yield from biomass pyrolysis using hybrid tree-based machine learning algorithms
Authors: DU Jia-Hao, LUO Sheng-Li, Samim Sherzod
Journal: Industrial Crops and Products (impact factor 6.2; JCR Q1, Agricultural Engineering)
Published: 2025-09-30
DOI: https://doi.org/10.1016/j.indcrop.2025.121993
The practical realization of high syngas yield from biomass pyrolysis is complicated by the structural and compositional diversity of feedstocks and the multitude of simultaneous reactions occurring during the process. To address this problem, we introduce a robust data-driven methodology that implements hybrid tree-based algorithms, namely Gradient Boosting Machine (GBM), Random Forest (RF), Light Gradient Boosting Machine (LightGBM), Categorical Boosting (CatBoost), Extra Trees (ET), Extreme Gradient Boosting (XGBoost), and Decision Tree (DT), to predict syngas yield from biomass pyrolysis processes. A total of 204 data points, with comprehensive features pertinent to syngas yield, including biomass properties, operating conditions, and catalyst characteristics, are gathered from published sources. Model hyperparameters are optimized with the Tree-structured Parzen Estimator (TPE), coupled with k-fold cross-validation to reduce overfitting and improve generalization. The findings indicate that Decision Tree and Random Forest are the best-performing models according to the evaluation metrics and graphical plots. Decision Tree is also the top-performing hybrid model in terms of runtime. This work presents a new and effective use of tree-based machine learning models, enhanced through comparative evaluation and systematic optimization, to model syngas yield with high accuracy and interpretability. It helps address a significant research gap in the development of reliable predictive frameworks for this complex biomass characteristic. By incorporating outlier detection, sensitivity analysis, and hyperparameter tuning via optimization strategies, the study proposes a comprehensive and transferable workflow applicable to various biomass datasets.
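The idea of scoring each hyperparameter candidate by k-fold cross-validation can be illustrated with a minimal, standard-library-only Python sketch. It is not the authors' implementation: the data are synthetic (two hypothetical features standing in for pyrolysis temperature and catalyst loading), the regression tree is a toy, and a plain exhaustive search over `max_depth` stands in for the Tree-structured Parzen Estimator. All function names here are illustrative.

```python
import random

def kfold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def sse(values):
    """Sum of squared deviations from the mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def fit_tree(X, y, max_depth, min_leaf=3):
    """Greedy regression tree; a leaf stores the mean target of its samples."""
    best = None
    if max_depth > 0 and len(y) >= 2 * min_leaf:
        for j in range(len(X[0])):
            for t in sorted({row[j] for row in X}):
                left = [i for i, row in enumerate(X) if row[j] <= t]
                right = [i for i, row in enumerate(X) if row[j] > t]
                if len(left) < min_leaf or len(right) < min_leaf:
                    continue
                cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
                if best is None or cost < best[0]:
                    best = (cost, j, t, left, right)
    if best is None:  # depth exhausted or no admissible split: make a leaf
        return sum(y) / len(y)
    _, j, t, left, right = best
    return (j, t,
            fit_tree([X[i] for i in left], [y[i] for i in left], max_depth - 1, min_leaf),
            fit_tree([X[i] for i in right], [y[i] for i in right], max_depth - 1, min_leaf))

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree

def cv_rmse(X, y, max_depth, k=5):
    """Mean RMSE over k held-out folds for one hyperparameter setting."""
    fold_scores = []
    for held_out in kfold_indices(len(y), k):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        tree = fit_tree([X[i] for i in train], [y[i] for i in train], max_depth)
        errors = [(predict(tree, X[i]) - y[i]) ** 2 for i in held_out]
        fold_scores.append((sum(errors) / len(errors)) ** 0.5)
    return sum(fold_scores) / len(fold_scores)

# Synthetic data (assumption, not from the paper): temperature and catalyst loading.
rng = random.Random(42)
X = [[rng.uniform(400, 900), rng.uniform(0, 20)] for _ in range(100)]
y = [0.05 * temp + 0.8 * cat + rng.gauss(0, 2) for temp, cat in X]

# Exhaustive search over max_depth; a TPE sampler would propose candidates instead.
scores = {depth: cv_rmse(X, y, depth) for depth in range(1, 6)}
best_depth = min(scores, key=scores.get)
print("cross-validated RMSE by depth:", scores)
print("selected max_depth:", best_depth)
```

Replacing the exhaustive loop with a TPE sampler would change only how candidate depths are proposed; the cross-validated score would remain the objective being minimized.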
About the journal:
Industrial Crops and Products is an international journal publishing academic and industrial research on industrial (defined as non-food/non-feed) crops and products. Papers cover both crop-oriented research and research on bio-based materials derived from crops. They should be of interest to an international audience, be hypothesis driven, and, where comparisons are made, include statistical analysis.