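Interpretable machine learning model for bio-oil property prediction from hydrothermal liquefaction of biomass via graph neural networks-based molecular structure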

IF 9.0 | Zone 1 (Engineering Technology) | JCR Q1 (Energy & Fuels)
Yue Tian, Junghwan Kim, Yi Liu
Energy, volume 329, Article 136708. Published 2025-05-21.
DOI: 10.1016/j.energy.2025.136708
https://www.sciencedirect.com/science/article/pii/S0360544225023503
Citations: 0

Abstract

Hydrothermal bio-oil is a promising fuel source that can help alleviate the energy crisis and contribute to carbon neutrality. However, its application performance depends primarily on its yield, elemental composition (C, H, O, N, and S contents), and higher heating value (HHV). We therefore propose an interpretable machine learning model for predicting hydrothermal bio-oil properties. Graph neural network (GNN)-based molecular structure was used to enhance the datasets. Single- and multi-task prediction models were built with extreme gradient boosting (XGB), random forest (RF), gradient boosting decision tree (GBDT), and deep neural network (DNN) methods. Shapley values and partial dependence were used to interpret the impact of input features on the output responses. The XGB model performed best after applying the GNN (average train R² = 0.95, RMSE = 1.29, MAE = 0.69; average test R² = 0.91, RMSE = 1.30, MAE = 0.93), with an average performance improvement of 6.74–7.95%. On unknown data, the model achieved an average R² of 0.916. Temperature and biomass ultimate analysis emerged as the pivotal features influencing the outputs. Triglycerides exhibited a stronger influence than fatty acids owing to their higher carbon content. A combination of high temperature (>350 °C) and elevated triglyceride content (>30%) increased bio-oil yield (~40 wt%) and HHV (~35 MJ/kg).
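The abstract relies on Shapley values to attribute a prediction to individual input features. The paper does not say which implementation was used, so the following is only a minimal sketch of the underlying idea: it computes exact Shapley values by enumerating every feature coalition, with absent features replaced by a baseline value. The function name shapley_values, the toy_model function, its coefficients, and the input numbers are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict at x, relative to a baseline input.

    Features outside a coalition are set to their baseline value. The cost
    grows as 2**n, so this is only practical for a handful of features.
    """
    n = len(x)
    features = range(n)

    def value(coalition):
        # Take coalition features from x and all other features from the baseline.
        mixed = [x[i] if i in coalition else baseline[i] for i in features]
        return predict(mixed)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i when it joins coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    # Hypothetical stand-in for a trained regressor (NOT the paper's model):
    # two inputs with an interaction term, so attributions are not purely additive.
    temperature, triglyceride = features
    return 0.05 * temperature + 0.4 * triglyceride + 0.001 * temperature * triglyceride


x = [350.0, 30.0]         # hypothetical sample to explain
baseline = [300.0, 10.0]  # hypothetical reference input
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

By construction, the attributions sum to the difference between the prediction at x and the prediction at the baseline, which is what makes them usable for ranking feature influence. For real tree ensembles such as XGB, enumeration is infeasible at realistic feature counts, and approximate or model-specific algorithms are used instead.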
Source journal
Energy (Engineering Technology: Energy & Fuels)
CiteScore: 15.30
Self-citation rate: 14.40%
Articles published: 0
Review time: 14.2 weeks
About the journal: Energy is a multidisciplinary, international journal that publishes research and analysis in the field of energy engineering. Our aim is to become a leading peer-reviewed platform and a trusted source of information for energy-related topics. The journal covers a range of areas including mechanical engineering, thermal sciences, and energy analysis. We are particularly interested in research on energy modelling, prediction, integrated energy systems, planning, and management. Additionally, we welcome papers on energy conservation, efficiency, biomass and bioenergy, renewable energy, electricity supply and demand, energy storage, buildings, and economic and policy issues. These topics should align with our broader multidisciplinary focus.