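A molecular descriptor-based correlation with the composition of acid-pretreated cornstalk cultivation medium for biohydrogen production using a machine learning approach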

IF 8.3 | Zone 2, Engineering & Technology | Q1 Chemistry, Physical
Xiyue Zhang, Yixiao Wang, Jing Hu, Qingyue Zhang, Xiaoting Xuan, Lufang Shi, Yong Sun
DOI: 10.1016/j.ijhydene.2025.03.400
Journal: International Journal of Hydrogen Energy, volume 123
Publication date: 2025-04-05
Publication type: Journal Article
Open access: no
Source: https://www.sciencedirect.com/science/article/pii/S0360319925015526
Citations: 0

Abstract

A molecular descriptor-based correlation with the composition of acid-pretreated cornstalk cultivation medium for biohydrogen production using a machine learning approach
In this work, machine learning (ML) was used to examine the relationship between the physicochemical properties and concentration levels of 50 typical compounds derived from cornstalk acid hydrolysates during lignocellulosic pretreatment. These compounds, selected to represent the chemical matrix (with <32 % similarity), were analyzed using RDKit's MolecularDescriptorCalculator (MDC), which effectively reduced the number of extended-connectivity fingerprints (ECFP4) from 366 chemical descriptors to 19 key descriptors. Notably, compounds such as glucose, fructose, furfural, lactic acid, acetate, formic acid, 4-hydroxy-3-methoxycinnamic acid, and citric acid exhibited consistent hierarchical clustering in the cultivation media before (Con_int) and after (Con_aft) fermentation. The Gasteiger charge and LogP descriptors were effective in illustrating subtle differences among those compounds. In regression model evaluation, TensorFlow (TF) demonstrated a stronger correlation (R² > 75 %) between chemical descriptors and pre-fermentation concentrations (Con_int) than with post-fermentation concentrations (Con_aft). SHapley Additive exPlanations (SHAP) analysis was applied to the TF model to interpret the chemical properties that influence the levels of compounds in the fermentation cultivation medium, with LogP, Gasteiger charge, and aromatic ring count being the most influential for Con_int, and Kappa1, radius of gyration, and hydrogen donors for Con_aft. A lignocellulosic acid hydrolysate compound library (LAHCL) was also constructed, based on the cheminformatics study, for future exploration of potential compounds during biohydrogen fermentation. This cheminformatics approach offers valuable insights into predicting, with reasonable accuracy, compound concentrations, biological activity, and the pool of relevant compounds for dark fermentation.
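To make the reported regression score concrete, the following is a minimal sketch of how a coefficient of determination (R²) is computed from measured and predicted concentrations. It is not the authors' code: the function name `r_squared` and the sample values are hypothetical, and the abstract does not state which R² variant or data split was used. On the fractional scale used here, the reported threshold of 75 % corresponds to 0.75.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: spread of the measurements around their mean.
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all measurements are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations (arbitrary units), for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
score = r_squared(measured, predicted)
print(f"R^2 = {score:.2f}")
```

A score of 1.0 means the predictions reproduce the measurements exactly, while a score near 0 means the model does no better than predicting the mean concentration.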
Source journal
International Journal of Hydrogen Energy (Engineering & Technology, Environmental Sciences)
CiteScore: 13.50
Self-citation rate: 25.00%
Articles published: 3502
Review time: 60 days
About the journal: The objective of the International Journal of Hydrogen Energy is to facilitate the exchange of new ideas, technological advancements, and research findings in the field of Hydrogen Energy among scientists and engineers worldwide. This journal showcases original research, both analytical and experimental, covering various aspects of Hydrogen Energy. These include production, storage, transmission, utilization, enabling technologies, environmental impact, economic considerations, and global perspectives on hydrogen and its carriers such as NH3, CH4, alcohols, etc. The utilization aspect encompasses various methods such as thermochemical (combustion), photochemical, electrochemical (fuel cells), and nuclear conversion of hydrogen, hydrogen isotopes, and hydrogen carriers into thermal, mechanical, and electrical energies. The applications of these energies can be found in transportation (including aerospace), industrial, commercial, and residential sectors.