Environmental impacts prediction using graph neural networks on molecular graphs

IF 3.9 | Zone 2, Engineering & Technology | JCR Q2, Computer Science, Interdisciplinary Applications
Qinghe Gao, Lukas Schulze Balhorn, Alessandro Laera, Raoul Meys, Jonas Goßen, Jana M. Weber, Gregor Wernet, Artur M. Schweidtmann
Journal: Computers & Chemical Engineering, volume 204, Article 109362
Publication type: Journal Article
Published: 2025-08-28
DOI: 10.1016/j.compchemeng.2025.109362
URL: https://www.sciencedirect.com/science/article/pii/S0098135425003655
Citations: 0

Abstract

The chemical industry needs to undergo a significant transformation towards more sustainable and circular production systems. To guide this transformation, it is highly desirable to estimate the environmental impacts of chemical production at early product screening or development stages. This study combines the molecular structure of process products with graph neural networks (GNNs) to approximate the environmental impacts of chemical processes at an early stage. Specifically, we use end-to-end GNN models to predict fifteen environmental impact categories, using a CarbonMinds dataset of 51,905 processes that produce 791 molecules in 91 countries, augmented with country-specific energy mix data. Our analysis begins with a comparison of Quantitative Structure-Property Relationship (QSPR) and GNN models for the climate change impact category. We develop three GNN models: (i) a GNN with only molecular structure, (ii) a GNN with molecular structure and additional geographical features, and (iii) a GNN with molecular structure and additional energy mix features. The results indicate that all three GNN models improve on the QSPR models. Furthermore, benchmarking our GNN models against the existing literature in the climate change impact category reveals that our models perform comparably. We then extend our approach by developing both single- and multi-task GNN models to predict all fifteen impact categories. The findings indicate that multi-task learning can improve model performance in complex environmental impact predictions compared to single-task GNNs. Therefore, we recommend using a multi-task GNN for predicting multiple impact categories, with single-task models applied to fine-tune performance on underperforming categories. Although our proposed approach shows improvements over previous models, predicting environmental impacts solely from molecular information remains a rough approximation.
Source journal
Computers & Chemical Engineering (Engineering & Technology: Chemical Engineering)
CiteScore: 8.70
Self-citation rate: 14.00%
Publication volume: 374
Review time: 70 days
Journal description: Computers & Chemical Engineering is primarily a journal of record for new developments in the application of computing and systems technology to chemical engineering problems.