Self-supervised graph neural networks for polymer property prediction

Impact factor: 3.2 | JCR Q2, Chemistry, Physical | CAS Tier 3, Engineering & Technology
Qinghe Gao, Tammo Dukker, Artur M. Schweidtmann, Jana M. Weber
{"title":"Self-supervised graph neural networks for polymer property prediction","authors":"Qinghe Gao, Tammo Dukker, Artur M. Schweidtmann, Jana M. Weber","doi":"10.1039/d4me00088a","DOIUrl":null,"url":null,"abstract":"The estimation of polymer properties is of crucial importance in many domains such as energy, healthcare, and packaging. Recently, graph neural networks (GNNs) have shown promising results for the prediction of polymer properties based on supervised learning. However, the training of GNNs in a supervised learning task demands a huge amount of polymer property data that is time-consuming and computationally/experimentally expensive to obtain. Self-supervised learning offers great potential to reduce this data demand through pre-training the GNNs on polymer structure data only. These pre-trained GNNs can then be fine-tuned on the supervised property prediction task using a much smaller labeled dataset. We propose to leverage self-supervised learning techniques in GNNs for the prediction of polymer properties. We employ a recent polymer graph representation that includes essential features of polymers, such as monomer combinations, stochastic chain architecture, and monomer stoichiometry, and process the polymer graphs through a tailored GNN architecture. We investigate three self-supervised learning setups: (i) node- and edge-level pre-training, (ii) graph-level pre-training, and (iii) ensembled node-, edge- & graph-level pre-training. We additionally explore three different transfer strategies of fully connected layers with the GNN architecture. Our results indicate that the ensemble node-, edge- & graph-level self-supervised learning with all layers transferred depicts the best performance across dataset size. In scarce data scenarios, it decreases the root mean square errors by 28.39% and 19.09% for the prediction of electron affinity and ionization potential compared to supervised learning without the pre-training task.","PeriodicalId":91,"journal":{"name":"Molecular Systems Design & Engineering","volume":null,"pages":null},"PeriodicalIF":3.2000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Systems Design & Engineering","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1039/d4me00088a","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
引用次数: 0

Abstract

The estimation of polymer properties is of crucial importance in many domains such as energy, healthcare, and packaging. Recently, graph neural networks (GNNs) have shown promising results for the prediction of polymer properties based on supervised learning. However, training GNNs in a supervised learning task demands large amounts of polymer property data that are time-consuming and computationally or experimentally expensive to obtain. Self-supervised learning offers great potential to reduce this data demand by pre-training the GNNs on polymer structure data only. These pre-trained GNNs can then be fine-tuned on the supervised property prediction task using a much smaller labeled dataset. We propose to leverage self-supervised learning techniques in GNNs for the prediction of polymer properties. We employ a recent polymer graph representation that includes essential features of polymers, such as monomer combinations, stochastic chain architecture, and monomer stoichiometry, and process the polymer graphs through a tailored GNN architecture. We investigate three self-supervised learning setups: (i) node- and edge-level pre-training, (ii) graph-level pre-training, and (iii) ensembled node-, edge-, and graph-level pre-training. We additionally explore three different strategies for transferring the fully connected layers together with the GNN architecture. Our results indicate that the ensembled node-, edge-, and graph-level self-supervised learning with all layers transferred yields the best performance across dataset sizes. In scarce-data scenarios, it decreases the root mean square errors by 28.39% and 19.09% for the prediction of electron affinity and ionization potential, respectively, compared to supervised learning without pre-training.
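To make the training setup concrete, the sketch below illustrates one possible node-level masking pre-training objective and a subsequent fine-tuning forward pass with the transferred encoder. This is a minimal sketch assuming a PyTorch Geometric-style interface; the GIN encoder, masking ratio, one-hot node attributes, and the node_head/property_head modules are hypothetical and do not reproduce the authors' tailored polymer-graph GNN or their edge- and graph-level pre-training tasks.

```python
# A minimal sketch, assuming a PyTorch Geometric-style pipeline. The GIN encoder,
# 15% masking ratio, one-hot node attributes, and the head modules are illustrative
# placeholders, not the authors' tailored polymer-graph architecture.
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_mean_pool


class GNNEncoder(nn.Module):
    """Simple message-passing encoder (placeholder for the tailored polymer GNN)."""

    def __init__(self, in_dim: int, hidden_dim: int, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = in_dim
        for _ in range(num_layers):
            mlp = nn.Sequential(nn.Linear(dim, hidden_dim), nn.ReLU(),
                                nn.Linear(hidden_dim, hidden_dim))
            self.layers.append(GINConv(mlp))
            dim = hidden_dim

    def forward(self, x, edge_index):
        for conv in self.layers:
            x = torch.relu(conv(x, edge_index))
        return x  # node embeddings


def node_masking_pretrain_loss(encoder, node_head, data, mask_ratio=0.15):
    """Node-level self-supervision: hide a fraction of node features and
    predict their original attribute class from the corrupted graph."""
    mask = torch.rand(data.x.size(0)) < mask_ratio
    x_masked = data.x.clone()
    x_masked[mask] = 0.0                        # zero out masked node features
    h = encoder(x_masked, data.edge_index)      # embeddings of the corrupted graph
    logits = node_head(h[mask])                 # predict the original attributes
    targets = data.x[mask].argmax(dim=-1)       # assumes one-hot node attributes
    return nn.functional.cross_entropy(logits, targets)


def finetune_forward(encoder, property_head, data):
    """Supervised fine-tuning: the transferred encoder produces node embeddings,
    mean pooling gives a graph embedding, and a fully connected head regresses
    the property (e.g. electron affinity) on the small labeled dataset."""
    h = encoder(data.x, data.edge_index)
    g = global_mean_pool(h, data.batch)         # graph-level embedding
    return property_head(g)                     # scalar property prediction
```

In the paper's ensembled setup, analogous edge- and graph-level objectives would be combined with a node-level loss of this kind before the pre-trained weights are transferred to the supervised model.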


Source journal: Molecular Systems Design & Engineering (Engineering - Biomedical Engineering)
CiteScore: 6.40
Self-citation rate: 2.80%
Articles published per year: 144
Journal description: Molecular Systems Design & Engineering provides a hub for cutting-edge research into how understanding of molecular properties, behaviour and interactions can be used to design and assemble better materials, systems, and processes to achieve specific functions. These may have applications of technological significance and help address global challenges.