Self-supervised graph neural networks for polymer property prediction

Impact factor: 3.2 | JCR Q2, Chemistry, Physical | CAS Tier 3, Engineering & Technology
Qinghe Gao, Tammo Dukker, Artur M. Schweidtmann, Jana M. Weber
{"title":"Self-supervised graph neural networks for polymer property prediction","authors":"Qinghe Gao, Tammo Dukker, Artur M. Schweidtmann, Jana M. Weber","doi":"10.1039/d4me00088a","DOIUrl":null,"url":null,"abstract":"The estimation of polymer properties is of crucial importance in many domains such as energy, healthcare, and packaging. Recently, graph neural networks (GNNs) have shown promising results for the prediction of polymer properties based on supervised learning. However, the training of GNNs in a supervised learning task demands a huge amount of polymer property data that is time-consuming and computationally/experimentally expensive to obtain. Self-supervised learning offers great potential to reduce this data demand through pre-training the GNNs on polymer structure data only. These pre-trained GNNs can then be fine-tuned on the supervised property prediction task using a much smaller labeled dataset. We propose to leverage self-supervised learning techniques in GNNs for the prediction of polymer properties. We employ a recent polymer graph representation that includes essential features of polymers, such as monomer combinations, stochastic chain architecture, and monomer stoichiometry, and process the polymer graphs through a tailored GNN architecture. We investigate three self-supervised learning setups: (i) node- and edge-level pre-training, (ii) graph-level pre-training, and (iii) ensembled node-, edge- & graph-level pre-training. We additionally explore three different transfer strategies of fully connected layers with the GNN architecture. Our results indicate that the ensemble node-, edge- & graph-level self-supervised learning with all layers transferred depicts the best performance across dataset size. In scarce data scenarios, it decreases the root mean square errors by 28.39% and 19.09% for the prediction of electron affinity and ionization potential compared to supervised learning without the pre-training task.","PeriodicalId":91,"journal":{"name":"Molecular Systems Design & Engineering","volume":null,"pages":null},"PeriodicalIF":3.2000,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Molecular Systems Design & Engineering","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1039/d4me00088a","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, PHYSICAL","Score":null,"Total":0}
引用次数: 0

Abstract

The estimation of polymer properties is of crucial importance in many domains such as energy, healthcare, and packaging. Recently, graph neural networks (GNNs) have shown promising results for the prediction of polymer properties based on supervised learning. However, training GNNs in a supervised learning task demands large amounts of polymer property data that are time-consuming and computationally or experimentally expensive to obtain. Self-supervised learning offers great potential to reduce this data demand by pre-training the GNNs on polymer structure data only. These pre-trained GNNs can then be fine-tuned on the supervised property prediction task using a much smaller labeled dataset. We propose to leverage self-supervised learning techniques in GNNs for the prediction of polymer properties. We employ a recent polymer graph representation that includes essential features of polymers, such as monomer combinations, stochastic chain architecture, and monomer stoichiometry, and process the polymer graphs through a tailored GNN architecture. We investigate three self-supervised learning setups: (i) node- and edge-level pre-training, (ii) graph-level pre-training, and (iii) ensembled node-, edge-, and graph-level pre-training. We additionally explore three different strategies for transferring the fully connected layers together with the GNN architecture. Our results indicate that the ensembled node-, edge-, and graph-level self-supervised learning with all layers transferred yields the best performance across dataset sizes. In scarce-data scenarios, it decreases the root mean square errors by 28.39% and 19.09% for the prediction of electron affinity and ionization potential, respectively, compared to supervised learning without pre-training.
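To make the training setup concrete, the sketch below illustrates one possible node-level masking pre-training objective and a subsequent fine-tuning forward pass with the transferred encoder. This is a minimal sketch assuming a PyTorch Geometric-style interface; the GIN encoder, masking ratio, one-hot node attributes, and the node_head/property_head modules are hypothetical and do not reproduce the authors' tailored polymer-graph GNN or their edge- and graph-level pre-training tasks.

```python
# A minimal sketch, assuming a PyTorch Geometric-style pipeline. The GIN encoder,
# 15% masking ratio, one-hot node attributes, and the head modules are illustrative
# placeholders, not the authors' tailored polymer-graph architecture.
import torch
import torch.nn as nn
from torch_geometric.nn import GINConv, global_mean_pool


class GNNEncoder(nn.Module):
    """Simple message-passing encoder (placeholder for the tailored polymer GNN)."""

    def __init__(self, in_dim: int, hidden_dim: int, num_layers: int = 3):
        super().__init__()
        self.layers = nn.ModuleList()
        dim = in_dim
        for _ in range(num_layers):
            mlp = nn.Sequential(nn.Linear(dim, hidden_dim), nn.ReLU(),
                                nn.Linear(hidden_dim, hidden_dim))
            self.layers.append(GINConv(mlp))
            dim = hidden_dim

    def forward(self, x, edge_index):
        for conv in self.layers:
            x = torch.relu(conv(x, edge_index))
        return x  # node embeddings


def node_masking_pretrain_loss(encoder, node_head, data, mask_ratio=0.15):
    """Node-level self-supervision: hide a fraction of node features and
    predict their original attribute class from the corrupted graph."""
    mask = torch.rand(data.x.size(0)) < mask_ratio
    x_masked = data.x.clone()
    x_masked[mask] = 0.0                        # zero out masked node features
    h = encoder(x_masked, data.edge_index)      # embeddings of the corrupted graph
    logits = node_head(h[mask])                 # predict the original attributes
    targets = data.x[mask].argmax(dim=-1)       # assumes one-hot node attributes
    return nn.functional.cross_entropy(logits, targets)


def finetune_forward(encoder, property_head, data):
    """Supervised fine-tuning: the transferred encoder produces node embeddings,
    mean pooling gives a graph embedding, and a fully connected head regresses
    the property (e.g. electron affinity) on the small labeled dataset."""
    h = encoder(data.x, data.edge_index)
    g = global_mean_pool(h, data.batch)         # graph-level embedding
    return property_head(g)                     # scalar property prediction
```

In the paper's ensembled setup, analogous edge- and graph-level objectives would be combined with a node-level loss of this kind before the pre-trained weights are transferred to the supervised model.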


Source journal: Molecular Systems Design & Engineering (Engineering - Biomedical Engineering)
CiteScore: 6.40
Self-citation rate: 2.80%
Articles published per year: 144
Journal description: Molecular Systems Design & Engineering provides a hub for cutting-edge research into how understanding of molecular properties, behaviour and interactions can be used to design and assemble better materials, systems, and processes to achieve specific functions. These may have applications of technological significance and help address global challenges.