DenseGNN: universal and scalable deeper graph neural networks for high-performance property prediction in crystals and molecules

Impact Factor 9.4 · CAS Tier 1, Materials Science · JCR Q1, Chemistry, Physical
Hongwei Du, Jiamin Wang, Jian Hui, Lanting Zhang, Hong Wang
DOI: 10.1038/s41524-024-01444-x
Journal: npj Computational Materials
Published: 2024-12-19
Citations: 0

Abstract

Modern generative models based on deep learning have made it possible to design millions of hypothetical materials. To screen these candidates and identify promising new materials, we need fast and accurate models to predict material properties. Graph neural networks (GNNs) have become a current research focus because they act directly on the graph representation of molecules and materials, capture important structural information comprehensively, and show excellent performance in predicting material properties. Nevertheless, GNNs still face several key problems in practical applications. First, although existing nested graph network strategies incorporate critical structural information such as bond angles, they significantly increase the number of trainable parameters in the model, resulting in an increase in training costs. Second, extending GNN models to broader domains such as molecules, crystalline materials, and catalysis, as well as adapting them to small datasets, remains a challenge. Finally, the scalability of GNN models is limited by the over-smoothing problem. To address these issues, we propose the DenseGNN model, which combines a Dense Connectivity Network (DCN), hierarchical node-edge-graph residual networks (HRN), and Local Structure Order Parameters Embedding (LOPE) to create a universal, scalable, and efficient GNN model. We have achieved state-of-the-art (SOTA) performance on several datasets, including JARVIS-DFT, Materials Project, QM9, Lipop, FreeSolv, ESOL, and OC22, demonstrating the generality and scalability of our approach. By merging the DCN and LOPE strategies into GNN models for crystal materials, molecules, and catalysis, we have improved the performance of models such as GIN, SchNet, and HamNet on materials datasets such as Matbench. The LOPE strategy optimizes the embedding representation of atoms and allows our model to train efficiently with a minimal number of edge connections.
This substantially reduces computational costs and shortens the time required to train large GNNs while maintaining accuracy. Our technique not only supports building deeper GNNs and avoids the performance penalties experienced by other models, but is also applicable to a variety of applications that require large deep learning models. Furthermore, our study demonstrates that by using structural embeddings from pre-trained models, our model not only outperforms other GNNs in distinguishing crystal structures but also approaches the standard X-ray diffraction (XRD) method.
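The dense-connectivity idea at the core of DenseGNN can be illustrated with a minimal sketch: each message-passing layer receives the concatenation of the input features and all preceding layers' outputs, a DenseNet-style skip pattern intended to counteract over-smoothing in deep stacks. This is a toy NumPy illustration of that pattern under simple assumptions (sum aggregation over neighbors, ReLU activation), not the authors' implementation; all function names here are hypothetical.

```python
import numpy as np

def mp_layer(h, adj, w):
    """One message-passing step: sum neighbor features, then linear map + ReLU."""
    msg = adj @ h                      # aggregate features over neighbors
    return np.maximum(0.0, msg @ w)    # linear transform + nonlinearity

def dense_gnn_forward(x, adj, weights):
    """Densely connected stack: layer k sees the concatenation of the
    input and all previous layer outputs (DenseNet-style skip pattern)."""
    feats = [x]
    for w in weights:
        h_in = np.concatenate(feats, axis=1)   # reuse all earlier features
        feats.append(mp_layer(h_in, adj, w))
    return np.concatenate(feats, axis=1)       # final node embeddings

# Toy 4-node ring graph, 3 input features, two hidden layers of width 8.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
x = rng.normal(size=(4, 3))
weights, d = [], 3
for width in (8, 8):
    weights.append(rng.normal(scale=0.1, size=(d, width)))
    d += width
out = dense_gnn_forward(x, adj, weights)
print(out.shape)  # (4, 19): 3 input + 8 + 8 concatenated features
```

Because every layer's output is retained in the final concatenation, early-layer features reach the readout directly, which is the mechanism dense connectivity uses to keep node representations from collapsing as depth grows.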


Source journal: npj Computational Materials (Mathematics: Modeling and Simulation)
CiteScore: 15.30
Self-citation rate: 5.20%
Articles published: 229
Review time: 6 weeks
Journal description: npj Computational Materials is a high-quality open access journal from Nature Research that publishes research papers applying computational approaches to the design of new materials and to enhancing our understanding of existing ones. The journal also welcomes papers on new computational techniques and refinements of current approaches that support these aims, as well as experimental papers that complement computational findings. Key features of npj Computational Materials include a 2-year impact factor of 12.241 (2021), 1,138,590 article downloads (2021), and a fast turnaround time of 11 days from submission to first editorial decision. The journal is indexed in databases and services including Chemical Abstracts Service (ACS), Astrophysics Data System (ADS), Current Contents/Physical, Chemical and Earth Sciences, Journal Citation Reports/Science Edition, SCOPUS, EI Compendex, INSPEC, Google Scholar, SCImago, DOAJ, CNKI, and Science Citation Index Expanded (SCIE), among others.