Dennis Delali Kwesi Wayo, Mohd Zulkifli Bin Mohamad Noor, Masoud Darvish Ganji, Camila Martins Saporetti, Leonardo Goliatt
{"title":"Q-DFTNet:利用dft驱动的QM9数据预测分子偶极矩的化学信息神经网络框架","authors":"Dennis Delali Kwesi Wayo, Mohd Zulkifli Bin Mohamad Noor, Masoud Darvish Ganji, Camila Martins Saporetti, Leonardo Goliatt","doi":"10.1002/jcc.70206","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>This study presents Q-DFTNet, a chemistry-informed neural network (ChINN) framework designed to benchmark graph neural networks (GNNs) for dipole moment prediction using the QM9 dataset. Seven GNN architectures, GCN, GIN, GraphConv, GATConv, GATNet, SAGEConv, and GIN+EdgeConv, were trained for 100 epochs and evaluated across performance and interpretability metrics. GraphConv achieved the lowest test MSE (0.7054), MAE (0.6196), and the highest <span></span><math>\n <semantics>\n <mrow>\n <msup>\n <mrow>\n <mi>R</mi>\n </mrow>\n <mrow>\n <mn>2</mn>\n </mrow>\n </msup>\n </mrow>\n <annotation>$$ {R}^2 $$</annotation>\n </semantics></math> (0.6513) with only 16.5k trainable parameters, confirming its optimal accuracy-complexity trade-off. GIN+EdgeConv followed closely with MSE of 0.7386, MAE of 0.6332, and <span></span><math>\n <semantics>\n <mrow>\n <msup>\n <mrow>\n <mi>R</mi>\n </mrow>\n <mrow>\n <mn>2</mn>\n </mrow>\n </msup>\n </mrow>\n <annotation>$$ {R}^2 $$</annotation>\n </semantics></math> of 0.6349, leveraging edge-awareness for enhanced expressivity. In contrast, attention-based models like GATConv and GATNet underperformed, with test MSEs of 0.9667 and 1.0096, and <span></span><math>\n <semantics>\n <mrow>\n <msup>\n <mrow>\n <mi>R</mi>\n </mrow>\n <mrow>\n <mn>2</mn>\n </mrow>\n </msup>\n </mrow>\n <annotation>$$ {R}^2 $$</annotation>\n </semantics></math> values of 0.5221 and 0.5009, despite their higher complexity (43.5k and 37.3k parameters). Latent space analysis via t-SNE, PCA, and UMAP showed superior cluster separability for GraphConv, GIN+EdgeConv, and GCN. 
Clustering metrics corroborated these observations: GraphConv achieved a Silhouette Score of 0.4665, a Davies–Bouldin Index of 0.7111, and a Calinski–Harabasz Score of 1278.40. Cluster-wise molecular dipole means for GIN+EdgeConv ranged from 2.6221 to 2.9606 Debye, reflecting high semantic coherence. Residual analysis and QQ plots confirmed that models with lower MSEs also had near-Gaussian error distributions, enhancing interpretability. Compared to benchmark models like PhysNet and DimeNet++, Q-DFTNet offers lower absolute accuracy but excels in modularity, interpretability, and computational efficiency. For a chemically grounded baseline for deploying GNNs in quantum chemistry and materials discovery pipelines, Q-DFTNet is proposed.</p>\n </div>","PeriodicalId":188,"journal":{"name":"Journal of Computational Chemistry","volume":"46 22","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Q-DFTNet: A Chemistry-Informed Neural Network Framework for Predicting Molecular Dipole Moments via DFT-Driven QM9 Data\",\"authors\":\"Dennis Delali Kwesi Wayo, Mohd Zulkifli Bin Mohamad Noor, Masoud Darvish Ganji, Camila Martins Saporetti, Leonardo Goliatt\",\"doi\":\"10.1002/jcc.70206\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>This study presents Q-DFTNet, a chemistry-informed neural network (ChINN) framework designed to benchmark graph neural networks (GNNs) for dipole moment prediction using the QM9 dataset. Seven GNN architectures, GCN, GIN, GraphConv, GATConv, GATNet, SAGEConv, and GIN+EdgeConv, were trained for 100 epochs and evaluated across performance and interpretability metrics. 
GraphConv achieved the lowest test MSE (0.7054), MAE (0.6196), and the highest <span></span><math>\\n <semantics>\\n <mrow>\\n <msup>\\n <mrow>\\n <mi>R</mi>\\n </mrow>\\n <mrow>\\n <mn>2</mn>\\n </mrow>\\n </msup>\\n </mrow>\\n <annotation>$$ {R}^2 $$</annotation>\\n </semantics></math> (0.6513) with only 16.5k trainable parameters, confirming its optimal accuracy-complexity trade-off. GIN+EdgeConv followed closely with MSE of 0.7386, MAE of 0.6332, and <span></span><math>\\n <semantics>\\n <mrow>\\n <msup>\\n <mrow>\\n <mi>R</mi>\\n </mrow>\\n <mrow>\\n <mn>2</mn>\\n </mrow>\\n </msup>\\n </mrow>\\n <annotation>$$ {R}^2 $$</annotation>\\n </semantics></math> of 0.6349, leveraging edge-awareness for enhanced expressivity. In contrast, attention-based models like GATConv and GATNet underperformed, with test MSEs of 0.9667 and 1.0096, and <span></span><math>\\n <semantics>\\n <mrow>\\n <msup>\\n <mrow>\\n <mi>R</mi>\\n </mrow>\\n <mrow>\\n <mn>2</mn>\\n </mrow>\\n </msup>\\n </mrow>\\n <annotation>$$ {R}^2 $$</annotation>\\n </semantics></math> values of 0.5221 and 0.5009, despite their higher complexity (43.5k and 37.3k parameters). Latent space analysis via t-SNE, PCA, and UMAP showed superior cluster separability for GraphConv, GIN+EdgeConv, and GCN. Clustering metrics corroborated these observations: GraphConv achieved a Silhouette Score of 0.4665, a Davies–Bouldin Index of 0.7111, and a Calinski–Harabasz Score of 1278.40. Cluster-wise molecular dipole means for GIN+EdgeConv ranged from 2.6221 to 2.9606 Debye, reflecting high semantic coherence. Residual analysis and QQ plots confirmed that models with lower MSEs also had near-Gaussian error distributions, enhancing interpretability. Compared to benchmark models like PhysNet and DimeNet++, Q-DFTNet offers lower absolute accuracy but excels in modularity, interpretability, and computational efficiency. 
For a chemically grounded baseline for deploying GNNs in quantum chemistry and materials discovery pipelines, Q-DFTNet is proposed.</p>\\n </div>\",\"PeriodicalId\":188,\"journal\":{\"name\":\"Journal of Computational Chemistry\",\"volume\":\"46 22\",\"pages\":\"\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2025-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computational Chemistry\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/jcc.70206\",\"RegionNum\":3,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computational Chemistry","FirstCategoryId":"92","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jcc.70206","RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Q-DFTNet: A Chemistry-Informed Neural Network Framework for Predicting Molecular Dipole Moments via DFT-Driven QM9 Data
This study presents Q-DFTNet, a chemistry-informed neural network (ChINN) framework designed to benchmark graph neural networks (GNNs) for dipole moment prediction using the QM9 dataset. Seven GNN architectures, GCN, GIN, GraphConv, GATConv, GATNet, SAGEConv, and GIN+EdgeConv, were trained for 100 epochs and evaluated across performance and interpretability metrics. GraphConv achieved the lowest test MSE (0.7054), the lowest MAE (0.6196), and the highest R² (0.6513) with only 16.5k trainable parameters, confirming its optimal accuracy-complexity trade-off. GIN+EdgeConv followed closely with an MSE of 0.7386, an MAE of 0.6332, and an R² of 0.6349, leveraging edge-awareness for enhanced expressivity. In contrast, attention-based models like GATConv and GATNet underperformed, with test MSEs of 0.9667 and 1.0096 and R² values of 0.5221 and 0.5009, despite their higher complexity (43.5k and 37.3k parameters). Latent space analysis via t-SNE, PCA, and UMAP showed superior cluster separability for GraphConv, GIN+EdgeConv, and GCN. Clustering metrics corroborated these observations: GraphConv achieved a Silhouette Score of 0.4665, a Davies–Bouldin Index of 0.7111, and a Calinski–Harabasz Score of 1278.40. Cluster-wise molecular dipole means for GIN+EdgeConv ranged from 2.6221 to 2.9606 Debye, reflecting high semantic coherence. Residual analysis and QQ plots confirmed that models with lower MSEs also had near-Gaussian error distributions, enhancing interpretability. Compared to benchmark models like PhysNet and DimeNet++, Q-DFTNet offers lower absolute accuracy but excels in modularity, interpretability, and computational efficiency. Q-DFTNet is therefore proposed as a chemically grounded baseline for deploying GNNs in quantum chemistry and materials discovery pipelines.
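The evaluation protocol described above combines regression metrics (MSE, MAE, R²) on predicted dipole moments with clustering metrics (Silhouette, Davies–Bouldin, Calinski–Harabasz) on latent embeddings. A minimal sketch of how such metrics are computed with scikit-learn is shown below; the dipole values and latent embeddings here are synthetic placeholders, not data from the paper, and the actual Q-DFTNet pipeline may differ.

```python
import numpy as np
from sklearn.metrics import (
    mean_squared_error, mean_absolute_error, r2_score,
    silhouette_score, davies_bouldin_score, calinski_harabasz_score,
)

# Hypothetical predicted vs. DFT-reference dipole moments (Debye);
# illustrative values only, not results from the study.
y_true = np.array([1.2, 2.5, 0.8, 3.1, 1.9])
y_pred = np.array([1.0, 2.7, 0.9, 2.8, 2.1])

# Regression metrics as reported per architecture in the benchmark.
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MSE={mse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")

# Hypothetical 2-D latent embeddings with two well-separated clusters,
# mimicking the post-hoc latent-space analysis of the GNN models.
rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(3.0, 0.3, size=(50, 2)),
])
labels = np.array([0] * 50 + [1] * 50)

# Cluster-quality metrics used to corroborate the visual separability.
print(f"Silhouette={silhouette_score(emb, labels):.4f}")
print(f"Davies-Bouldin={davies_bouldin_score(emb, labels):.4f}")
print(f"Calinski-Harabasz={calinski_harabasz_score(emb, labels):.4f}")
```

Higher Silhouette and Calinski–Harabasz scores and a lower Davies–Bouldin index all indicate tighter, better-separated clusters, which is the direction the abstract reports for GraphConv, GIN+EdgeConv, and GCN.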
About the journal:
This distinguished journal publishes articles concerned with all aspects of computational chemistry: analytical, biological, inorganic, organic, physical, and materials. The Journal of Computational Chemistry presents original research, contemporary developments in theory and methodology, and state-of-the-art applications. Computational areas that are featured in the journal include ab initio and semiempirical quantum mechanics, density functional theory, molecular mechanics, molecular dynamics, statistical mechanics, cheminformatics, biomolecular structure prediction, molecular design, and bioinformatics.