Dennis Delali Kwesi Wayo, Mohd Zulkifli Bin Mohamad Noor, Masoud Darvish Ganji, Camila Martins Saporetti, Leonardo Goliatt
{"title":"Q-DFTNet:利用dft驱动的QM9数据预测分子偶极矩的化学信息神经网络框架","authors":"Dennis Delali Kwesi Wayo, Mohd Zulkifli Bin Mohamad Noor, Masoud Darvish Ganji, Camila Martins Saporetti, Leonardo Goliatt","doi":"10.1002/jcc.70206","DOIUrl":null,"url":null,"abstract":"<div>\n \n <p>This study presents Q-DFTNet, a chemistry-informed neural network (ChINN) framework designed to benchmark graph neural networks (GNNs) for dipole moment prediction using the QM9 dataset. Seven GNN architectures, GCN, GIN, GraphConv, GATConv, GATNet, SAGEConv, and GIN+EdgeConv, were trained for 100 epochs and evaluated across performance and interpretability metrics. GraphConv achieved the lowest test MSE (0.7054), MAE (0.6196), and the highest <span></span><math>\n <semantics>\n <mrow>\n <msup>\n <mrow>\n <mi>R</mi>\n </mrow>\n <mrow>\n <mn>2</mn>\n </mrow>\n </msup>\n </mrow>\n <annotation>$$ {R}^2 $$</annotation>\n </semantics></math> (0.6513) with only 16.5k trainable parameters, confirming its optimal accuracy-complexity trade-off. GIN+EdgeConv followed closely with MSE of 0.7386, MAE of 0.6332, and <span></span><math>\n <semantics>\n <mrow>\n <msup>\n <mrow>\n <mi>R</mi>\n </mrow>\n <mrow>\n <mn>2</mn>\n </mrow>\n </msup>\n </mrow>\n <annotation>$$ {R}^2 $$</annotation>\n </semantics></math> of 0.6349, leveraging edge-awareness for enhanced expressivity. In contrast, attention-based models like GATConv and GATNet underperformed, with test MSEs of 0.9667 and 1.0096, and <span></span><math>\n <semantics>\n <mrow>\n <msup>\n <mrow>\n <mi>R</mi>\n </mrow>\n <mrow>\n <mn>2</mn>\n </mrow>\n </msup>\n </mrow>\n <annotation>$$ {R}^2 $$</annotation>\n </semantics></math> values of 0.5221 and 0.5009, despite their higher complexity (43.5k and 37.3k parameters). Latent space analysis via t-SNE, PCA, and UMAP showed superior cluster separability for GraphConv, GIN+EdgeConv, and GCN. 
Clustering metrics corroborated these observations: GraphConv achieved a Silhouette Score of 0.4665, a Davies–Bouldin Index of 0.7111, and a Calinski–Harabasz Score of 1278.40. Cluster-wise molecular dipole means for GIN+EdgeConv ranged from 2.6221 to 2.9606 Debye, reflecting high semantic coherence. Residual analysis and QQ plots confirmed that models with lower MSEs also had near-Gaussian error distributions, enhancing interpretability. Compared to benchmark models like PhysNet and DimeNet++, Q-DFTNet offers lower absolute accuracy but excels in modularity, interpretability, and computational efficiency. For a chemically grounded baseline for deploying GNNs in quantum chemistry and materials discovery pipelines, Q-DFTNet is proposed.</p>\n </div>","PeriodicalId":188,"journal":{"name":"Journal of Computational Chemistry","volume":"46 22","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Q-DFTNet: A Chemistry-Informed Neural Network Framework for Predicting Molecular Dipole Moments via DFT-Driven QM9 Data\",\"authors\":\"Dennis Delali Kwesi Wayo, Mohd Zulkifli Bin Mohamad Noor, Masoud Darvish Ganji, Camila Martins Saporetti, Leonardo Goliatt\",\"doi\":\"10.1002/jcc.70206\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div>\\n \\n <p>This study presents Q-DFTNet, a chemistry-informed neural network (ChINN) framework designed to benchmark graph neural networks (GNNs) for dipole moment prediction using the QM9 dataset. Seven GNN architectures, GCN, GIN, GraphConv, GATConv, GATNet, SAGEConv, and GIN+EdgeConv, were trained for 100 epochs and evaluated across performance and interpretability metrics. 
GraphConv achieved the lowest test MSE (0.7054), MAE (0.6196), and the highest <span></span><math>\\n <semantics>\\n <mrow>\\n <msup>\\n <mrow>\\n <mi>R</mi>\\n </mrow>\\n <mrow>\\n <mn>2</mn>\\n </mrow>\\n </msup>\\n </mrow>\\n <annotation>$$ {R}^2 $$</annotation>\\n </semantics></math> (0.6513) with only 16.5k trainable parameters, confirming its optimal accuracy-complexity trade-off. GIN+EdgeConv followed closely with MSE of 0.7386, MAE of 0.6332, and <span></span><math>\\n <semantics>\\n <mrow>\\n <msup>\\n <mrow>\\n <mi>R</mi>\\n </mrow>\\n <mrow>\\n <mn>2</mn>\\n </mrow>\\n </msup>\\n </mrow>\\n <annotation>$$ {R}^2 $$</annotation>\\n </semantics></math> of 0.6349, leveraging edge-awareness for enhanced expressivity. In contrast, attention-based models like GATConv and GATNet underperformed, with test MSEs of 0.9667 and 1.0096, and <span></span><math>\\n <semantics>\\n <mrow>\\n <msup>\\n <mrow>\\n <mi>R</mi>\\n </mrow>\\n <mrow>\\n <mn>2</mn>\\n </mrow>\\n </msup>\\n </mrow>\\n <annotation>$$ {R}^2 $$</annotation>\\n </semantics></math> values of 0.5221 and 0.5009, despite their higher complexity (43.5k and 37.3k parameters). Latent space analysis via t-SNE, PCA, and UMAP showed superior cluster separability for GraphConv, GIN+EdgeConv, and GCN. Clustering metrics corroborated these observations: GraphConv achieved a Silhouette Score of 0.4665, a Davies–Bouldin Index of 0.7111, and a Calinski–Harabasz Score of 1278.40. Cluster-wise molecular dipole means for GIN+EdgeConv ranged from 2.6221 to 2.9606 Debye, reflecting high semantic coherence. Residual analysis and QQ plots confirmed that models with lower MSEs also had near-Gaussian error distributions, enhancing interpretability. Compared to benchmark models like PhysNet and DimeNet++, Q-DFTNet offers lower absolute accuracy but excels in modularity, interpretability, and computational efficiency. 
For a chemically grounded baseline for deploying GNNs in quantum chemistry and materials discovery pipelines, Q-DFTNet is proposed.</p>\\n </div>\",\"PeriodicalId\":188,\"journal\":{\"name\":\"Journal of Computational Chemistry\",\"volume\":\"46 22\",\"pages\":\"\"},\"PeriodicalIF\":4.8000,\"publicationDate\":\"2025-08-13\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Computational Chemistry\",\"FirstCategoryId\":\"92\",\"ListUrlMain\":\"https://onlinelibrary.wiley.com/doi/10.1002/jcc.70206\",\"RegionNum\":3,\"RegionCategory\":\"化学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"CHEMISTRY, MULTIDISCIPLINARY\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Computational Chemistry","FirstCategoryId":"92","ListUrlMain":"https://onlinelibrary.wiley.com/doi/10.1002/jcc.70206","RegionNum":3,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"CHEMISTRY, MULTIDISCIPLINARY","Score":null,"Total":0}
Q-DFTNet: A Chemistry-Informed Neural Network Framework for Predicting Molecular Dipole Moments via DFT-Driven QM9 Data
This study presents Q-DFTNet, a chemistry-informed neural network (ChINN) framework designed to benchmark graph neural networks (GNNs) for dipole moment prediction using the QM9 dataset. Seven GNN architectures, GCN, GIN, GraphConv, GATConv, GATNet, SAGEConv, and GIN+EdgeConv, were trained for 100 epochs and evaluated across performance and interpretability metrics. GraphConv achieved the lowest test MSE (0.7054), the lowest MAE (0.6196), and the highest R² (0.6513) with only 16.5k trainable parameters, confirming its optimal accuracy-complexity trade-off. GIN+EdgeConv followed closely with an MSE of 0.7386, an MAE of 0.6332, and an R² of 0.6349, leveraging edge-awareness for enhanced expressivity. In contrast, attention-based models like GATConv and GATNet underperformed, with test MSEs of 0.9667 and 1.0096 and R² values of 0.5221 and 0.5009, despite their higher complexity (43.5k and 37.3k parameters). Latent space analysis via t-SNE, PCA, and UMAP showed superior cluster separability for GraphConv, GIN+EdgeConv, and GCN. Clustering metrics corroborated these observations: GraphConv achieved a Silhouette Score of 0.4665, a Davies–Bouldin Index of 0.7111, and a Calinski–Harabasz Score of 1278.40. Cluster-wise molecular dipole means for GIN+EdgeConv ranged from 2.6221 to 2.9606 Debye, reflecting high semantic coherence. Residual analysis and QQ plots confirmed that models with lower MSEs also had near-Gaussian error distributions, enhancing interpretability. Compared to benchmark models like PhysNet and DimeNet++, Q-DFTNet offers lower absolute accuracy but excels in modularity, interpretability, and computational efficiency. Q-DFTNet is therefore proposed as a chemically grounded baseline for deploying GNNs in quantum chemistry and materials discovery pipelines.
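The evaluation protocol described above combines regression metrics (MSE, MAE, R²) on predicted dipole moments with clustering metrics (Silhouette, Davies–Bouldin, Calinski–Harabasz) on latent embeddings. A minimal sketch of how such metrics are computed with scikit-learn is shown below; the dipole values and latent embeddings here are synthetic placeholders, not data from the paper, and the actual Q-DFTNet pipeline may differ.

```python
import numpy as np
from sklearn.metrics import (
    mean_squared_error, mean_absolute_error, r2_score,
    silhouette_score, davies_bouldin_score, calinski_harabasz_score,
)

# Hypothetical predicted vs. DFT-reference dipole moments (Debye);
# illustrative values only, not results from the study.
y_true = np.array([1.2, 2.5, 0.8, 3.1, 1.9])
y_pred = np.array([1.0, 2.7, 0.9, 2.8, 2.1])

# Regression metrics as reported per architecture in the benchmark.
mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MSE={mse:.4f}  MAE={mae:.4f}  R2={r2:.4f}")

# Hypothetical 2-D latent embeddings with two well-separated clusters,
# mimicking the post-hoc latent-space analysis of the GNN models.
rng = np.random.default_rng(0)
emb = np.vstack([
    rng.normal(0.0, 0.3, size=(50, 2)),
    rng.normal(3.0, 0.3, size=(50, 2)),
])
labels = np.array([0] * 50 + [1] * 50)

# Cluster-quality metrics used to corroborate the visual separability.
print(f"Silhouette={silhouette_score(emb, labels):.4f}")
print(f"Davies-Bouldin={davies_bouldin_score(emb, labels):.4f}")
print(f"Calinski-Harabasz={calinski_harabasz_score(emb, labels):.4f}")
```

Higher Silhouette and Calinski–Harabasz scores and a lower Davies–Bouldin index all indicate tighter, better-separated clusters, which is the direction the abstract reports for GraphConv, GIN+EdgeConv, and GCN.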
About the journal:
This distinguished journal publishes articles concerned with all aspects of computational chemistry: analytical, biological, inorganic, organic, physical, and materials. The Journal of Computational Chemistry presents original research, contemporary developments in theory and methodology, and state-of-the-art applications. Computational areas that are featured in the journal include ab initio and semiempirical quantum mechanics, density functional theory, molecular mechanics, molecular dynamics, statistical mechanics, cheminformatics, biomolecular structure prediction, molecular design, and bioinformatics.