Neural Mulliken Analysis: Molecular Graphs from Density Matrices for QSPR on Raw Quantum-Chemical Data
Author: Oleg I Gromov
Journal: Journal of Chemical Theory and Computation, pp. 6380-6393 (impact factor 5.7)
Published: 2025-07-08 (Epub 2025-06-26)
DOI: https://doi.org/10.1021/acs.jctc.5c00425
Citations: 0
Abstract
Here, molecular graphs derived from the one-electron density matrix are introduced within a more general effort to explore whether incorporating electronic structure awareness allows a single model to both better generalize from small data and better learn molecular encodings. Diagonal density matrix blocks serve as atomic node embeddings, while off-diagonal blocks provide embeddings for "link" nodes related to atomic pairs. In a minimal basis, these embeddings have dimensions of only 45 and 81, yet no information is lost and the original density matrix can be fully reconstructed. Blocks from the basis set overlap matrix are used as edge embeddings to encode structural information and as weights for message aggregation operations. Additionally, element-wise multiplication performed during aggregation may provide access to electronic charges, analogous to Mulliken population analysis. The proposed concept was evaluated using data from the First and Second Solubility Challenges (Llinàs et al. J. Chem. Inf. Model. 2008, 48, 1289-1303; Llinàs and Avdeef J. Chem. Inf. Model. 2019, 59, 3036-3040). A graph neural network (GNN) trained on sets of 94 and 1000 drug-like molecules achieved improved solubility prediction accuracy (RMSE 0.63, R² 0.79 in SC-1 and RMSE of 0.83 and 0.92, R² of 0.57 and 0.79 on the "tight" and "loose" SC-2 test sets, respectively). If combined with existing techniques for predicting electron density from molecular structures, this approach is promising for addressing a range of chemical machine-learning problems.
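The block decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation. It assumes every atom carries K = 9 basis functions, which would be consistent with the stated dimensions (45 = 9·10/2 unique entries of a symmetric diagonal block, 81 = 9·9 entries of an off-diagonal block). The function names are hypothetical, and the matrices are random symmetric stand-ins rather than real quantum-chemical output. The last function shows how an element-wise product of density and overlap matrices yields Mulliken-style gross atomic populations.

```python
import numpy as np

K = 9  # assumed basis functions per atom (illustrative, not taken from the paper)


def atom_slice(a, k=K):
    """Index range of the basis functions centred on atom a."""
    return slice(a * k, (a + 1) * k)


def to_embeddings(P, n_atoms, k=K):
    """Split a symmetric density matrix into node and link embeddings."""
    iu = np.triu_indices(k)
    # Diagonal blocks are symmetric, so the upper triangle (45 values) suffices.
    nodes = {a: P[atom_slice(a, k), atom_slice(a, k)][iu] for a in range(n_atoms)}
    # Off-diagonal blocks are kept in full (81 values), one per atom pair a < b.
    links = {
        (a, b): P[atom_slice(a, k), atom_slice(b, k)].ravel()
        for a in range(n_atoms)
        for b in range(a + 1, n_atoms)
    }
    return nodes, links


def from_embeddings(nodes, links, n_atoms, k=K):
    """Rebuild the full density matrix from the embeddings."""
    P = np.zeros((n_atoms * k, n_atoms * k))
    iu = np.triu_indices(k)
    for a, v in nodes.items():
        blk = np.zeros((k, k))
        blk[iu] = v
        # Mirror the upper triangle without doubling the diagonal.
        P[atom_slice(a, k), atom_slice(a, k)] = blk + blk.T - np.diag(np.diag(blk))
    for (a, b), v in links.items():
        blk = v.reshape(k, k)
        P[atom_slice(a, k), atom_slice(b, k)] = blk
        P[atom_slice(b, k), atom_slice(a, k)] = blk.T
    return P


def mulliken_populations(P, S, n_atoms, k=K):
    """Gross atomic populations: sum over mu on atom A of sum_nu P[mu,nu]*S[mu,nu]."""
    per_orbital = (P * S).sum(axis=1)
    return per_orbital.reshape(n_atoms, k).sum(axis=1)


# Random symmetric stand-ins for a density matrix P and an overlap matrix S.
rng = np.random.default_rng(0)
n_atoms = 3
A = rng.normal(size=(n_atoms * K, n_atoms * K))
B = rng.normal(size=(n_atoms * K, n_atoms * K))
P, S = A + A.T, B + B.T

nodes, links = to_embeddings(P, n_atoms)
P_back = from_embeddings(nodes, links, n_atoms)
pops = mulliken_populations(P, S, n_atoms)

print(nodes[0].shape, links[(0, 1)].shape)
print(np.allclose(P, P_back))
print(np.isclose(pops.sum(), np.trace(P @ S)))
```

The final check relies on the identity that, for a symmetric S, the sum of all entries of the element-wise product P∘S equals tr(PS), the total electron count in Mulliken analysis. A Mulliken charge would then be the nuclear charge minus the corresponding population.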
About the journal:
The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.