Neural Mulliken Analysis: Molecular Graphs from Density Matrices for QSPR on Raw Quantum-Chemical Data.

IF 5.7 | Zone 1 (Chemistry) | JCR Q2 (CHEMISTRY, PHYSICAL)
Journal of Chemical Theory and Computation. Pub Date: 2025-07-08. Epub Date: 2025-06-26. DOI: 10.1021/acs.jctc.5c00425
Oleg I Gromov
Pages: 6380-6393. Publication type: Journal Article. URL: https://doi.org/10.1021/acs.jctc.5c00425

Abstract

Here, molecular graphs derived from the one-electron density matrix are introduced as part of a broader effort to explore whether incorporating electronic-structure awareness allows a single model both to generalize better from small data sets and to learn better molecular encodings. Diagonal blocks of the density matrix serve as atomic node embeddings, while off-diagonal blocks provide embeddings for "link" nodes associated with atom pairs. In a minimal basis, these embeddings have dimensions of only 45 and 81, yet no information is lost and the original density matrix can be fully reconstructed. Blocks of the basis-set overlap matrix are used as edge embeddings to encode structural information and as weights for message aggregation. Additionally, the element-wise multiplication performed during aggregation may provide access to electronic charges, analogous to Mulliken population analysis. The concept was evaluated on data from the First and Second Solubility Challenges (Llinàs et al. J. Chem. Inf. Model. 2008, 48, 1289-1303; Llinàs and Avdeef J. Chem. Inf. Model. 2019, 59, 3036-3040). A graph neural network (GNN) trained on sets of 94 and 1000 drug-like molecules achieved improved solubility prediction accuracy (RMSE 0.63 and R² 0.79 on SC-1; RMSE 0.83 and 0.92 and R² 0.57 and 0.79 on the "tight" and "loose" SC-2 test sets, respectively). If combined with existing techniques for predicting electron density from molecular structures, this approach is promising for addressing a range of chemical machine-learning problems.
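
The block decomposition and the Mulliken-like use of element-wise multiplication described in the abstract can be illustrated with a small NumPy sketch. This is not the author's implementation. It assumes 9 basis functions per atom, a value inferred only from the stated dimensions (45 = 9·10/2 unique elements of a symmetric 9×9 diagonal block, 81 = 9·9 elements of an off-diagonal block). The density and overlap matrices are synthetic stand-ins, and the function names `split_blocks`, `rebuild`, and `mulliken_populations` are hypothetical.

```python
import numpy as np

N_AO = 9  # ASSUMED basis functions per atom (9*10/2 = 45, 9*9 = 81)


def split_blocks(P, atom_slices):
    """Split a symmetric matrix into atom-node and link-node vectors."""
    atom_nodes, link_nodes = {}, {}
    for a, sa in enumerate(atom_slices):
        block = P[sa, sa]
        # A diagonal block is symmetric, so its upper triangle suffices.
        atom_nodes[a] = block[np.triu_indices(block.shape[0])]
        for b in range(a + 1, len(atom_slices)):
            # An off-diagonal block has no symmetry; keep every element.
            link_nodes[(a, b)] = P[sa, atom_slices[b]].ravel()
    return atom_nodes, link_nodes


def rebuild(atom_nodes, link_nodes, atom_slices):
    """Reassemble the full symmetric matrix from node vectors."""
    n = atom_slices[-1].stop
    P = np.zeros((n, n))
    for a, sa in enumerate(atom_slices):
        k = sa.stop - sa.start
        block = np.zeros((k, k))
        block[np.triu_indices(k)] = atom_nodes[a]
        # Mirror the upper triangle without doubling the diagonal.
        P[sa, sa] = block + block.T - np.diag(np.diag(block))
    for (a, b), flat in link_nodes.items():
        sa, sb = atom_slices[a], atom_slices[b]
        blk = flat.reshape(sa.stop - sa.start, sb.stop - sb.start)
        P[sa, sb] = blk
        P[sb, sa] = blk.T
    return P


def mulliken_populations(P, S, atom_slices):
    """Gross atomic populations: sum of (P * S) over each atom's rows."""
    weighted = P * S  # element-wise (Hadamard) product
    return np.array([weighted[sa, :].sum() for sa in atom_slices])


# Synthetic two-atom example; real P and S would come from a
# quantum-chemical calculation.
rng = np.random.default_rng(0)
atom_slices = [slice(0, N_AO), slice(N_AO, 2 * N_AO)]
n, n_occ = 2 * N_AO, 5  # 5 doubly occupied orbitals -> 10 electrons

# Overlap-like matrix: symmetric, positive definite, unit diagonal.
A = rng.normal(size=(n, n))
S = A @ A.T + n * np.eye(n)
d = np.sqrt(np.diag(S))
S = S / np.outer(d, d)

# Columns of S^(-1/2) are S-orthonormal, so they can act as orbitals.
w, V = np.linalg.eigh(S)
C = (V @ np.diag(w ** -0.5) @ V.T)[:, :n_occ]
P = 2.0 * C @ C.T  # closed-shell density matrix

atom_nodes, link_nodes = split_blocks(P, atom_slices)
P_rebuilt = rebuild(atom_nodes, link_nodes, atom_slices)
populations = mulliken_populations(P, S, atom_slices)

print(atom_nodes[0].size, link_nodes[(0, 1)].size)  # 45 81
print(np.allclose(P_rebuilt, P))
print(populations, populations.sum())
```

Because the overlap matrix is symmetric, summing the element-wise product over the rows of one atom equals the trace of the corresponding diagonal block of the matrix product PS. This is the Mulliken gross atomic population, and the populations sum to the total electron count. Subtracting each population from the nuclear charge would give the Mulliken atomic charge.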

Source journal: Journal of Chemical Theory and Computation (Chemistry / Physics, Atomic, Molecular & Chemical)
CiteScore: 9.90
Self-citation rate: 16.40%
Articles published: 568
Review time: 1 month
About the journal: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.