Efficient interpolation of molecular properties across chemical compound space with low-dimensional descriptors

Yun-Wen Mao, R. Krems

Machine Learning: Science and Technology, volume 5. Published 2024-03-20. DOI: 10.1088/2632-2153/ad360e (https://doi.org/10.1088/2632-2153/ad360e)

Abstract

We demonstrate accurate data-starved models of molecular properties for interpolation in chemical compound space with low-dimensional descriptors. Our starting point is a set of three-dimensional, universal, physical descriptors derived from the properties of the eigenvalue distributions of Coulomb matrices. To account for the shape and composition of molecules, we combine these descriptors with six-dimensional features informed by the Gershgorin circle theorem. We use the resulting nine-dimensional descriptors for Gaussian process regression based on kernels with variable functional form, which yields extremely efficient, low-dimensional interpolation models. Models trained with 100 molecules predict the product of entropy and temperature (S × T) and the zero-point vibrational energy (ZPVE) with an absolute error under 1 kcal/mol for more than 78% and under 1.3 kcal/mol for more than 92% of molecules in the test data. The test data comprise 20,000 molecules ranging in size from three to 29 atoms, with S × T and ZPVE spanning ranges of 36 kcal/mol and 161 kcal/mol, respectively. We also show that descriptors based on the Gershgorin circle theorem yield more accurate models of molecular entropy than those based on graph neural networks that explicitly account for the atomic connectivity of molecules.
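
The two ingredients named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used Coulomb matrix definition (0.5 Z_i^2.4 on the diagonal, Z_i Z_j / |R_i - R_j| off the diagonal, in atomic units) and computes the Gershgorin discs of that matrix: by the Gershgorin circle theorem, every eigenvalue lies in at least one disc centred at a diagonal entry, with radius equal to the sum of the absolute off-diagonal entries in that row. The function names and the approximate water geometry are illustrative assumptions. The abstract does not specify how the six Gershgorin-informed features are constructed, so this sketch does not attempt to reproduce the authors' nine-dimensional descriptor.

```python
import math

def coulomb_matrix(charges, coords):
    """Coulomb matrix in the commonly used form (assumed here):
    0.5 * Z_i**2.4 on the diagonal, Z_i * Z_j / |R_i - R_j| off the diagonal."""
    n = len(charges)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                m[i][j] = 0.5 * charges[i] ** 2.4
            else:
                m[i][j] = charges[i] * charges[j] / math.dist(coords[i], coords[j])
    return m

def gershgorin_discs(matrix):
    """Return one (center, radius) pair per row:
    center = a_ii, radius = sum of |a_ij| over j != i."""
    return [
        (row[i], sum(abs(x) for j, x in enumerate(row) if j != i))
        for i, row in enumerate(matrix)
    ]

# Hypothetical example: an approximate water geometry in bohr (O, H, H).
charges = [8, 1, 1]
coords = [(0.0, 0.0, 0.0), (1.43, 1.11, 0.0), (-1.43, 1.11, 0.0)]

cm = coulomb_matrix(charges, coords)
discs = gershgorin_discs(cm)
for z, (center, radius) in zip(charges, discs):
    print(f"Z={z}: center={center:.3f}, radius={radius:.3f}")
```

The disc centres depend only on the nuclear charges, while the radii depend on the interatomic distances. This is one way to see how such quantities could carry information about both composition and shape without diagonalizing the matrix.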