Critical Benchmarking of the G4(MP2) Model, the Correlation Consistent Composite Approach and Popular Density Functional Approximations on a Probabilistically Pruned Benchmark Dataset of Formation Enthalpies

Sambit Das, S. Chakraborty, R. Ramakrishnan
DOI: 10.26434/chemrxiv.12647033.v1
Venue: arXiv: Chemical Physics (published 2020-07-13)

Abstract

First-principles calculation of the standard formation enthalpy, $\Delta H_f^0$ (298 K), at the scale required for chemical space exploration is tractable only with density functional approximations (DFAs) and some composite wave function theories (cWFTs). However, the accuracies of the popular range-separated hybrid and 'rung-4' DFAs and the cWFTs that offer the best accuracy-vs.-cost trade-off have so far been established only on datasets composed predominantly of small molecules, so their transferability to larger datasets remains unclear. In this study, we present an extended benchmark dataset of over two thousand $\Delta H_f^0$ values for structurally and electronically diverse molecules. We apply quartile ranking based on boundary-corrected kernel density estimation to filter outliers, arriving at the Probabilistically Pruned Enthalpies of 1908 compounds (PPE1908). On this dataset, we rank the prediction accuracies of G4(MP2), ccCA, and 23 popular DFAs using both conventional and probabilistic error metrics. We discuss systematic prediction errors and highlight the role that the empirical higher-level correction (HLC) plays in the G4(MP2) model. Furthermore, we comment on the uncertainties in the empirical reference data for atoms and on the systematic errors they introduce, which grow with molecular size. We believe these findings will aid in identifying meaningful application domains for quantum thermochemical methods.
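The abstract names the pruning technique but not its exact rule. As an illustration only, the sketch below assumes that the pruned quantity is a nonnegative absolute deviation $|\Delta H_f^0(\text{method}) - \Delta H_f^0(\text{reference})|$, that the boundary correction is the standard reflection method for a Gaussian kernel at zero, that the bandwidth follows Silverman's rule of thumb, and that "quartile ranking" means flagging values beyond a Tukey fence $Q_3 + k\,(Q_3 - Q_1)$ whose quartiles are read off the boundary-corrected KDE. The function names and the fence factor $k = 1.5$ are hypothetical; this is not necessarily the criterion used to build PPE1908.

```python
import math
import statistics


def _phi(z: float) -> float:
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def silverman_bandwidth(data: list[float]) -> float:
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel (assumed choice)."""
    n = len(data)
    sd = statistics.stdev(data)
    q1, _, q3 = statistics.quantiles(data, n=4)
    spread = min(sd, (q3 - q1) / 1.34) or sd  # fall back to sd if the IQR is zero
    if spread <= 0.0:
        raise ValueError("bandwidth undefined: all errors are identical")
    return 0.9 * spread * n ** (-0.2)


def reflected_kde_cdf(x: float, data: list[float], h: float) -> float:
    """CDF of a Gaussian KDE reflected at 0, for nonnegative data.

    The reflected density is f(x) = (1/n) sum_i [phi_h(x - x_i) + phi_h(x + x_i)]
    for x >= 0; integrating it from 0 to x gives the closed form below.
    """
    if x <= 0.0:
        return 0.0
    total = sum(_phi((x - xi) / h) + _phi((x + xi) / h) - 1.0 for xi in data)
    return total / len(data)


def kde_quantile(p: float, data: list[float], h: float) -> float:
    """Invert the (monotone) reflected-KDE CDF by bisection."""
    lo, hi = 0.0, max(data) + 10.0 * h
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if reflected_kde_cdf(mid, data, h) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def prune_outliers(abs_errors: list[float], k: float = 1.5) -> tuple[list[int], list[int]]:
    """Split indices into (kept, flagged) with a Tukey fence on KDE quartiles.

    Hypothetical criterion: flag errors above Q3 + k * (Q3 - Q1), where the
    quartiles come from the boundary-corrected KDE rather than the raw sample.
    """
    h = silverman_bandwidth(abs_errors)
    q1 = kde_quantile(0.25, abs_errors, h)
    q3 = kde_quantile(0.75, abs_errors, h)
    upper = q3 + k * (q3 - q1)
    kept = [i for i, e in enumerate(abs_errors) if e <= upper]
    flagged = [i for i, e in enumerate(abs_errors) if e > upper]
    return kept, flagged
```

For example, with the made-up absolute errors `[0.4, 1.1, 0.7, 2.0, 0.9, 1.5, 0.3, 1.2, 0.8, 25.0]` (kcal/mol), `prune_outliers` flags only index 9, the 25.0 entry. Reflection is used because a plain Gaussian KDE on data piled up near zero error would place probability mass at negative values and pull the lower quartile down.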