Interpretable, extensible linear and symbolic regression models for charge density prediction using a hierarchy of many-body correlation descriptors

IF 3.1 | Zone 3, Materials Science | JCR Q2, Materials Science, Multidisciplinary
Gopal R. Iyer, Shashikant Kumar, Edgar Josué Landinez Borda, Babak Sadigh, Sebastien Hamel, Vasily Bulatov, Vincenzo Lordi, Amit Samanta
Computational Materials Science, volume 246, Article 113433. Published 2024-10-15. Journal Article.
DOI: 10.1016/j.commatsci.2024.113433
https://www.sciencedirect.com/science/article/pii/S0927025624006542
Citations: 0

Abstract

Density functional theory (DFT) is routinely used to make electronic structure predictions for high-throughput screening of materials and molecules for technologically relevant areas, like the identification of better catalysts, electronic materials, and drug discovery. However, the DFT formalism is limited by (a) its poor (quadratic-to-quartic) scaling, and (b) the need to perform repeated eigenvalue computations of the electronic Hamiltonian as part of its self-consistent field (SCF) iteration procedure to obtain the converged ground state electron density, ρ(r). Approaches that directly predict ρ(r) of a structure with high accuracy can accelerate conventional SCF calculations and can also be used in linearly scaling methods such as orbital-free DFT. To this end, we present a procedure to predict the ground state electron density of molecular and periodic three-dimensional systems directly from the atomic structure with a particular emphasis on physical interpretability. In our framework, ρ(r) is modeled using many-body correlation descriptors that accurately capture the effects of local atomic arrangements in the neighborhood of a grid point. Our use of a linear regression scheme to fit to charge density data enables transparent analysis of the relative contributions of various types of local atomic correlations. By systematically including increasingly complex correlations, our model is shown to accurately predict ρ(r) for a variety of chemically and electronically diverse systems: amorphous Ge, an Al(001) slab, crystalline Ga₂O₃, molecular benzene, and polyethylene. We then demonstrate a symbolic regression-based protocol to construct easily computable, interpretable features from lower-order correlations that significantly improves our electron density predictions with effectively no increase in the computational cost.
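The general idea of fitting a linear model to density values on grid points can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the form of the descriptors, so the Gaussian-smeared two-body (grid point to atom distance) features below, the function names, the parameter values, and the toy target density are all assumptions chosen for illustration. The example is non-periodic and stops at two-body terms, whereas the paper describes a hierarchy that adds higher-order correlations.

```python
import numpy as np


def two_body_descriptors(grid_points, atom_positions, centers, width=0.5):
    """Hypothetical two-body features (an assumption, not the paper's descriptors).

    For each grid point, sum over atoms a Gaussian in the grid-atom distance,
    giving one feature per radial center.
    """
    # Pairwise distances, shape (n_grid, n_atoms).
    d = np.linalg.norm(grid_points[:, None, :] - atom_positions[None, :, :], axis=-1)
    # Gaussian radial basis evaluated at each distance, summed over atoms.
    basis = np.exp(-((d[:, :, None] - centers[None, None, :]) ** 2) / (2.0 * width**2))
    return basis.sum(axis=1)  # shape (n_grid, n_centers)


def fit_linear_density(features, rho):
    """Ordinary least-squares fit of rho to the features plus an intercept."""
    X = np.hstack([np.ones((features.shape[0], 1)), features])
    coeffs, *_ = np.linalg.lstsq(X, rho, rcond=None)
    return coeffs


def predict_density(features, coeffs):
    """Evaluate the fitted linear model on a set of feature vectors."""
    X = np.hstack([np.ones((features.shape[0], 1)), features])
    return X @ coeffs


# Toy, non-periodic setup with random atoms and grid points (illustrative only).
rng = np.random.default_rng(0)
atoms = rng.uniform(0.0, 4.0, size=(5, 3))
grid = rng.uniform(0.0, 4.0, size=(200, 3))
centers = np.linspace(0.0, 3.0, 8)

# Toy target: a superposition of atom-centered exponentials, standing in for
# DFT charge density data, which this sketch does not have.
d = np.linalg.norm(grid[:, None, :] - atoms[None, :, :], axis=-1)
rho_toy = np.exp(-2.0 * d).sum(axis=1)

feats = two_body_descriptors(grid, atoms, centers)
coeffs = fit_linear_density(feats, rho_toy)
rho_fit = predict_density(feats, coeffs)
rmse = float(np.sqrt(np.mean((rho_fit - rho_toy) ** 2)))
print(f"training RMSE on toy density: {rmse:.3e}")
```

Because the model is linear in the features, each fitted coefficient can be read directly as the weight of one radial shell, which is the kind of transparency the abstract attributes to the linear regression scheme.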
Source journal
Computational Materials Science (Engineering and Technology: Materials Science, Multidisciplinary)
CiteScore: 6.50
Self-citation rate: 6.10%
Articles published: 665
Review time: 26 days
About the journal: The goal of Computational Materials Science is to report on results that provide new or unique insights into, or significantly expand our understanding of, the properties of materials or phenomena associated with their design, synthesis, processing, characterization, and utilization. To be relevant to the journal, the results should be applied or applicable to specific material systems that are discussed within the submission.