Efficient Algorithms for GPU Accelerated Evaluation of the DFT Exchange-Correlation Functional.

Impact factor 5.5 · Zone 1 (Chemistry) · JCR Q2 (Chemistry, Physical)
Ryan Stocks, Giuseppe M J Barca
Journal of Chemical Theory and Computation. Published 2025-10-08. DOI: 10.1021/acs.jctc.5c01229 (https://doi.org/10.1021/acs.jctc.5c01229)

Abstract

Kohn-Sham density functional theory (KS-DFT) has become a cornerstone for studying the electronic structure of molecules and materials. Improving algorithmic efficiency through hardware-aware implementations enables application to larger systems and more efficient generation of larger training data sets for machine learning. In this work, we present a comparative study of four GPU-accelerated algorithms for evaluating the KS-DFT exchange-correlation (XC) potential with an atom-centered Gaussian basis. Two approaches, both leveraging batched dense linear algebra, are found to outperform the others across a suite of molecular benchmarks. We show that batched formation of the XC matrix from the density matrix yields the best performance for large (>O(10^3) basis functions), sparse systems such as glycine chains and water clusters. In contrast, for smaller and denser systems such as diamond nanoparticles, especially if employing large basis sets, algorithms that use the underlying molecular orbital coefficients offer superior performance, despite their higher formal scaling. Our implementations deliver speedups of 1.4-5.2× for XC potential evaluation relative to leading GPU-accelerated KS-DFT codes, significantly lowering the computational cost and enabling the routine use of larger integration grids. Finally, we outline directions for continued performance improvements in light of emerging GPU architectures with emphasis on utilizing mixed-precision capabilities.
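The abstract contrasts forming the density from the density matrix with forming it from the molecular orbital coefficients. The sketch below is a generic textbook illustration of that distinction on the CPU with NumPy, not the authors' GPU implementation. Everything in it is an assumption made for illustration: the function names, the random numbers standing in for basis-function values on the grid, the closed-shell form P = 2 C_occ C_occ^T, and the use of Slater (LDA) exchange as the only functional. It shows that both routes give the same grid density, and that the XC matrix can be accumulated batch by batch with dense matrix products.

```python
import numpy as np

def density_from_density_matrix(phi, P):
    """rho[g] = sum_{mu,nu} phi[g,mu] * P[mu,nu] * phi[g,nu]."""
    return np.einsum("gm,gm->g", phi @ P, phi)

def density_from_mo_coefficients(phi, C_occ):
    """rho[g] = 2 * sum_i psi[g,i]**2 with psi = phi @ C_occ (closed shell)."""
    psi = phi @ C_occ
    return 2.0 * np.einsum("gi,gi->g", psi, psi)

def lda_exchange_potential(rho):
    """Slater exchange potential v_x = -(3/pi)^(1/3) * rho^(1/3)."""
    return -np.cbrt(3.0 / np.pi) * np.cbrt(rho)

def xc_matrix_batched(phi, weights, P, batch_size=64):
    """Accumulate V[mu,nu] = sum_g w[g] * v_x(rho[g]) * phi[g,mu] * phi[g,nu]
    over batches of grid points, one dense matrix product per batch."""
    n_bf = phi.shape[1]
    V = np.zeros((n_bf, n_bf))
    for start in range(0, phi.shape[0], batch_size):
        b = slice(start, start + batch_size)
        rho = density_from_density_matrix(phi[b], P)
        wv = weights[b] * lda_exchange_potential(rho)
        V += phi[b].T @ (wv[:, None] * phi[b])
    return V

# Hypothetical toy data: random values replace real Gaussian basis functions.
rng = np.random.default_rng(0)
n_grid, n_bf, n_occ = 200, 6, 2
phi = rng.normal(size=(n_grid, n_bf))            # basis values on the grid
weights = rng.uniform(0.1, 1.0, size=n_grid)     # quadrature weights
C_occ, _ = np.linalg.qr(rng.normal(size=(n_bf, n_occ)))  # orthonormal columns
P = 2.0 * C_occ @ C_occ.T                        # closed-shell density matrix

rho_p = density_from_density_matrix(phi, P)
rho_c = density_from_mo_coefficients(phi, C_occ)
V = xc_matrix_batched(phi, weights, P)
V_single_batch = xc_matrix_batched(phi, weights, P, batch_size=n_grid)
```

In this toy setting the two density routes differ only in the shape of the intermediate (grid points by basis functions versus grid points by occupied orbitals). Which one is faster in practice depends on sparsity and basis size, which is the trade-off the abstract reports.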


Source journal: Journal of Chemical Theory and Computation
Category: Chemistry; Physics: Atomic, Molecular and Chemical Physics
CiteScore: 9.90
Self-citation rate: 16.40%
Articles published: 568
Review time: 1 month
Journal description: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.