Advanced Techniques for High-Performance Fock Matrix Construction on GPU Clusters

IF 5.5 · Zone 1 (Chemistry) · JCR Q2 (CHEMISTRY, PHYSICAL)
Journal of Chemical Theory and Computation · Pub Date: 2024-12-10 · Epub Date: 2024-11-25 · DOI: 10.1021/acs.jctc.4c00994
Elise Palethorpe, Ryan Stocks, Giuseppe M J Barca
Pages: 10424-10442 · Publication type: Journal Article · Open access: No
Citations: 0

Abstract


This Article presents two optimized multi-GPU algorithms for Fock matrix construction, building on the work of Ufimtsev and Martinez [J. Chem. Theory Comput. 2009, 5, 1004-1015] and Barca et al. [J. Chem. Theory Comput. 2021, 17, 7486-7503]. The novel algorithms, opt-UM and opt-Brc, introduce significant enhancements, including improved integral screening, exploitation of sparsity and symmetry, a linear-scaling exchange matrix assembly algorithm, and extended capabilities for Hartree-Fock calculations up to f-type angular momentum functions. Opt-Brc excels for smaller systems and for highly contracted triple-ζ basis sets, while opt-UM is advantageous for large molecular systems. Performance benchmarks on NVIDIA A100 GPUs show that our algorithms in the EXtreme-scale Electronic Structure System (EXESS), when combined, outperform all current GPU and CPU Fock build implementations in TeraChem, QUICK, GPU4PySCF, LibIntX, ORCA, and Q-Chem. The implementations were benchmarked on linear and globular systems; average speedups of 1.4×, 8.4×, and 9.4× across three double-ζ basis sets were observed relative to TeraChem, QUICK, and GPU4PySCF, respectively. The average speedup over TeraChem increases to 2.1× when using four A100 GPUs. Strong scaling analysis reveals over 91% parallel efficiency on four GPUs for opt-Brc, making it typically the faster algorithm for multi-GPU execution. Single-compute-node comparisons with CPU-based software such as ORCA and Q-Chem show speedups of up to 42× and 31×, respectively, with power efficiency improved by up to 18×.
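To make the strong-scaling figure concrete: parallel efficiency is conventionally the measured speedup divided by the number of devices, so an efficiency above 91% on four GPUs corresponds to a speedup of more than 3.64× over one GPU. The sketch below uses that standard definition; the function name and the timings are hypothetical illustrations and are not taken from the paper.

```python
def parallel_efficiency(t_single: float, t_multi: float, n_devices: int) -> float:
    """Strong-scaling efficiency: (t_single / t_multi) / n_devices.

    t_single  -- wall time on one device for a fixed problem size
    t_multi   -- wall time on n_devices for the same problem size
    """
    if t_single <= 0 or t_multi <= 0 or n_devices < 1:
        raise ValueError("timings must be positive and n_devices >= 1")
    speedup = t_single / t_multi
    return speedup / n_devices


# Hypothetical Fock build timings in seconds (NOT measurements from the paper).
t1 = 100.0  # one GPU
t4 = 27.0   # four GPUs, same molecule and basis set

eff = parallel_efficiency(t1, t4, 4)
print(f"speedup = {t1 / t4:.2f}x, efficiency = {eff:.1%}")
# prints: speedup = 3.70x, efficiency = 92.6%
```

Because the problem size is held fixed, this measures strong scaling; a weak-scaling study would instead grow the system with the device count.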

Source journal
Journal of Chemical Theory and Computation (Chemistry; Physics: Atomic, Molecular and Chemical Physics)
CiteScore: 9.90
Self-citation rate: 16.40%
Articles published: 568
Review time: 1 month
About the journal: The Journal of Chemical Theory and Computation invites new and original contributions with the understanding that, if accepted, they will not be published elsewhere. Papers reporting new theories, methodology, and/or important applications in quantum electronic structure, molecular dynamics, and statistical mechanics are appropriate for submission to this Journal. Specific topics include advances in or applications of ab initio quantum mechanics, density functional theory, design and properties of new materials, surface science, Monte Carlo simulations, solvation models, QM/MM calculations, biomolecular structure prediction, and molecular dynamics in the broadest sense including gas-phase dynamics, ab initio dynamics, biomolecular dynamics, and protein folding. The Journal does not consider papers that are straightforward applications of known methods including DFT and molecular dynamics. The Journal favors submissions that include advances in theory or methodology with applications to compelling problems.