PETSc/TAO Developments for Early Exascale Systems

Richard Tran Mills, Mark Adams, Satish Balay, Jed Brown, Jacob Faibussowitsch, Toby Isaac, Matthew Knepley, Todd Munson, Hansol Suh, Stefano Zampini, Hong Zhang, Junchao Zhang
arXiv - CS - Mathematical Software, published 2024-06-12. DOI: arxiv-2406.08646 (https://doi.org/arxiv-2406.08646)

Abstract

The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization via the Toolkit for Advanced Optimization (TAO). PETSc is used in dozens of scientific fields and is an important building block for many simulation codes. During the U.S. Department of Energy's Exascale Computing Project, the PETSc team has made substantial efforts to enable efficient utilization of the massive fine-grain parallelism present within exascale compute nodes and to enable performance portability across exascale architectures. We recap some of the challenges that designers of numerical libraries face in such an endeavor, and then discuss the many developments we have made, which include the addition of new GPU backends, features supporting efficient on-device matrix assembly, better support for asynchronicity and GPU kernel concurrency, and new communication infrastructure. We evaluate the performance of these developments on some pre-exascale systems as well as the early exascale systems Frontier and Aurora, using compute kernel, communication layer, solver, and mini-application benchmark studies, and then close with a few observations drawn from our experiences on the tension between portable performance and other goals of numerical libraries.