PETSc/TAO Developments for Early Exascale Systems

arXiv - CS - Mathematical Software, Pub Date: 2024-06-12, DOI: arxiv-2406.08646

Richard Tran Mills, Mark Adams, Satish Balay, Jed Brown, Jacob Faibussowitsch, Toby Isaac, Matthew Knepley, Todd Munson, Hansol Suh, Stefano Zampini, Hong Zhang, Junchao Zhang


Citations: 0

Abstract

The Portable Extensible Toolkit for Scientific Computation (PETSc) library provides scalable solvers for nonlinear time-dependent differential and algebraic equations and for numerical optimization via the Toolkit for Advanced Optimization (TAO). PETSc is used in dozens of scientific fields and is an important building block for many simulation codes. During the U.S. Department of Energy's Exascale Computing Project, the PETSc team has made substantial efforts to enable efficient utilization of the massive fine-grain parallelism present within exascale compute nodes and to enable performance portability across exascale architectures. We recap some of the challenges that designers of numerical libraries face in such an endeavor, and then discuss the many developments we have made, which include the addition of new GPU backends, features supporting efficient on-device matrix assembly, better support for asynchronicity and GPU kernel concurrency, and new communication infrastructure. We evaluate the performance of these developments on some pre-exascale systems as well as the early exascale systems Frontier and Aurora, using compute kernel, communication layer, solver, and mini-application benchmark studies, and then close with a few observations drawn from our experiences on the tension between portable performance and other goals of numerical libraries.
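The abstract does not say how on-device matrix assembly is implemented. As a rough illustration only, the sketch below assumes a coordinate (COO) style interface in which the application supplies all (row, column, value) triplets in one batch and duplicate entries are summed. The function name `assemble_coo` and the small finite-element example are hypothetical and are not PETSc API; the code is plain Python running on the CPU and only shows the data flow.

```python
from collections import defaultdict


def assemble_coo(n_rows, rows, cols, vals):
    """Sum duplicate (row, col) triplets and return CSR arrays.

    Hypothetical helper for illustration; not part of PETSc.
    """
    if not (len(rows) == len(cols) == len(vals)):
        raise ValueError("rows, cols, and vals must have equal length")

    # Accumulate all contributions that target the same matrix entry.
    summed = defaultdict(float)
    for i, j, v in zip(rows, cols, vals):
        if not 0 <= i < n_rows:
            raise IndexError(f"row index {i} out of range")
        summed[(i, j)] += v

    # Sort by (row, col) so that each row's entries are contiguous.
    row_ptr = [0] * (n_rows + 1)
    col_idx, data = [], []
    for (i, j), v in sorted(summed.items()):
        row_ptr[i + 1] += 1  # count entries per row
        col_idx.append(j)
        data.append(v)

    # A prefix sum turns per-row counts into row offsets.
    for i in range(n_rows):
        row_ptr[i + 1] += row_ptr[i]
    return row_ptr, col_idx, data


if __name__ == "__main__":
    # Two 1D linear elements on nodes (0, 1) and (1, 2), each contributing
    # the local matrix [[1, -1], [-1, 1]]; node 1 receives two contributions.
    rows = [0, 0, 1, 1, 1, 1, 2, 2]
    cols = [0, 1, 0, 1, 1, 2, 1, 2]
    vals = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0, 1.0]
    print(assemble_coo(3, rows, cols, vals))
```

Because the whole batch of triplets is known up front, the accumulation step can in principle be expressed as a sort followed by a segmented reduction, which suits fine-grained parallel hardware better than inserting entries one at a time. Whether PETSc's implementation follows this exact pattern is not stated in the abstract.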

