PETSc/TAO Developments for Early Exascale Systems
Richard Tran Mills, Mark Adams, Satish Balay, Jed Brown, Jacob Faibussowitsch, Toby Isaac, Matthew Knepley, Todd Munson, Hansol Suh, Stefano Zampini, Hong Zhang, Junchao Zhang
arXiv - CS - Mathematical Software, 2024-06-12. https://doi.org/arxiv-2406.08646
The Portable Extensible Toolkit for Scientific Computation (PETSc) library
provides scalable solvers for nonlinear time-dependent differential and
algebraic equations and for numerical optimization via the Toolkit for Advanced
Optimization (TAO). PETSc is used in dozens of scientific fields and is an
important building block for many simulation codes. During the U.S. Department
of Energy's Exascale Computing Project, the PETSc team has made substantial
efforts to enable efficient utilization of the massive fine-grain parallelism
present within exascale compute nodes and to enable performance portability
across exascale architectures. We recap some of the challenges that designers
of numerical libraries face in such an endeavor, and then discuss the many
developments we have made, which include the addition of new GPU backends,
features supporting efficient on-device matrix assembly, better support for
asynchronicity and GPU kernel concurrency, and new communication
infrastructure. We evaluate the performance of these developments on some
pre-exascale systems as well as the early exascale systems Frontier and Aurora,
using compute kernel, communication layer, solver, and mini-application
benchmark studies, and then close with a few observations drawn from our
experiences on the tension between portable performance and other goals of
numerical libraries.