David A. Ham, Vaclav Hapla, Matthew G. Knepley, Lawrence Mitchell, Koki Sagiyama
arXiv - CS - Mathematical Software, 2024-01-11. https://doi.org/arxiv-2401.05868
Efficient N-to-M Checkpointing Algorithm for Finite Element Simulations
In this work, we introduce a new algorithm for N-to-M checkpointing in finite
element simulations. The algorithm efficiently saves and loads functions that
represent physical quantities associated with the mesh of the physical domain.
Specifically, it allows different numbers of parallel processes to be used for
saving and for loading, so that a simulation can be restarted or post-processed
on a process count appropriate to the given phase of the simulation and other
conditions. For demonstration, we implemented
this algorithm in PETSc, the Portable, Extensible Toolkit for Scientific
Computation, and added a convenient high-level interface into Firedrake, a
system for solving partial differential equations using finite element methods.
We evaluated our new implementation by saving and loading data involving 8.2
billion finite element degrees of freedom using 8,192 parallel processes on
ARCHER2, the UK National Supercomputing Service.
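
To make the N-to-M idea concrete, the following is a toy sketch in plain Python. It is not the algorithm of the paper, nor the PETSc or Firedrake implementation. It assumes that every degree of freedom has a process-independent global index, that a dictionary can stand in for the checkpoint file, and that a simple contiguous block partition is acceptable. The names `partition`, `save`, `load` and `field` are hypothetical and chosen only for illustration. A real implementation must also handle mesh topology, parallel I/O and the distribution of the loaded mesh.

```python
from typing import Dict, List


def partition(n_global: int, n_procs: int) -> List[range]:
    """Split global indices 0..n_global-1 into n_procs contiguous blocks."""
    base, extra = divmod(n_global, n_procs)
    blocks = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks receive one additional index.
        size = base + (1 if rank < extra else 0)
        blocks.append(range(start, start + size))
        start += size
    return blocks


def save(local_values: List[Dict[int, float]]) -> Dict[int, float]:
    """Merge per-process {global index: value} maps into one 'file'.

    Keying by global index makes the stored data independent of N.
    """
    checkpoint: Dict[int, float] = {}
    for values in local_values:
        checkpoint.update(values)
    return checkpoint


def load(checkpoint: Dict[int, float], n_procs: int) -> List[Dict[int, float]]:
    """Redistribute the stored values over n_procs (M) processes."""
    blocks = partition(len(checkpoint), n_procs)
    return [{i: checkpoint[i] for i in block} for block in blocks]


def field(i: int) -> float:
    """Hypothetical value of the function at global degree of freedom i."""
    return 0.5 * i


n_dofs = 10

# Save from N = 4 simulated processes.
saved_parts = [{i: field(i) for i in block} for block in partition(n_dofs, 4)]
checkpoint = save(saved_parts)

# Load onto M = 3 simulated processes.
loaded_parts = load(checkpoint, 3)
```

In this sketch the saved data carries no trace of the saving process count, which is why loading on a different count needs nothing beyond a new partition of the same global indices.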