End-to-end GPU acceleration of low-order-refined preconditioning for high-order finite element discretizations

IF 2.5 3区计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE

International Journal of High Performance Computing Applications Pub Date : 2022-10-21 DOI:10.1177/10943420231175462

Will Pazner, T. Kolev, Jean-Sylvain Camier

{"title":"End-to-end GPU acceleration of low-order-refined preconditioning for high-order finite element discretizations","authors":"Will Pazner, T. Kolev, Jean-Sylvain Camier","doi":"10.1177/10943420231175462","DOIUrl":null,"url":null,"abstract":"In this article, we present algorithms and implementations for the end-to-end GPU acceleration of matrix-free low-order-refined preconditioning of high-order finite element problems. The methods described here allow for the construction of effective preconditioners for high-order problems with optimal memory usage and computational complexity. The preconditioners are based on the construction of a spectrally equivalent low-order discretization on a refined mesh, which is then amenable to, for example, algebraic multigrid preconditioning. The constants of equivalence are independent of mesh size and polynomial degree. For vector finite element problems in H (curl) and H (div) (e.g., for electromagnetic or radiation diffusion problems), a specially constructed interpolation–histopolation basis is used to ensure fast convergence. Detailed performance studies are carried out to analyze the efficiency of the GPU algorithms. The kernel throughput of each of the main algorithmic components is measured, and the strong and weak parallel scalability of the methods is demonstrated. The different relative weighting and significance of the algorithmic components on GPUs and CPUs is discussed. Results on problems involving adaptively refined nonconforming meshes are shown, and the use of the preconditioners on a large-scale magnetic diffusion problem using all spaces of the finite element de Rham complex is illustrated.","PeriodicalId":54957,"journal":{"name":"International Journal of High Performance Computing Applications","volume":"37 1","pages":"578 - 599"},"PeriodicalIF":2.5000,"publicationDate":"2022-10-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Journal of High Performance Computing Applications","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1177/10943420231175462","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}

引用次数: 1

Abstract

In this article, we present algorithms and implementations for the end-to-end GPU acceleration of matrix-free low-order-refined preconditioning of high-order finite element problems. The methods described here allow for the construction of effective preconditioners for high-order problems with optimal memory usage and computational complexity. The preconditioners are based on the construction of a spectrally equivalent low-order discretization on a refined mesh, which is then amenable to, for example, algebraic multigrid preconditioning. The constants of equivalence are independent of mesh size and polynomial degree. For vector finite element problems in H (curl) and H (div) (e.g., for electromagnetic or radiation diffusion problems), a specially constructed interpolation–histopolation basis is used to ensure fast convergence. Detailed performance studies are carried out to analyze the efficiency of the GPU algorithms. The kernel throughput of each of the main algorithmic components is measured, and the strong and weak parallel scalability of the methods is demonstrated. The different relative weighting and significance of the algorithmic components on GPUs and CPUs is discussed. Results on problems involving adaptively refined nonconforming meshes are shown, and the use of the preconditioners on a large-scale magnetic diffusion problem using all spaces of the finite element de Rham complex is illustrated.

查看原文本刊更多论文

高阶有限元离散化低阶精细预处理的端到端GPU加速

在本文中，我们提出了高阶有限元问题的无矩阵低阶精细预处理的端到端GPU加速的算法和实现。这里描述的方法允许为具有最佳内存使用和计算复杂性的高阶问题构造有效的预处理器。预处理器基于在精细网格上构造频谱等效的低阶离散化，然后适用于例如代数多重网格预处理。等效常数与网格大小和多项式次数无关。对于H（旋度）和H（div）中的向量有限元问题（例如，对于电磁或辐射扩散问题），使用特殊构造的插值-组织插值基础来确保快速收敛。进行了详细的性能研究，以分析GPU算法的效率。测量了每个主要算法组件的内核吞吐量，并证明了这些方法的强和弱并行可扩展性。讨论了GPU和CPU上算法组件的不同相对权重和重要性。给出了涉及自适应精细非协调网格的问题的结果，并说明了预处理器在使用有限元de Rham复形的所有空间的大规模磁扩散问题上的使用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

International Journal of High Performance Computing Applications 工程技术-计算机：跨学科应用

CiteScore

6.10

自引率

6.50%

发文量

审稿时长

>12 weeks

期刊介绍： With ever increasing pressure for health services in all countries to meet rising demands, improve their quality and efficiency, and to be more accountable; the need for rigorous research and policy analysis has never been greater. The Journal of Health Services Research & Policy presents the latest scientific research, insightful overviews and reflections on underlying issues, and innovative, thought provoking contributions from leading academics and policy-makers. It provides ideas and hope for solving dilemmas that confront all countries.