Fully parallel and pipelined sparse direct solver for large symmetric indefinite finite element problems

IF 2.9, CAS Zone 2 (Mathematics), JCR Q1 (MATHEMATICS, APPLIED)

Computers & Mathematics with Applications. Pub Date: 2024-10-24. DOI: 10.1016/j.camwa.2024.10.017

Yujie Wang, Shengquan Wang, Yong Cai, Guidong Wang, Guangyao Li

Volume 175, Pages 447-469. Full text: https://www.sciencedirect.com/science/article/pii/S0898122124004589

Citations: 0

Abstract

Solving sparse linear systems is a primary computational cost in large-scale finite element analysis, and improving solver performance is a key technological challenge in this field. Real-world engineering problems involve diverse materials, elements, and connectivity relationships, which makes their global stiffness matrices difficult for iterative methods to handle. Direct methods, owing to their robustness, are therefore the preferred choice. In this paper, a novel block-based supernodal LDL^T numerical factorization method is introduced. The computation is decomposed into distinct tasks, and the dependencies between these tasks are expressed as a directed acyclic graph that guides the order of execution. Based on this approach, a global task pool and local task stack are established to store task queues, enhancing data reuse and multicore collaboration efficiency. Additionally, an effective task dispatch and work-stealing mechanism is implemented to prevent performance degradation caused by load imbalance. Numerical experiments on a publicly available matrix test set and on real-world engineering finite element problems compare the parallel performance of Pardiso, MUMPS, and the proposed solver. The results indicate that the proposed solver performs significantly better than the other solvers across various types of sparse matrices and diverse multicore processor architectures.
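The scheduling idea described in the abstract (tasks ordered by a dependency graph, a global pool, local stacks, and work stealing) can be illustrated with a small sketch. This is not the authors' implementation, which the abstract does not detail. The function name `run_task_dag`, the task labels, the one-stack-per-worker layout, and the single coarse lock are all assumptions made for illustration, and Python threads show only the scheduling logic, not real multicore speedup.

```python
import threading
from collections import deque


def run_task_dag(deps, n_workers=4, work=None):
    """Run every task in `deps` once, never before its prerequisites.

    deps maps each task to the set of tasks it depends on. Every
    prerequisite must itself be a key, and the graph must be acyclic.
    Returns the tasks in the order they completed.
    """
    successors = {t: [] for t in deps}
    pending = {t: len(pre) for t, pre in deps.items()}  # unmet prerequisites
    for task, pre in deps.items():
        for p in pre:
            successors[p].append(task)

    cond = threading.Condition()  # assumption: one coarse lock, for simplicity
    global_pool = deque(t for t, n in pending.items() if n == 0)
    local = [deque() for _ in range(n_workers)]  # assumption: one stack per worker
    order = []
    state = {"remaining": len(deps)}

    def take(wid):
        """Pick the next task for worker `wid`; caller must hold `cond`."""
        if local[wid]:
            return local[wid].pop()  # newest own task first (data reuse)
        if global_pool:
            return global_pool.popleft()
        for victim in range(n_workers):
            if victim != wid and local[victim]:
                return local[victim].popleft()  # steal the victim's oldest task
        return None

    def worker(wid):
        while True:
            with cond:
                task = take(wid)
                while task is None:
                    if state["remaining"] == 0:
                        return
                    cond.wait()
                    task = take(wid)
            if work is not None:
                work(task)  # a numerical kernel would run here, outside the lock
            with cond:
                order.append(task)
                state["remaining"] -= 1
                for nxt in successors[task]:
                    pending[nxt] -= 1
                    if pending[nxt] == 0:
                        local[wid].append(nxt)  # newly ready work stays local
                cond.notify_all()

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order


# Hypothetical task graph: two independent block factorizations feed
# updates into a third block, which can only be factorized last.
deps = {
    "factor(0)": set(),
    "factor(1)": set(),
    "update(0->2)": {"factor(0)"},
    "update(1->2)": {"factor(1)"},
    "factor(2)": {"update(0->2)", "update(1->2)"},
}
order = run_task_dag(deps, n_workers=2)
print(order)
```

The completion order varies between runs, but every task appears after all of its prerequisites. Pushing newly released tasks onto the releasing worker's own stack is one plausible way to favor data reuse, while stealing from the opposite end of another worker's stack keeps idle workers busy when the load is uneven.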




Source journal

Computers & Mathematics with Applications (Engineering Technology: Computer Science, Interdisciplinary Applications)

CiteScore: 5.10

Self-citation rate: 10.30%

Articles published: 396

Review time: 9.9 weeks

Journal description: Computers & Mathematics with Applications provides a medium of exchange for those engaged in fields contributing to building successful simulations for science and engineering using partial differential equations (PDEs).