Fully parallel and pipelined sparse direct solver for large symmetric indefinite finite element problems

IF 2.9 | Zone 2 (Mathematics) | JCR Q1 (Mathematics, Applied)
Yujie Wang, Shengquan Wang, Yong Cai, Guidong Wang, Guangyao Li
Journal: Computers & Mathematics with Applications
Published: 2024-10-24
DOI: 10.1016/j.camwa.2024.10.017
URL: https://www.sciencedirect.com/science/article/pii/S0898122124004589
Citations: 0

Abstract

Solving sparse linear systems is a primary computational cost in large-scale finite element analysis, and improving its performance is a key technological challenge in this field. Real-world engineering problems involve diverse materials, elements, and connectivity relationships, making it difficult for iterative methods to handle their global stiffness matrices. Direct methods, owing to their robustness, emerge as the preferred choice. In this paper, a novel block-based supernodal LDL^T numerical factorization method is introduced. The computation is decomposed into distinct tasks, and the dependencies between these tasks are expressed as a directed acyclic graph that guides the order of calculation. Based on this approach, a global task pool and local task stacks are established to store task queues, enhancing data reuse and multicore collaboration efficiency. Additionally, an effective task dispatch and work-stealing mechanism is implemented to prevent performance degradation caused by load imbalance. Numerical experiments, covering a publicly available matrix test set and real-world engineering finite element problems, compare the parallel performance of Pardiso, MUMPS, and the proposed solver. The results show that the proposed solver performs significantly better than the other two across various types of sparse matrices and multicore processor architectures.
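
The scheduling idea in the abstract (a task DAG, a global task pool, per-core local task stacks, and work stealing) can be illustrated with a small sketch. The abstract does not give the authors' data structures or dispatch policy, so everything below is an assumption made for illustration: the function name `run_dag`, the task names, the LIFO/FIFO choices, and the single-threaded round-robin simulation that stands in for real cores.

```python
from collections import deque

def run_dag(deps, num_workers=2):
    """Simulate DAG-ordered execution with a global pool, local stacks, and stealing.

    deps maps each task name to the list of tasks it depends on (every task
    must appear as a key). Returns (worker_id, task) pairs in execution order.
    This is a hypothetical illustration, not the solver described in the paper.
    """
    # Build successor lists and remaining-dependency counters.
    successors = {t: [] for t in deps}
    remaining = {t: len(parents) for t, parents in deps.items()}
    for task, parents in deps.items():
        for p in parents:
            successors[p].append(task)

    # Tasks with no dependencies start in the global pool.
    global_pool = deque(sorted(t for t, n in remaining.items() if n == 0))
    local_stacks = [[] for _ in range(num_workers)]
    schedule = []

    def next_task(w):
        if local_stacks[w]:               # 1. own stack, newest first (LIFO)
            return local_stacks[w].pop()
        if global_pool:                   # 2. global pool, oldest first (FIFO)
            return global_pool.popleft()
        for v in range(num_workers):      # 3. steal the oldest task of a victim
            if v != w and local_stacks[v]:
                return local_stacks[v].pop(0)
        return None

    while len(schedule) < len(deps):
        progressed = False
        for w in range(num_workers):      # one task per worker per round
            task = next_task(w)
            if task is None:
                continue
            progressed = True
            schedule.append((w, task))
            # Finishing a task releases successors whose dependencies are all done.
            for s in successors[task]:
                remaining[s] -= 1
                if remaining[s] == 0:
                    local_stacks[w].append(s)
        if not progressed:
            raise ValueError("dependency cycle: no runnable task")
    return schedule

# Hypothetical task graph: two independent panels are factored, each updates
# a shared trailing block, and that block is factored last.
example = {
    "factor(1)": [],
    "factor(2)": [],
    "update(1,3)": ["factor(1)"],
    "update(2,3)": ["factor(2)"],
    "factor(3)": ["update(1,3)", "update(2,3)"],
}
for worker, task in run_dag(example, num_workers=2):
    print(worker, task)
```

Pushing newly released tasks onto the finishing worker's own stack is one plausible way to favor data reuse, since a successor often touches blocks the worker has just written; stealing from the opposite end of a victim's stack keeps the thief away from the victim's most recently touched data. Whether the paper makes the same choices is not stated in the abstract.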
Source journal: Computers & Mathematics with Applications (Engineering & Technology; Computer Science, Interdisciplinary Applications)
CiteScore: 5.10
Self-citation rate: 10.30%
Articles published: 396
Review time: 9.9 weeks
Journal description: Computers & Mathematics with Applications provides a medium of exchange for those engaged in fields contributing to building successful simulations for science and engineering using Partial Differential Equations (PDEs).