HTS: A Threaded Multilevel Sparse Hybrid Solver

2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS) Pub Date : 2022-05-01 DOI:10.1109/ipdps53621.2022.00010

J. Booth


Citations: 1

Abstract

Large shared-memory many-core nodes have become the norm in scientific computing, so the sparse linear solver stack must adapt to the multilevel structure of these nodes. One adaptation is the development of hybrid solvers at the node level. We present HTS, a hybrid threaded solver that aims to provide a finer-grained algorithm that keeps more threads actively working in these larger shared-memory environments without the overheads of message-passing implementations. Additionally, HTS aims to use the extra shared memory that may be available to improve performance, i.e., reducing iteration counts when used as a preconditioner and speeding up calculations. HTS is built around the Schur complement framework that many other hybrid solver packages already use. However, HTS uses a multilevel structure in dealing with the Schur complement and permits fill-in in certain off-diagonal submatrices, enabling a faster and more accurate solve phase. These modifications allow a task-based threading library, namely Cilk, to be used to improve performance while still reducing peak memory by more than 20% on average compared to an optimized direct factorization method. We show that HTS can outperform the MPI-based hybrid solver ShyLU on a suite of sparse matrices by as much as 2×, and that HTS scales well on three-dimensional finite difference problems.
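For readers unfamiliar with the Schur complement framework mentioned in the abstract, the following is a minimal dense sketch of the underlying block elimination, written with NumPy. It is not HTS's implementation: HTS works on sparse matrices, uses a multilevel structure, and is threaded with Cilk, none of which is shown here. The function name `schur_solve`, the split index `k`, and the block names `D`, `E`, `F`, `G` are assumptions chosen for illustration. The system is partitioned as A = [[D, E], [F, G]], the Schur complement S = G - F D^{-1} E is formed, the smaller system in S is solved first, and the remaining unknowns are recovered by back-substitution.

```python
import numpy as np


def schur_solve(A, b, k):
    """Solve A x = b by eliminating the leading k-by-k block of A.

    Illustrative dense version only; assumes the leading block D and
    the Schur complement S are both nonsingular.
    """
    # Partition A into a 2-by-2 block structure and b to match.
    D, E = A[:k, :k], A[:k, k:]
    F, G = A[k:, :k], A[k:, k:]
    b1, b2 = b[:k], b[k:]

    # Apply D^{-1} to E and b1 (solve, rather than forming an inverse).
    D_inv_E = np.linalg.solve(D, E)
    D_inv_b1 = np.linalg.solve(D, b1)

    # Schur complement of D in A: S = G - F D^{-1} E.
    S = G - F @ D_inv_E

    # Solve the reduced system for the second block of unknowns.
    x2 = np.linalg.solve(S, b2 - F @ D_inv_b1)

    # Back-substitute to recover the first block of unknowns.
    x1 = D_inv_b1 - D_inv_E @ x2
    return np.concatenate([x1, x2])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 6, 4
    # Strictly diagonally dominant test matrix, so D and S are nonsingular.
    A = rng.random((n, n)) + n * np.eye(n)
    b = rng.random(n)
    x = schur_solve(A, b, k)
    print(np.allclose(A @ x, b))
```

In this sketch every step is exact. Hybrid solvers commonly approximate S or solve the reduced system iteratively instead, which is where the trade-off between memory, fill-in, and iteration count described in the abstract arises.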

