HTS: A Threaded Multilevel Sparse Hybrid Solver

J. Booth
Published in: 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2022
DOI: 10.1109/ipdps53621.2022.00010
Citations: 1

Abstract

Large shared-memory many-core nodes have become the norm in scientific computing, and the sparse linear solver stack must therefore adapt to the multilevel structure that exists on these nodes. One adaptation is the development of hybrid solvers at the node level. We present HTS, a hybrid threaded solver that aims to provide a finer-grained algorithm that keeps more threads actively working in these larger shared-memory environments without the overheads of message-passing implementations. Additionally, HTS aims to use the additional shared memory that may be available to improve performance, i.e., to reduce iteration counts when used as a preconditioner and to speed up calculations. HTS is built around the Schur complement framework that many other hybrid solver packages already use. However, HTS uses a multilevel structure in dealing with the Schur complement and allows fill-in in certain off-diagonal submatrices to enable a faster and more accurate solve phase. These modifications allow a task-based threading library, namely Cilk, to be used to improve performance while still reducing peak memory by more than 20% on average compared with an optimized direct factorization method. We show that HTS can outperform the MPI-based hybrid solver ShyLU by as much as 2× on a suite of sparse matrices, and that HTS can scale well on three-dimensional finite difference problems.
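For readers unfamiliar with the Schur complement framework the abstract refers to, the standard two-block formulation is sketched below. This is the generic textbook derivation that Schur-complement hybrid solvers start from, not the multilevel variant HTS uses; the block names A11, A12, A21, A22 and the assumption that A11 is nonsingular are illustrative choices, not notation taken from the paper.

```latex
\begin{aligned}
&\begin{bmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{bmatrix}
 \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
 = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix},
 \qquad A_{11} \text{ assumed nonsingular} \\[4pt]
&S = A_{22} - A_{21} A_{11}^{-1} A_{12}
 && \text{(Schur complement of } A_{11}\text{)} \\[4pt]
&S\,x_2 = b_2 - A_{21} A_{11}^{-1} b_1
 && \text{(reduced system for } x_2\text{)} \\[4pt]
&x_1 = A_{11}^{-1}\left(b_1 - A_{12}\,x_2\right)
 && \text{(back-substitution for } x_1\text{)}
\end{aligned}
```

The reduced system follows by solving the first block row for x1 and substituting into the second. In hybrid solvers A11 is typically block diagonal, with one block per subdomain, so its factorization parallelizes naturally, while the Schur complement system for the interface unknowns x2 is handled directly or iteratively; per the abstract, this is the stage where HTS applies its multilevel structure.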