$O(N)$ distributed direct factorization of structured dense matrices using runtime systems

Sameer Deshmukh, Qinxiang Ma, Rio Yokota, George Bosilca
arXiv - CS - Mathematical Software, published 2023-11-02
DOI: arxiv-2311.00921 (https://doi.org/arxiv-2311.00921)
Citations: 0

Abstract

Structured dense matrices result from boundary integral problems in electrostatics and geostatistics, and also Schur complements in sparse preconditioners such as multi-frontal methods. Exploiting the structure of such matrices can reduce the time for dense direct factorization from $O(N^3)$ to $O(N)$. The Hierarchically Semi-Separable (HSS) matrix is one such low rank matrix format that can be factorized using a Cholesky-like algorithm called ULV factorization. The HSS-ULV algorithm is highly parallel because it removes the dependency on trailing sub-matrices at each HSS level. However, a key merge step that links two successive HSS levels remains a challenge for efficient parallelization. In this paper, we use an asynchronous runtime system PaRSEC with the HSS-ULV algorithm. We compare our work with STRUMPACK and LORAPO, both state-of-the-art implementations of dense direct low rank factorization, and achieve up to 2x better factorization time for matrices arising from a diverse set of applications on up to 128 nodes of Fugaku for similar or better accuracy for all the problems that we survey.
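To make the linear scaling concrete, the sketch below counts stored entries for a dense matrix and for a simplified symmetric HSS layout. The layout is an assumption for illustration (a perfect binary tree, a fixed leaf size, one fixed rank for every off-diagonal block), and the function names `dense_storage` and `hss_storage` are hypothetical; none of this is taken from the paper's implementation. It models storage only, not factorization time.

```python
def dense_storage(n: int) -> int:
    """Entries stored by a dense n x n matrix."""
    return n * n


def hss_storage(n: int, leaf: int, rank: int) -> int:
    """Entries stored by a simplified symmetric HSS layout (assumed model).

    Assumes a perfect binary tree whose leaves each hold `leaf` rows,
    and the same `rank` for every low-rank off-diagonal block.
    """
    leaves = n // leaf
    if leaves * leaf != n or leaves < 2 or leaves & (leaves - 1):
        raise ValueError("n / leaf must be a power of two that is at least 2")

    total = leaves * leaf * leaf   # dense diagonal block at every leaf
    total += leaves * leaf * rank  # basis matrix at every leaf

    nodes = leaves
    while nodes > 1:
        parents = nodes // 2
        total += parents * rank * rank  # one coupling block per sibling pair
        if parents > 1:
            # Every non-root parent stores a (2 * rank) x rank transfer matrix.
            total += parents * 2 * rank * rank
        nodes = parents
    return total


if __name__ == "__main__":
    for n in (1024, 2048, 4096):
        print(n, dense_storage(n), hss_storage(n, leaf=64, rank=8))
```

Under these assumptions, doubling `n` at a fixed leaf size and rank roughly doubles the HSS count while the dense count quadruples, which is the structural property that the $O(N)$ factorization exploits.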