分布式内存多处理器体系结构的因子分解算法

G. Geist, C. Romine
{"title":"分布式内存多处理器体系结构的因子分解算法","authors":"G. Geist, C. Romine","doi":"10.1137/0909042","DOIUrl":null,"url":null,"abstract":"In this paper, we consider the effect that the data-storage scheme and pivoting scheme have on the efficiency of $LU$ factorization on a distributed-memory multiprocessor. Our presentation will focus on the hypercube architecture, but most of our results are applicable to distributed-memory architectures in general. We restrict our attention to two commonly used storage schemes (storage by rows and by columns) and investigate partial pivoting both by rows and by columns, yielding four factorization algorithms. Our goal is to determine which of these four algorithms admits the most efficient parallel implementation. We analyze factors such as load distribution, pivoting cost, and potential for pipelining. We conclude that, in the absence of loop-unrolling, $LU$ factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting. The two schemes that can be pipelined are pivoting by interchanging rows when the coefficient matrix is distributed to the processors by columns, and pivoting by interchanging columns when the matrix is distributed to the processors by rows.","PeriodicalId":200176,"journal":{"name":"Siam Journal on Scientific and Statistical Computing","volume":"13 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1988-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"97","resultStr":"{\"title\":\"$LU$ Factorization Algorithms on Distributed-Memory Multiprocessor Architectures\",\"authors\":\"G. Geist, C. Romine\",\"doi\":\"10.1137/0909042\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper, we consider the effect that the data-storage scheme and pivoting scheme have on the efficiency of $LU$ factorization on a distributed-memory multiprocessor. Our presentation will focus on the hypercube architecture, but most of our results are applicable to distributed-memory architectures in general. We restrict our attention to two commonly used storage schemes (storage by rows and by columns) and investigate partial pivoting both by rows and by columns, yielding four factorization algorithms. Our goal is to determine which of these four algorithms admits the most efficient parallel implementation. We analyze factors such as load distribution, pivoting cost, and potential for pipelining. We conclude that, in the absence of loop-unrolling, $LU$ factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting. The two schemes that can be pipelined are pivoting by interchanging rows when the coefficient matrix is distributed to the processors by columns, and pivoting by interchanging columns when the matrix is distributed to the processors by rows.\",\"PeriodicalId\":200176,\"journal\":{\"name\":\"Siam Journal on Scientific and Statistical Computing\",\"volume\":\"13 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1988-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"97\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Siam Journal on Scientific and Statistical Computing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1137/0909042\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Siam Journal on Scientific and Statistical Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1137/0909042","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 97

摘要

在本文中,我们考虑了数据存储方案和旋转方案对分布式存储多处理器上LU分解效率的影响。我们的演示将重点关注超立方体体系结构,但我们的大多数结果通常适用于分布式内存体系结构。我们将注意力限制在两种常用的存储模式(按行存储和按列存储)上,并研究按行和按列的部分枢轴,从而产生四种分解算法。我们的目标是确定这四种算法中哪一种允许最有效的并行实现。我们分析了负荷分配、枢纽成本和管道输送潜力等因素。我们得出结论,在没有循环展开的情况下,当使用流水线来掩盖旋转的成本时,带有部分旋转的LU分解是最有效的。可以实现流水线化的两种方案是:系数矩阵按列分配给处理器时,通过交换行进行旋转;矩阵按行分配给处理器时,通过交换列进行旋转。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
$LU$ Factorization Algorithms on Distributed-Memory Multiprocessor Architectures
In this paper, we consider the effect that the data-storage scheme and pivoting scheme have on the efficiency of $LU$ factorization on a distributed-memory multiprocessor. Our presentation will focus on the hypercube architecture, but most of our results are applicable to distributed-memory architectures in general. We restrict our attention to two commonly used storage schemes (storage by rows and by columns) and investigate partial pivoting both by rows and by columns, yielding four factorization algorithms. Our goal is to determine which of these four algorithms admits the most efficient parallel implementation. We analyze factors such as load distribution, pivoting cost, and potential for pipelining. We conclude that, in the absence of loop-unrolling, $LU$ factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting. The two schemes that can be pipelined are pivoting by interchanging rows when the coefficient matrix is distributed to the processors by columns, and pivoting by interchanging columns when the matrix is distributed to the processors by rows.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信