$LU$ Factorization Algorithms on Distributed-Memory Multiprocessor Architectures

SIAM Journal on Scientific and Statistical Computing. Pub Date: 1988-07-01. DOI: 10.1137/0909042

G. Geist, C. Romine


Citations: 97

Abstract

In this paper, we consider the effect that the data-storage scheme and pivoting scheme have on the efficiency of $LU$ factorization on a distributed-memory multiprocessor. Our presentation will focus on the hypercube architecture, but most of our results are applicable to distributed-memory architectures in general. We restrict our attention to two commonly used storage schemes (storage by rows and by columns) and investigate partial pivoting both by rows and by columns, yielding four factorization algorithms. Our goal is to determine which of these four algorithms admits the most efficient parallel implementation. We analyze factors such as load distribution, pivoting cost, and potential for pipelining. We conclude that, in the absence of loop-unrolling, $LU$ factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting. The two schemes that can be pipelined are pivoting by interchanging rows when the coefficient matrix is distributed to the processors by columns, and pivoting by interchanging columns when the matrix is distributed to the processors by rows.
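To make the terminology concrete, the following is a minimal serial sketch in Python of $LU$ factorization with partial pivoting by row interchange. It is not the authors' parallel code: the function names `lu_row_pivot` and `owner` are hypothetical, and the wrap (cyclic) assignment of columns to processors is an assumption made only for illustration. The comments mark where the storage scheme would matter on a distributed-memory machine.

```python
def lu_row_pivot(a):
    """Factor a square matrix with partial pivoting by row interchange.

    Returns (lu, perm). lu stores the multipliers of the unit lower
    triangular factor L below the diagonal and U on and above it.
    perm[i] is the index of the original row that ends up in row i,
    so row perm[i] of the input equals row i of the product L * U.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    perm = list(range(n))
    for k in range(n):
        # The pivot search runs down column k. Under storage by columns,
        # all of column k is held by one processor, so the search uses
        # only that processor's local data.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[p][k] == 0.0:
            raise ValueError("matrix is singular")
        if p != k:
            # Row interchange: under storage by columns, each processor
            # swaps the two entries in each column it holds.
            lu[k], lu[p] = lu[p], lu[k]
            perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]  # multiplier, stored in place of the zero
            m = lu[i][k]
            for j in range(k + 1, n):
                lu[i][j] -= m * lu[k][j]
    return lu, perm


def owner(index, nprocs):
    """Hypothetical wrap mapping: processor that holds column `index`."""
    return index % nprocs


a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
lu, perm = lu_row_pivot(a)
print(perm, [owner(j, 2) for j in range(len(a))])
```

Pivoting by column interchange is the mirror image: the search runs along row k and columns are swapped, so the pivot search is local under storage by rows instead.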

