$LU$ Factorization Algorithms on Distributed-Memory Multiprocessor Architectures

SIAM Journal on Scientific and Statistical Computing. Pub Date: 1988-07-01. DOI: 10.1137/0909042

G. Geist, C. Romine

Citations: 97

Abstract

In this paper, we consider the effect that the data-storage scheme and pivoting scheme have on the efficiency of $LU$ factorization on a distributed-memory multiprocessor. Our presentation will focus on the hypercube architecture, but most of our results are applicable to distributed-memory architectures in general. We restrict our attention to two commonly used storage schemes (storage by rows and by columns) and investigate partial pivoting both by rows and by columns, yielding four factorization algorithms. Our goal is to determine which of these four algorithms admits the most efficient parallel implementation. We analyze factors such as load distribution, pivoting cost, and potential for pipelining. We conclude that, in the absence of loop-unrolling, $LU$ factorization with partial pivoting is most efficient when pipelining is used to mask the cost of pivoting. The two schemes that can be pipelined are pivoting by interchanging rows when the coefficient matrix is distributed to the processors by columns, and pivoting by interchanging columns when the matrix is distributed to the processors by rows.
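As a rough illustration of one of the two pipelinable schemes named above (row interchanges with the matrix distributed by columns), the following serial Python sketch simulates the data ownership. It is not the authors' implementation: the function name `lu_row_pivot_column_wrapped`, the wrap (cyclic) mapping `j % p` of columns to processors, and the returned `search_owner` list are all assumptions made for this example. The point it shows is that the pivot search at step `k` touches only column `k`, which lives on a single processor, while the row interchange and the update are carried out by every processor on its own columns.

```python
def lu_row_pivot_column_wrapped(a, p):
    """LU factorization with partial pivoting by row interchange.

    Serial simulation of column storage on p processors. ASSUMPTION: columns
    are assigned in wrap (cyclic) fashion, so column j belongs to processor
    j % p. Returns (lu, perm, search_owner): lu holds the multipliers of L
    below the diagonal and U on and above it, perm is the row permutation
    (row i of L*U equals row perm[i] of a), and search_owner[k] is the
    simulated processor that performed the pivot search at step k.
    """
    n = len(a)
    lu = [row[:] for row in a]          # work on a copy
    perm = list(range(n))
    search_owner = []
    for k in range(n - 1):
        owner = k % p                   # processor holding all of column k
        search_owner.append(owner)
        # Pivot search: column k is entirely local to `owner`, so no
        # communication is needed to find the largest entry.
        piv = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[piv][k] == 0.0:
            raise ValueError("zero pivot column: matrix is singular")
        # Row interchange: in a real code each processor would swap the
        # two entries in each of its own columns once it learns `piv`.
        if piv != k:
            lu[k], lu[piv] = lu[piv], lu[k]
            perm[k], perm[piv] = perm[piv], perm[k]
        # `owner` computes the multipliers; these (and `piv`) are what
        # would be sent to the other processors.
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
        # Update: each simulated processor modifies only the columns it owns.
        for proc in range(p):
            for j in range(k + 1, n):
                if j % p != proc:
                    continue
                for i in range(k + 1, n):
                    lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm, search_owner


a = [[2.0, 1.0, 1.0], [4.0, 3.0, 3.0], [8.0, 7.0, 9.0]]
lu, perm, owners = lu_row_pivot_column_wrapped(a, p=2)
```

In this layout the pivot search costs no communication, and the processor owning column `k + 1` could begin its next search as soon as it has updated that one column, which is the kind of overlap the abstract refers to as pipelining. The sketch itself runs the steps strictly in order and does not model message passing.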



