Multi-threading and one-sided communication in parallel LU factorization

Proceedings of the 2007 ACM/IEEE Conference on Supercomputing (SC '07). Published: 2007-11-16. DOI: 10.1145/1362622.1362664

P. Husbands, K. Yelick

Citations: 53

Abstract

Dense LU factorization has a high ratio of computation to communication and, as evidenced by the High Performance Linpack (HPL) benchmark, this property makes it scale well on most parallel machines. Nevertheless, the standard algorithm for this problem has non-trivial dependence patterns that limit parallelism, and local computations require large matrices in order to achieve good single-processor performance. We present an alternative programming model for this type of problem, which combines UPC's global address space with lightweight multithreading. We introduce the concept of memory-constrained lookahead, where the amount of concurrency managed by each processor is controlled by the amount of memory available. We implement novel techniques for steering the computation to optimize for high performance and demonstrate the scalability and portability of UPC with Teraflop-level performance on some machines, comparing favourably to other state-of-the-art MPI codes.
