An investigation into the impact of the structured QR kernel on the overall performance of the TSQR algorithm

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. Pub Date: 2019-01-14. DOI: 10.1145/3293320.3293327

Takeshi Fukaya


Abstract

The TSQR algorithm is a communication-avoiding algorithm for computing the QR factorization of a tall and skinny (TS) matrix. It repeatedly executes a kernel that computes the QR factorization of a structured matrix. Although a single execution of this structured QR kernel has a small computational cost, the number of executions depends on the number of active parallel processes. The complicated computational pattern and small matrix size of structured QR make it difficult to achieve high performance. As a result, the computational cost of structured QR becomes a significant bottleneck in massively parallel computation. In this paper, we focus on the structured QR kernel and discuss its implementation. We compare several kernels, including those provided in LAPACK, on modern processors, and investigate the impact of the different structured QR kernels on the overall performance of the TSQR algorithm.
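
To make the role of the structured QR kernel concrete, the following is a minimal sequential sketch of TSQR in Python with NumPy. It is not the paper's implementation. It assumes a binary reduction tree and that every row block has at least as many rows as the matrix has columns. The names `structured_qr` and `tsqr_r` are hypothetical, and a generic dense QR stands in for a kernel that would exploit the two stacked upper-triangular factors.

```python
import numpy as np


def structured_qr(r_top, r_bottom):
    """Return the R factor of two stacked upper-triangular matrices.

    Stand-in kernel (assumption): a generic dense QR that ignores the
    triangular structure a tuned structured QR kernel would exploit.
    """
    stacked = np.vstack([r_top, r_bottom])
    return np.linalg.qr(stacked, mode="r")


def tsqr_r(a, num_blocks):
    """Compute the R factor of a tall-skinny matrix by tree reduction.

    Returns (R, tree_depth, kernel_calls). Each row block plays the role
    of one parallel process; the binary tree shape is an assumption.
    """
    blocks = np.array_split(a, num_blocks, axis=0)
    # Step 1: independent local QR on each row block.
    factors = [np.linalg.qr(block, mode="r") for block in blocks]

    tree_depth = 0
    kernel_calls = 0
    # Step 2: merge pairs of R factors until a single one remains.
    while len(factors) > 1:
        merged = []
        for i in range(0, len(factors) - 1, 2):
            merged.append(structured_qr(factors[i], factors[i + 1]))
            kernel_calls += 1
        if len(factors) % 2 == 1:
            # An unpaired factor passes through to the next level.
            merged.append(factors[-1])
        factors = merged
        tree_depth += 1
    return factors[0], tree_depth, kernel_calls


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((64, 4))
    r, depth, calls = tsqr_r(a, num_blocks=8)
    print("tree depth:", depth, "kernel calls:", calls)
    # R is unique only up to row signs, so compare R^T R with A^T A.
    print("R^T R matches A^T A:", np.allclose(r.T @ r, a.T @ a))
```

With 8 blocks this tree has 3 levels and performs 7 kernel calls in total. In a parallel run the calls on one level execute concurrently, so the number of sequential kernel executions equals the tree depth, which grows with the number of processes while each call still works on only a small stacked matrix.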


