Achieving Scalable Parallelization for the Hessenberg Factorization

2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.16

A. Castaldo, R. Clint Whaley

Citations: 9

Abstract

Much of dense linear algebra has been successfully blocked so that the majority of its time is spent in the Level 3 BLAS, which are not only efficient for serial computation but also scale well in parallel. The Hessenberg factorization, a critical step in computing eigenvalues and eigenvectors, is an exception: the performance of the best known algorithm is still strongly limited by memory speed, which does not scale well. In this paper we present an adaptation of our Parallel Cache Assignment (PCA) technique to the Hessenberg factorization, and show that it achieves superlinear speedup over the corresponding serial algorithm and a more than four-fold speedup over the best known algorithm for small and medium-sized problems.
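
For readers unfamiliar with the factorization itself, the following is a minimal sketch of an unblocked Householder reduction to upper Hessenberg form, written with NumPy. It is not the PCA technique described in the abstract, nor the blocked algorithm the paper compares against; the function name `hessenberg_unblocked` and its structure are assumptions made purely for illustration. Each step performs matrix-vector style updates that touch the whole trailing matrix, which gives a rough sense of why this kind of kernel is sensitive to memory speed.

```python
import numpy as np

def hessenberg_unblocked(a):
    """Illustrative (hypothetical) helper: return H, Q with A = Q H Q^T,
    where H is upper Hessenberg and Q is orthogonal."""
    h = np.array(a, dtype=float)   # work on a copy of the input
    n = h.shape[0]
    q = np.eye(n)
    for k in range(n - 2):
        x = h[k + 1:, k]           # part of column k below the diagonal
        norm_x = np.linalg.norm(x)
        if norm_x == 0.0:
            continue               # column is already reduced
        # Householder vector v; the sign choice avoids cancellation.
        v = x.copy()
        v[0] += np.copysign(norm_x, x[0])
        v /= np.linalg.norm(v)
        # Apply P = I - 2 v v^T from the left (rows k+1 and below).
        h[k + 1:, k:] -= 2.0 * np.outer(v, v @ h[k + 1:, k:])
        # Apply P from the right (columns k+1 and beyond); this
        # matrix-vector product reads every row of the trailing matrix.
        h[:, k + 1:] -= 2.0 * np.outer(h[:, k + 1:] @ v, v)
        # Accumulate the orthogonal factor Q = P_1 P_2 ... P_{n-2}.
        q[:, k + 1:] -= 2.0 * np.outer(q[:, k + 1:] @ v, v)
    return h, q

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.standard_normal((6, 6))
    h, q = hessenberg_unblocked(a)
    # Entries below the first subdiagonal should vanish up to rounding.
    print(np.allclose(np.tril(h, -2), 0.0))
    # The similarity transform should reproduce the original matrix.
    print(np.allclose(q @ h @ q.T, a))
```

Because H is obtained by an orthogonal similarity transform, it has the same eigenvalues as the original matrix, which is why the reduction serves as a preprocessing step for eigenvalue solvers.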

