Optimizing PLASMA Eigensolver on Large Shared Memory Systems

2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA). Pub Date: 2016-11-13. DOI: 10.1109/SCALA.2016.14

Cheng Liao

Citations: 0

Abstract

The performance of the PLASMA dense symmetric eigensolver is optimized for large shared-memory computer systems by using multiple Householder domains for the dense-to-band reduction and a communication-reducing kernel for bulge chasing. The mr3-smp code by Petschow and Bientinesi is used for the tridiagonal eigensolution, and the eigenvector back-transformations employ a 1D parallel decomposition. The input matrix and the Householder vectors and scalars are distributed among the CPU sockets with interleaved memory pages, whereas the banded matrix, the eigenvectors, and temporary memory buffers are allocated and processed locally. Other considerations and optimization techniques are also presented. Numerical examples show that, when all eigenpairs are computed and the problem size is sufficiently large, the PLASMA eigensolver can significantly outperform ELPA and EIGENEXA, and that the 2-stage eigensolution is generally better than its 1-stage counterpart on the latest x86_64 EP-4S CPUs with AVX2.
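The 1D parallel decomposition mentioned for the eigenvector back-transformation can be illustrated with a minimal sketch. It is not the PLASMA implementation: it assumes the orthogonal factor is available as an explicit matrix Q (PLASMA applies Householder reflectors instead), and the names column_blocks, back_transform_block, and back_transform_1d are hypothetical. The point it shows is that each block of eigenvector columns of Z = Q E can be computed independently of the others, so a worker can keep its block in local memory.

```python
from concurrent.futures import ThreadPoolExecutor


def column_blocks(n_cols, n_workers):
    """Split range(n_cols) into at most n_workers contiguous, near-equal blocks."""
    base, extra = divmod(n_cols, n_workers)
    blocks, start = [], 0
    for w in range(n_workers):
        width = base + (1 if w < extra else 0)
        if width:  # skip empty blocks when there are more workers than columns
            blocks.append((start, start + width))
        start += width
    return blocks


def back_transform_block(Q, E, lo, hi):
    """Compute columns lo..hi-1 of Z = Q @ E using only that column block of E."""
    inner = len(E)
    return [
        [sum(Q[i][k] * E[k][j] for k in range(inner)) for j in range(lo, hi)]
        for i in range(len(Q))
    ]


def back_transform_1d(Q, E, n_workers=2):
    """Back-transform eigenvectors E with a 1D (column-wise) decomposition.

    Assumption: Q is an explicit matrix given as a list of rows; a real
    solver would apply blocked Householder reflectors to each column block.
    """
    blocks = column_blocks(len(E[0]), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        parts = list(pool.map(lambda b: back_transform_block(Q, E, *b), blocks))
    # Stitch the column blocks back together, row by row.
    return [sum((p[i] for p in parts), []) for i in range(len(Q))]
```

Because the blocks share no output columns, no synchronization is needed until the final concatenation; this independence is what makes it possible to allocate and process the eigenvectors locally on each CPU socket.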

