Optimizing PLASMA Eigensolver on Large Shared Memory Systems

Cheng Liao
Published in: 2016 7th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), 2016-11-13
DOI: 10.1109/SCALA.2016.14 (https://doi.org/10.1109/SCALA.2016.14)

Abstract

The performance of the PLASMA dense symmetric eigensolver is optimized for large shared-memory computer systems by using multiple Householder domains for the dense-to-band reduction and a communication-reducing kernel for bulge chasing. The mr3-smp code by Petschow and Bientinesi is used for the tridiagonal eigensolution, and the eigenvector back-transformations employ a 1D parallel decomposition. The input matrix and the Householder vectors and scalars are distributed among the CPU sockets with interleaved memory pages, whereas the banded matrix, the eigenvectors, and temporary memory buffers are allocated and processed locally. Other considerations and optimization techniques are also presented. Numerical examples show that, when all eigenpairs are computed and the problem size is sufficiently large, the PLASMA eigensolver can significantly outperform ELPA and EIGENEXA, and that the 2-stage eigensolution is generally better than its 1-stage counterpart on the latest x86_64 EP-4S CPUs with AVX2.
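
The pipeline named in the abstract (reduce to condensed form, solve the tridiagonal problem, back-transform the eigenvectors) can be sketched on a small matrix. The sketch below is not PLASMA code and does not reproduce the 2-stage dense-to-band and bulge-chasing path or the mr3-smp solver; it assumes NumPy and SciPy as stand-ins, uses a direct 1-stage reduction, and the function name `eig_via_tridiagonal` is hypothetical.

```python
import numpy as np
from scipy.linalg import hessenberg, eigh_tridiagonal


def eig_via_tridiagonal(a):
    """Symmetric eigensolve in three steps: reduce, solve, back-transform.

    Illustrative 1-stage sketch only (assumption: SciPy stands in for the
    PLASMA kernels described in the abstract).
    """
    # Step 1: orthogonal reduction A = Q T Q^T. For a symmetric A the
    # Hessenberg form T is tridiagonal up to rounding error.
    t, q = hessenberg(a, calc_q=True)
    d = np.diag(t).copy()        # main diagonal of T
    e = np.diag(t, k=-1).copy()  # sub-diagonal of T

    # Step 2: eigenpairs of the tridiagonal matrix, T Z = Z diag(w).
    w, z = eigh_tridiagonal(d, e)

    # Step 3: back-transform the eigenvectors to those of A, V = Q Z.
    v = q @ z
    return w, v


# Small random symmetric test matrix.
rng = np.random.default_rng(0)
n = 50
m = rng.standard_normal((n, n))
a = (m + m.T) / 2

w, v = eig_via_tridiagonal(a)

# Relative residual ||A V - V diag(w)|| / ||A||; should be near machine precision.
residual = np.linalg.norm(a @ v - v * w) / np.linalg.norm(a)
print(residual)
```

In a 2-stage solver, step 1 is split into a dense-to-band reduction followed by bulge chasing from band to tridiagonal form, so step 3 has to apply two sets of orthogonal transformations instead of one.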