Efficient Parallel Multigrid Methods on Manycore Clusters with Double/Single Precision Computing

K. Nakajima, T. Ogita, Masatoshi Kawai
Published in: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2021-06-01
DOI: 10.1109/IPDPSW52791.2021.00114 (https://doi.org/10.1109/IPDPSW52791.2021.00114)
Citations: 1

Abstract

The parallel multigrid method is expected to play an important role in scientific computing on exascale supercomputer systems, where it is used to solve large-scale linear equations with sparse coefficient matrices. Because solving sparse linear systems is a strongly memory-bound process, an efficient storage format for the coefficient matrices is crucial. In previous work, the authors applied the sliced ELL format to parallel conjugate gradient solvers with multigrid preconditioning (MGCG) in an application for 3D groundwater flow through heterogeneous porous media (pGW3D-LVM), and obtained excellent performance on large-scale multicore/manycore clusters. In the present work, the authors introduce SELL-C-σ with double/single precision computing into the MGCG solver and evaluate its performance with OpenMP/MPI hybrid parallel programming models on the Oakforest-PACS (OFP) system at JCAHPC, using up to 2,048 nodes of Intel Xeon Phi. Because SELL-C-σ is well suited to wide-SIMD architectures such as Xeon Phi, the performance improvement over sliced ELL on OFP was more than 35% for double precision and more than 45% for single precision. Finally, accuracy verification was conducted using a method proposed by the authors for solving linear equations whose sparse coefficient matrices have the M-property.
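
The abstract names SELL-C-σ but does not describe its layout. The following is a minimal pure-Python sketch of the general idea, not the authors' implementation: rows are sorted by nonzero count inside windows of σ rows, grouped into chunks of C rows, padded to the longest row in each chunk, and stored column-major within the chunk so that C consecutive entries can map to the lanes of one SIMD register. The function names `build_sell_c_sigma` and `spmv`, the input layout (a list of `(column, value)` pairs per row), and the use of column 0 with value 0.0 as padding are all assumptions made for illustration.

```python
def build_sell_c_sigma(rows, C, sigma):
    """Build an illustrative SELL-C-sigma layout.

    rows  : list with one list of (column, value) pairs per matrix row
    C     : chunk height (would match the SIMD width in a real kernel)
    sigma : sorting window, in rows
    """
    n = len(rows)
    # Sort rows by descending nonzero count, but only inside each window
    # of sigma rows, so the reordering stays local.
    perm = []
    for start in range(0, n, sigma):
        window = list(range(start, min(start + sigma, n)))
        window.sort(key=lambda r: -len(rows[r]))  # stable sort
        perm.extend(window)

    chunk_ptr, chunk_len, cols, vals = [0], [], [], []
    for start in range(0, n, C):
        chunk = perm[start:start + C]
        width = max(len(rows[r]) for r in chunk)
        chunk_len.append(width)
        # Column-major inside the chunk: entry j of all C rows is contiguous.
        for j in range(width):
            for lane in range(C):
                if lane < len(chunk) and j < len(rows[chunk[lane]]):
                    c, v = rows[chunk[lane]][j]
                else:
                    c, v = 0, 0.0  # padding entry (assumed convention)
                cols.append(c)
                vals.append(v)
        chunk_ptr.append(len(vals))
    return perm, chunk_ptr, chunk_len, cols, vals


def spmv(sell, x, C):
    """Compute y = A x for a matrix stored by build_sell_c_sigma."""
    perm, chunk_ptr, chunk_len, cols, vals = sell
    n = len(perm)
    y = [0.0] * n
    for k, width in enumerate(chunk_len):
        base = chunk_ptr[k]
        acc = [0.0] * C
        for j in range(width):
            # In a compiled kernel this lane loop is the vectorized one.
            for lane in range(C):
                idx = base + j * C + lane
                acc[lane] += vals[idx] * x[cols[idx]]
        for lane in range(C):
            row = k * C + lane
            if row < n:
                y[perm[row]] = acc[lane]  # undo the row permutation
    return y


# Small 5x5 example with uneven row lengths.
rows = [
    [(0, 2.0), (1, -1.0)],
    [(0, -1.0), (1, 2.0), (2, -1.0)],
    [(2, 3.0)],
    [(1, 1.0), (3, 4.0), (4, -2.0)],
    [(4, 5.0)],
]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
sell = build_sell_c_sigma(rows, C=2, sigma=4)
y = spmv(sell, x, C=2)
```

In this example the sort places the two three-entry rows in the same chunk, so only 2 padding entries are stored for 10 nonzeros. A larger σ generally reduces padding at the cost of a less local row permutation. Storing `vals` in single rather than double precision would halve the memory traffic for the matrix values, which is the motivation for the mixed-precision comparison reported above.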