s-Step Krylov Subspace Methods as Bottom Solvers for Geometric Multigrid

2014 IEEE 28th International Parallel and Distributed Processing Symposium. Pub Date: 2014-05-19. DOI: 10.1109/IPDPS.2014.119

Samuel Williams, M. Lijewski, A. Almgren, B. V. Straalen, E. Carson, Nicholas Knight, J. Demmel


Citations: 29

Abstract

Geometric multigrid solvers within adaptive mesh refinement (AMR) applications often reach a point where further coarsening of the grid becomes impractical because individual subdomain sizes approach unity. At this point, the most common solution is to use a bottom solver, such as BiCGStab, to reduce the residual by a fixed factor at the coarsest level. Each iteration of BiCGStab requires multiple global reductions (MPI collectives). Because the number of BiCGStab iterations required for convergence grows with problem size, and the time for each collective operation increases with machine scale, bottom solves in large-scale applications can constitute a significant fraction of the overall multigrid solve time. In this paper, we implement, evaluate, and optimize a communication-avoiding s-step formulation of BiCGStab (CABiCGStab for short) as a high-performance, distributed-memory bottom solver for geometric multigrid solvers. This is the first time s-step Krylov subspace methods have been leveraged to improve multigrid bottom solver performance. We use a synthetic benchmark for detailed analysis and integrate the best implementation into BoxLib in order to evaluate the benefit of an s-step Krylov subspace method on the multigrid solves found in the applications LMC and Nyx on up to 32,768 cores on the Cray XE6 at NERSC. Overall, we see bottom solver improvements of up to 4.2x on synthetic problems and up to 2.7x in real applications. This results in as much as a 1.5x improvement in overall solver performance in real applications.
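To make the point about global reductions concrete, the following is a minimal single-process sketch of textbook (unpreconditioned) BiCGStab in NumPy, not the paper's CABiCGStab or the BoxLib implementation. Every inner product is routed through a counter, since in a distributed-memory run each one would typically become a collective such as `MPI_Allreduce`. The names `ReductionCounter`, `make_test_matrix`, and `bicgstab`, the test matrix, and the tolerance are all assumptions chosen for illustration; `tol` plays the role of the fixed residual-reduction factor mentioned in the abstract.

```python
import numpy as np


class ReductionCounter:
    """Counts inner products. In a distributed run each of these would
    be a global reduction (assumption: one collective per dot product)."""

    def __init__(self):
        self.count = 0

    def dot(self, x, y):
        self.count += 1
        return float(np.dot(x, y))


def make_test_matrix(n):
    """Hypothetical well-conditioned test operator: tridiag(-1, 4, -1)."""
    return (4.0 * np.eye(n)
            - np.eye(n, k=1)
            - np.eye(n, k=-1))


def bicgstab(A, b, tol=1e-8, max_iter=200):
    """Textbook BiCGStab starting from x = 0.

    Stops once ||r|| <= tol * ||b||, i.e. once the residual has been
    reduced by the fixed factor `tol`.
    Returns (x, iterations, number_of_inner_products).
    """
    counter = ReductionCounter()
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat = r.copy()                      # fixed shadow residual
    rho = counter.dot(r_hat, r)           # setup reduction 1
    b_norm = np.sqrt(counter.dot(b, b))   # setup reduction 2
    p = r.copy()
    iters = 0
    for iters in range(1, max_iter + 1):
        v = A @ p
        alpha = rho / counter.dot(r_hat, v)            # reduction
        s = r - alpha * v
        t = A @ s
        omega = counter.dot(t, s) / counter.dot(t, t)  # two reductions
        x = x + alpha * p + omega * s
        r = s - omega * t
        if np.sqrt(counter.dot(r, r)) <= tol * b_norm:  # reduction
            break
        rho_new = counter.dot(r_hat, r)                # reduction
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * v)
        rho = rho_new
    return x, iters, counter.count


A = make_test_matrix(64)
b = np.ones(64)
x, iters, reductions = bicgstab(A, b)
print(f"iterations: {iters}, inner products: {reductions}")
```

In this sketch each full iteration issues five separate inner products. Practical implementations often fuse some of them into a single collective, but the count still grows linearly with the iteration count. An s-step formulation restructures the recurrence so that the reductions for a block of s iterations are batched together, which is the communication saving the abstract refers to.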

