Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis Pub Date : 2014-11-16 DOI:10.1109/SC.2014.81

I. Yamazaki, S. Rajamanickam, E. Boman, M. Hoemmen, M. Heroux, S. Tomov

{"title":"Domain Decomposition Preconditioners for Communication-Avoiding Krylov Methods on a Hybrid CPU/GPU Cluster","authors":"I. Yamazaki, S. Rajamanickam, E. Boman, M. Hoemmen, M. Heroux, S. Tomov","doi":"10.1109/SC.2014.81","DOIUrl":null,"url":null,"abstract":"Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies by two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CAGMRES, for solving no symmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the sub domain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieve a speedup of up to 7.4x over CAGMRES without preconditioning, and speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.","PeriodicalId":275261,"journal":{"name":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","volume":"47 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"29","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"SC14: International Conference for High Performance Computing, Networking, Storage and Analysis","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SC.2014.81","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 29

Abstract

Krylov subspace projection methods are widely used iterative methods for solving large-scale linear systems of equations. Researchers have demonstrated that communication avoiding (CA) techniques can improve Krylov methods' performance on modern computers, where communication is becoming increasingly expensive compared to arithmetic operations. In this paper, we extend these studies by two major contributions. First, we present our implementation of a CA variant of the Generalized Minimum Residual (GMRES) method, called CAGMRES, for solving no symmetric linear systems of equations on a hybrid CPU/GPU cluster. Our performance results on up to 120 GPUs show that CA-GMRES gives a speedup of up to 2.5x in total solution time over standard GMRES on a hybrid cluster with twelve Intel Xeon CPUs and three Nvidia Fermi GPUs on each node. We then outline a domain decomposition framework to introduce a family of preconditioners that are suitable for CA Krylov methods. Our preconditioners do not incur any additional communication and allow the easy reuse of existing algorithms and software for the sub domain solves. Experimental results on the hybrid CPU/GPU cluster demonstrate that CA-GMRES with preconditioning achieve a speedup of up to 7.4x over CAGMRES without preconditioning, and speedup of up to 1.7x over GMRES with preconditioning in total solution time. These results confirm the potential of our framework to develop a practical and effective preconditioned CA Krylov method.

查看原文本刊更多论文

CPU/GPU混合集群中避免通信的Krylov方法的域分解前置条件

Krylov子空间投影法是求解大型线性方程组的一种广泛使用的迭代方法。研究人员已经证明，通信避免(CA)技术可以提高Krylov方法在现代计算机上的性能，在现代计算机上，通信与算术运算相比变得越来越昂贵。在本文中，我们通过两个主要贡献来扩展这些研究。首先，我们提出了广义最小残差(GMRES)方法的CA变体的实现，称为CAGMRES，用于在混合CPU/GPU集群上求解非对称线性方程组。我们在多达120个gpu上的性能结果表明，在每个节点上有12个Intel Xeon cpu和3个Nvidia Fermi gpu的混合集群上，CA-GMRES的总解决方案时间比标准GMRES的加速速度提高了2.5倍。然后，我们概述了一个领域分解框架，以引入适用于CA Krylov方法的一系列前置条件。我们的预调节器不会产生任何额外的通信，并且允许为子域解决方案轻松重用现有算法和软件。在CPU/GPU混合集群上的实验结果表明，预处理后的CA-GMRES在总求解时间上比未预处理的CAGMRES加速高达7.4倍，比预处理后的GMRES加速高达1.7倍。这些结果证实了我们的框架开发实用有效的预处理CA Krylov方法的潜力。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

SC14: International Conference for High Performance Computing, Networking, Storage and Analysis

自引率

0.00%

发文量