{"title":"提高通信回避GMRES收敛性的通缩策略","authors":"I. Yamazaki, S. Tomov, J. Dongarra","doi":"10.1109/ScalA.2014.6","DOIUrl":null,"url":null,"abstract":"The generalized minimum residual (GMRES) method is a popular method for solving a large-scale sparse nonsymmetric linear system of equations. On modern computers, especially on a large-scale system, the communication is becoming increasingly expensive. To address this hardware trend, a communication-avoiding variant of GMRES (CA-GMRES) has become attractive, frequently showing its superior performance over GMRES on various hardware architectures. In practice, to mitigate the increasing costs of explicitly orthogonalizing the projection basis vectors, the iterations of both GMRES and CAGMRES are restarted, which often slows down the solution convergence. To avoid this slowdown and improve the performance of restarted CA-GMRES, in this paper, we study the effectiveness of deflation strategies. Our studies are based on a thick restarted variant of CA-GMRES, which can implicitly deflate a few Ritz vectors, that approximately span an eigenspace of the coefficient matrix, through the standard orthogonalization process. This strategy is mathematically equivalent to the standard thick-restarted GMRES, and it requires only a small computational overhead and does not increase the communication or storage costs of CA-GMRES. Hence, by avoiding the communication, this deflated version of CA-GMRES obtains the same performance benefits over the deflated version of GMRES as the standard CA-GMRES does over GMRES. Our experimental results on a hybrid CPU/GPU cluster demonstrate that thick-restart can significantly improve the convergence and reduce the solution time of CA-GMRES. We also show that this deflation strategy can be combined with a local domain decomposition based preconditioner to further enhance the robustness of CA-GMRES, making it more attractive in practice.","PeriodicalId":323689,"journal":{"name":"2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-11-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES\",\"authors\":\"I. Yamazaki, S. Tomov, J. Dongarra\",\"doi\":\"10.1109/ScalA.2014.6\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The generalized minimum residual (GMRES) method is a popular method for solving a large-scale sparse nonsymmetric linear system of equations. On modern computers, especially on a large-scale system, the communication is becoming increasingly expensive. To address this hardware trend, a communication-avoiding variant of GMRES (CA-GMRES) has become attractive, frequently showing its superior performance over GMRES on various hardware architectures. In practice, to mitigate the increasing costs of explicitly orthogonalizing the projection basis vectors, the iterations of both GMRES and CAGMRES are restarted, which often slows down the solution convergence. To avoid this slowdown and improve the performance of restarted CA-GMRES, in this paper, we study the effectiveness of deflation strategies. Our studies are based on a thick restarted variant of CA-GMRES, which can implicitly deflate a few Ritz vectors, that approximately span an eigenspace of the coefficient matrix, through the standard orthogonalization process. 
This strategy is mathematically equivalent to the standard thick-restarted GMRES, and it requires only a small computational overhead and does not increase the communication or storage costs of CA-GMRES. Hence, by avoiding the communication, this deflated version of CA-GMRES obtains the same performance benefits over the deflated version of GMRES as the standard CA-GMRES does over GMRES. Our experimental results on a hybrid CPU/GPU cluster demonstrate that thick-restart can significantly improve the convergence and reduce the solution time of CA-GMRES. We also show that this deflation strategy can be combined with a local domain decomposition based preconditioner to further enhance the robustness of CA-GMRES, making it more attractive in practice.\",\"PeriodicalId\":323689,\"journal\":{\"name\":\"2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems\",\"volume\":\"17 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-11-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ScalA.2014.6\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 5th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ScalA.2014.6","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Deflation Strategies to Improve the Convergence of Communication-Avoiding GMRES
Abstract: The generalized minimal residual (GMRES) method is a popular method for solving large-scale sparse nonsymmetric linear systems of equations. On modern computers, and especially at large scale, communication is becoming increasingly expensive relative to computation. To address this hardware trend, a communication-avoiding variant of GMRES (CA-GMRES) has become attractive, frequently showing superior performance over GMRES on various hardware architectures. In practice, to mitigate the growing cost of explicitly orthogonalizing the projection basis vectors, the iterations of both GMRES and CA-GMRES are restarted, which often slows down the convergence of the solution. To avoid this slowdown and improve the performance of restarted CA-GMRES, in this paper we study the effectiveness of deflation strategies. Our studies are based on a thick-restarted variant of CA-GMRES, which implicitly deflates a few Ritz vectors, approximately spanning an eigenspace of the coefficient matrix, through the standard orthogonalization process. This strategy is mathematically equivalent to standard thick-restarted GMRES; it requires only a small computational overhead and increases neither the communication nor the storage costs of CA-GMRES. Hence, by avoiding communication, this deflated version of CA-GMRES obtains the same performance benefits over deflated GMRES as standard CA-GMRES does over GMRES. Our experimental results on a hybrid CPU/GPU cluster demonstrate that thick restart can significantly improve the convergence and reduce the solution time of CA-GMRES. We also show that this deflation strategy can be combined with a local domain-decomposition-based preconditioner to further enhance the robustness of CA-GMRES, making it more attractive in practice.
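To make the restart mechanism concrete, the following is a minimal sketch of restarted GMRES(m) in NumPy. This is not the paper's CA-GMRES implementation (which reorganizes the basis generation and orthogonalization into communication-avoiding blocks); it is the textbook restarted variant that both methods build on, and all names here (gmres_restarted and its parameters) are illustrative, not taken from the paper's code.

import numpy as np

def gmres_restarted(A, b, x0=None, m=30, tol=1e-8, max_restarts=50):
    """Restarted GMRES: run up to m Arnoldi steps, minimize the residual
    over the Krylov subspace, update the iterate, then restart."""
    n = b.shape[0]
    x = np.zeros(n) if x0 is None else x0.astype(float)
    for _ in range(max_restarts):
        r = b - A @ x
        beta = np.linalg.norm(r)
        if beta < tol:
            break
        # Arnoldi: orthonormal basis V of K_m(A, r) and the (m+1) x m
        # upper-Hessenberg matrix H satisfying A V[:, :m] = V H.
        V = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        V[:, 0] = r / beta
        m_eff = m
        for j in range(m):
            w = A @ V[:, j]
            for i in range(j + 1):              # modified Gram-Schmidt
                H[i, j] = V[:, i] @ w
                w -= H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14:             # happy breakdown
                m_eff = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        # Small least-squares problem: min_y || beta*e1 - H y ||_2.
        e1 = np.zeros(m_eff + 1)
        e1[0] = beta
        y, *_ = np.linalg.lstsq(H[:m_eff + 1, :m_eff], e1, rcond=None)
        x = x + V[:, :m_eff] @ y                # restart discards V entirely
    return x

Discarding the whole Krylov subspace at each restart is exactly what slows convergence. The deflation ingredient can be sketched from the same quantities: at the end of a cycle, Ritz pairs of A are available from the square part of the small Hessenberg matrix, and a thick restart retains a few of the corresponding Ritz vectors in the next cycle's basis instead of throwing them away. A hypothetical helper, under the same assumptions (the paper's scheme carries these vectors implicitly through its standard orthogonalization process; this only shows how Ritz pairs are extracted):

def ritz_pairs(V, H, m_eff, k):
    """Return the k Ritz values of smallest magnitude and the matching
    Ritz vectors, computed from the Arnoldi relation A V ~= V H."""
    theta, Z = np.linalg.eig(H[:m_eff, :m_eff])   # eigenpairs of the small H
    idx = np.argsort(np.abs(theta))[:k]           # small eigenvalues tend to delay convergence
    return theta[idx], V[:, :m_eff] @ Z[:, idx]   # lift eigenvectors back to R^n

For example, calling ritz_pairs(V, H, m_eff, 4) at the end of a cycle would supply the four approximate eigenvectors a thick restart keeps, so that the information most responsible for convergence survives the restart.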