Mixed Precision $s$-step Conjugate Gradient with Residual Replacement on GPUs

I. Yamazaki, E. Carson, Brian Kelley
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), published 2022-05-01
DOI: 10.1109/ipdps53621.2022.00091 (https://doi.org/10.1109/ipdps53621.2022.00091)
Citations: 1

Abstract

The $s$-step Conjugate Gradient (CG) algorithm has the potential to reduce the communication cost of standard CG by a factor of $s$. However, although the two algorithms are mathematically equivalent, $s$-step CG may be numerically less stable than standard CG in finite precision, exhibiting slower convergence and decreased attainable accuracy. This limits the use of $s$-step CG in practice. To improve the numerical behavior of $s$-step CG and overcome this potential limitation, we incorporate two techniques. First, we improve convergence behavior by using higher precision at critical parts of the $s$-step iteration. Second, we integrate a residual replacement strategy into the resulting mixed precision $s$-step CG to improve attainable accuracy. Our experimental results on the Summit supercomputer demonstrate that, when the higher precision is implemented in hardware, these techniques add virtually no overhead to the iteration time while improving both the convergence rate and the attainable accuracy of $s$-step CG. Even when the higher precision is implemented in software, these techniques may still reduce the time-to-solution (speedups of up to $1.8\times$ in our experiments), especially when $s$-step CG suffers from numerical instability with a small step size and the latency cost becomes a significant part of its iteration time.
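
To make the two techniques named in the abstract concrete, the following is a minimal sketch and not the authors' algorithm. It uses standard (one-step) CG rather than $s$-step CG, runs the iteration in float32, and every `replace_every` iterations discards the recursively updated residual in favor of the true residual $b - Ax$ computed in float64. The function name, the fixed replacement interval, and the float32/float64 pairing are all assumptions made for illustration; the abstract does not specify the replacement criterion or the precisions used.

```python
import numpy as np


def cg_residual_replacement(A, b, tol=1e-6, max_iter=500, replace_every=20):
    """Standard CG in float32 with periodic float64 residual replacement.

    Illustrative only: a fixed replacement interval (an assumption) stands in
    for whatever criterion a production solver would use.
    """
    A32, b32 = A.astype(np.float32), b.astype(np.float32)
    A64, b64 = A.astype(np.float64), b.astype(np.float64)

    x = np.zeros_like(b32)          # initial guess x0 = 0, so r0 = b
    r = b32.copy()
    p = r.copy()
    rr = np.dot(r, r)
    b_norm = np.linalg.norm(b64)

    for k in range(1, max_iter + 1):
        Ap = A32 @ p
        alpha = rr / np.dot(p, Ap)
        x = x + alpha * p

        if k % replace_every == 0:
            # Residual replacement: recompute the true residual b - A x in
            # higher precision instead of trusting the recursive update.
            r = (b64 - A64 @ x.astype(np.float64)).astype(np.float32)
        else:
            # Usual recursive residual update in working precision.
            r = r - alpha * Ap

        rr_new = np.dot(r, r)
        if np.sqrt(rr_new) <= tol * b_norm:
            break

        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new

    return x, k
```

Calling `cg_residual_replacement(A, b, replace_every=5)` on a symmetric positive definite `A` returns the float32 iterate and the number of iterations taken. The recursively updated residual can drift away from the true residual in finite precision, which is what limits attainable accuracy; replacing it periodically re-synchronizes the two, and computing the replacement in higher precision keeps the replacement itself from introducing new rounding error.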