Mixed Precision $s$-step Conjugate Gradient with Residual Replacement on GPUs
I. Yamazaki, E. Carson, Brian Kelley
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2022. DOI: 10.1109/ipdps53621.2022.00091
The $s$-step Conjugate Gradient (CG) algorithm has the potential to reduce the communication cost of standard CG by a factor of $s$. However, though mathematically equivalent, $s$-step CG may be numerically less stable than standard CG in finite precision, exhibiting slower convergence and reduced attainable accuracy. This limits the use of $s$-step CG in practice. To improve the numerical behavior of $s$-step CG and overcome this potential limitation, we incorporate two techniques. First, we improve convergence behavior through the use of higher precision at critical parts of the $s$-step iteration; second, we integrate a residual replacement strategy into the resulting mixed precision $s$-step CG to improve attainable accuracy. Our experimental results on the Summit supercomputer demonstrate that when the higher precision is implemented in hardware, these techniques add virtually no overhead to the iteration time while improving both the convergence rate and the attainable accuracy of $s$-step CG. Even when the higher precision is implemented in software, these techniques may still reduce the time-to-solution (speedups of up to $1.8\times$ in our experiments), especially when $s$-step CG suffers from numerical instability with a small step size and the latency cost becomes a significant part of its iteration time.
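The first technique above can be illustrated with a minimal sketch. In $s$-step CG, one communication phase computes $s$ Krylov basis vectors at once, and the numerically critical inner products (the Gram matrix of the basis) are then accumulated in higher precision. The sketch below is illustrative only, not the paper's GPU implementation: it uses a monomial basis, `float32` as the working precision, and `float64` as the "higher" precision, and the function name is an assumption.

```python
import numpy as np

def monomial_basis_gram(A, p, s):
    """Illustrative s-step building block (not the paper's code).

    Builds the monomial Krylov basis [p, Ap, ..., A^s p] in low (working)
    precision, then forms its Gram matrix -- the numerically critical
    inner products -- in higher precision, mirroring the paper's idea of
    using higher precision only at critical parts of the iteration.
    """
    A32 = A.astype(np.float32)
    V = np.empty((p.size, s + 1), dtype=np.float32)
    V[:, 0] = p.astype(np.float32)
    for j in range(s):
        # all s sparse matrix-vector products belong to one
        # communication phase in the actual s-step algorithm
        V[:, j + 1] = A32 @ V[:, j]
    # critical inner products accumulated in float64
    V64 = V.astype(np.float64)
    G = V64.T @ V64
    return V, G
```

In the actual algorithm this Gram matrix replaces the individual dot products of standard CG, so each group of $s$ iterations needs only one global reduction; computing it in higher precision is what stabilizes the recurrences without adding communication.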
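The second technique, residual replacement, addresses the gap that opens up in finite precision between the recursively updated residual and the true residual $b - Ax$. The sketch below shows the core idea on standard CG in double precision rather than on the paper's mixed-precision $s$-step variant, and it uses a hypothetical fixed replacement period instead of the paper's replacement criterion.

```python
import numpy as np

def cg_residual_replacement(A, b, tol=1e-10, max_iter=1000, replace_every=50):
    """Plain CG with a simple residual-replacement step (illustrative sketch).

    The recurrence r -= alpha * A p drifts away from the true residual
    b - A x in finite precision; periodically recomputing r = b - A x
    restores agreement and improves the attainable accuracy.
    `replace_every` is a hypothetical fixed period, not the paper's criterion.
    """
    x = np.zeros_like(b)
    r = b - A @ x          # recursively updated residual
    p = r.copy()
    rs = r @ r
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap    # cheap recurrence; drifts from b - A x
        if k % replace_every == 0:
            r = b - A @ x  # residual replacement: restore the true residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            return x, k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter
```

The replacement step costs one extra matrix-vector product, so in practice it is triggered only when the estimated deviation warrants it; in the mixed-precision $s$-step setting, this is what recovers the attainable accuracy that the $s$-step recurrences would otherwise lose.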