Mixed Precision $s$-step Conjugate Gradient with Residual Replacement on GPUs
I. Yamazaki, E. Carson, Brian Kelley
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2022. DOI: 10.1109/ipdps53621.2022.00091
The $s$-step Conjugate Gradient (CG) algorithm has the potential to reduce the communication cost of standard CG by a factor of $s$. However, though mathematically equivalent, $s$-step CG may be numerically less stable than standard CG in finite precision, exhibiting slower convergence and reduced attainable accuracy. This limits the use of $s$-step CG in practice. To improve the numerical behavior of $s$-step CG and overcome this potential limitation, we incorporate two techniques. First, we improve convergence behavior through the use of higher precision at critical parts of the $s$-step iteration; second, we integrate a residual replacement strategy into the resulting mixed precision $s$-step CG to improve attainable accuracy. Our experimental results on the Summit supercomputer demonstrate that when the higher precision is implemented in hardware, these techniques add virtually no overhead to the iteration time while improving both the convergence rate and the attainable accuracy of $s$-step CG. Even when the higher precision is implemented in software, these techniques may still reduce the time-to-solution (speedups of up to $1.8\times$ in our experiments), especially when $s$-step CG suffers from numerical instability with a small step size and the latency cost becomes a significant part of its iteration time.
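The first technique above can be illustrated with a minimal sketch. In $s$-step CG, one communication phase computes $s$ Krylov basis vectors at once, and the numerically critical inner products (the Gram matrix of the basis) are then accumulated in higher precision. The sketch below is illustrative only, not the paper's GPU implementation: it uses a monomial basis, `float32` as the working precision, and `float64` as the "higher" precision, and the function name is an assumption.

```python
import numpy as np

def monomial_basis_gram(A, p, s):
    """Illustrative s-step building block (not the paper's code).

    Builds the monomial Krylov basis [p, Ap, ..., A^s p] in low (working)
    precision, then forms its Gram matrix -- the numerically critical
    inner products -- in higher precision, mirroring the paper's idea of
    using higher precision only at critical parts of the iteration.
    """
    A32 = A.astype(np.float32)
    V = np.empty((p.size, s + 1), dtype=np.float32)
    V[:, 0] = p.astype(np.float32)
    for j in range(s):
        # all s sparse matrix-vector products belong to one
        # communication phase in the actual s-step algorithm
        V[:, j + 1] = A32 @ V[:, j]
    # critical inner products accumulated in float64
    V64 = V.astype(np.float64)
    G = V64.T @ V64
    return V, G
```

In the actual algorithm this Gram matrix replaces the individual dot products of standard CG, so each group of $s$ iterations needs only one global reduction; computing it in higher precision is what stabilizes the recurrences without adding communication.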
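The second technique, residual replacement, addresses the gap that opens up in finite precision between the recursively updated residual and the true residual $b - Ax$. The sketch below shows the core idea on standard CG in double precision rather than on the paper's mixed-precision $s$-step variant, and it uses a hypothetical fixed replacement period instead of the paper's replacement criterion.

```python
import numpy as np

def cg_residual_replacement(A, b, tol=1e-10, max_iter=1000, replace_every=50):
    """Plain CG with a simple residual-replacement step (illustrative sketch).

    The recurrence r -= alpha * A p drifts away from the true residual
    b - A x in finite precision; periodically recomputing r = b - A x
    restores agreement and improves the attainable accuracy.
    `replace_every` is a hypothetical fixed period, not the paper's criterion.
    """
    x = np.zeros_like(b)
    r = b - A @ x          # recursively updated residual
    p = r.copy()
    rs = r @ r
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap    # cheap recurrence; drifts from b - A x
        if k % replace_every == 0:
            r = b - A @ x  # residual replacement: restore the true residual
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * np.linalg.norm(b):
            return x, k
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x, max_iter
```

The replacement step costs one extra matrix-vector product, so in practice it is triggered only when the estimated deviation warrants it; in the mixed-precision $s$-step setting, this is what recovers the attainable accuracy that the $s$-step recurrences would otherwise lose.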