A new data conversion method for mixed precision Krylov solvers with FP16/BF16 Jacobi preconditioners

Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region. Pub Date: 2023-02-27. DOI: 10.1145/3578178.3578222

Takuya Ina, Y. Idomura, Toshiyuki Imamura, Naoyuki Onodera



Abstract



Mixed precision Krylov solvers with the Jacobi preconditioner often show significant convergence degradation when the Jacobi preconditioner is computed in low precision such as FP16 or BF16. This degradation is found to be caused by a loss of diagonal dominance due to roundoff errors in data conversion. To resolve this issue, we propose a new data conversion method that is designed to keep the diagonal dominance of the original matrix data. The proposed method is tested by solving the Poisson equation using the conjugate gradient method, the generalized minimal residual method, and the biconjugate gradient stabilized method with the FP16/BF16 Jacobi preconditioner on NVIDIA V100 GPUs. The new data conversion is implemented by switching between the round-nearest, round-up, round-down, and round-towards-zero intrinsics in CUDA, and it is called once before the main iteration, so its cost is negligible. When the matrix coefficients are changed continuously by scaling the linear system, the conventional data conversion based on the round-nearest intrinsic shows periodic changes in the convergence property, depending on the difference in roundoff errors between the diagonal and off-diagonal coefficients. The period and magnitude of the convergence degradation depend on the bit length of the significand. In contrast, the proposed data conversion method is shown to fully avoid the convergence degradation, enabling robust mixed precision computing for the Jacobi preconditioner without extra overhead.
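The loss of diagonal dominance under round-nearest conversion can be reproduced on a single matrix row. The sketch below is an illustration in plain Python, not the authors' CUDA implementation. It emulates FP16 conversion with the standard `struct` module, and it assumes one simple rule: round the diagonal entry away from zero and the off-diagonal entries toward zero. The abstract only states that the rounding intrinsics are switched to keep diagonal dominance, so this rule and all function names (`fp16_nearest`, `convert_row_safe`, and so on) are assumptions made for the example. The row is a scaled 7-point Poisson stencil, again chosen only for illustration.

```python
import struct


def fp16_nearest(x: float) -> float:
    """Round x to the nearest FP16 value (ties to even), returned as a float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def _fp16_step(h: float, up: bool) -> float:
    """Return the FP16 neighbour of the FP16 value h, above (up) or below it."""
    bits = struct.unpack("<H", struct.pack("<e", h))[0]
    if h == 0.0:
        bits = 0x0001 if up else 0x8001  # smallest subnormal of either sign
    elif (h > 0.0) == up:
        bits += 1  # one step away from zero
    else:
        bits -= 1  # one step toward zero
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def fp16_round_up(x: float) -> float:
    """Smallest FP16 value that is >= x."""
    h = fp16_nearest(x)
    return h if h >= x else _fp16_step(h, up=True)


def fp16_round_down(x: float) -> float:
    """Largest FP16 value that is <= x."""
    h = fp16_nearest(x)
    return h if h <= x else _fp16_step(h, up=False)


def convert_row_nearest(diag, offdiag):
    """Conventional conversion: every entry uses round-to-nearest."""
    return fp16_nearest(diag), [fp16_nearest(a) for a in offdiag]


def convert_row_safe(diag, offdiag):
    """Assumed rule: diagonal away from zero, off-diagonals toward zero."""
    d = fp16_round_up(diag) if diag >= 0 else fp16_round_down(diag)
    o = [fp16_round_down(a) if a >= 0 else fp16_round_up(a) for a in offdiag]
    return d, o


def is_diagonally_dominant(diag, offdiag) -> bool:
    """Weak row diagonal dominance: |a_ii| >= sum of |a_ij| over j != i."""
    return abs(diag) >= sum(abs(a) for a in offdiag)


# One row of a 7-point Poisson stencil, scaled by s (illustrative values).
s = 1.0006
diag, offdiag = 6.0 * s, [-s] * 6

d_n, o_n = convert_row_nearest(diag, offdiag)
d_s, o_s = convert_row_safe(diag, offdiag)

print("nearest:", d_n, o_n[0], is_diagonally_dominant(d_n, o_n))
print("safe:   ", d_s, o_s[0], is_diagonally_dominant(d_s, o_s))
```

With this scaling, round-nearest pushes each off-diagonal entry up in magnitude by more than the diagonal gains, so the converted row is no longer diagonally dominant, while the directed-rounding variant keeps it dominant. In CUDA, the corresponding FP32-to-FP16 conversions are available as `__float2half_rn`, `__float2half_ru`, `__float2half_rd`, and `__float2half_rz`; how the paper selects among them per coefficient is not specified in the abstract.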
