银杏线性代数软件中高性能预处理的自适应精确块-雅可比

ACM Transactions on Mathematical Software (TOMS) Pub Date : 2021-04-26 DOI:10.1145/3441850

Goran Flegar, H. Anzt, T. Cojean, E. S. Quintana‐Ortí

{"title":"银杏线性代数软件中高性能预处理的自适应精确块-雅可比","authors":"Goran Flegar, H. Anzt, T. Cojean, E. S. Quintana‐Ortí","doi":"10.1145/3441850","DOIUrl":null,"url":null,"abstract":"The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.","PeriodicalId":7036,"journal":{"name":"ACM Transactions on Mathematical Software (TOMS)","volume":"39 1","pages":"1 - 28"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"10","resultStr":"{\"title\":\"Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software\",\"authors\":\"Goran Flegar, H. Anzt, T. Cojean, E. S. Quintana‐Ortí\",\"doi\":\"10.1145/3441850\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.\",\"PeriodicalId\":7036,\"journal\":{\"name\":\"ACM Transactions on Mathematical Software (TOMS)\",\"volume\":\"39 1\",\"pages\":\"1 - 28\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"10\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"ACM Transactions on Mathematical Software (TOMS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3441850\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Mathematical Software (TOMS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3441850","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 10

摘要

在数值算法中使用混合精度是加速科学应用的一种有前途的策略。特别是，在高端gpu(图形处理单元)中采用专门的硬件和数据格式来进行低精度算法，这激发了许多旨在仔细降低工作精度以加快计算速度的努力。对于性能受内存带宽限制的算法，在内存访问之前(和之后)压缩其数据的想法受到了相当大的关注。一种想法是将近似运算符(类似于预处理符)存储在低于工作精度的位置，希望不会影响算法输出。我们实现了自适应精度块- jacobi预调节器的第一个高性能实现，它选择用于存储预调节器数据的精度格式，同时考虑到单个预调节器块的数值属性。我们在Ginkgo线性代数库中实现了自适应块jacobi预调节器作为生产就绪功能，不仅考虑了IEEE标准的精确格式，还考虑了优化指数长度和显著预调节器块特性的自定义格式。在最先进的GPU加速器上运行的实验表明，我们的实现提供了有吸引力的运行时节省。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Adaptive Precision Block-Jacobi for High Performance Preconditioning in the Ginkgo Linear Algebra Software

The use of mixed precision in numerical algorithms is a promising strategy for accelerating scientific applications. In particular, the adoption of specialized hardware and data formats for low-precision arithmetic in high-end GPUs (graphics processing units) has motivated numerous efforts aiming at carefully reducing the working precision in order to speed up the computations. For algorithms whose performance is bound by the memory bandwidth, the idea of compressing its data before (and after) memory accesses has received considerable attention. One idea is to store an approximate operator–like a preconditioner–in lower than working precision hopefully without impacting the algorithm output. We realize the first high-performance implementation of an adaptive precision block-Jacobi preconditioner which selects the precision format used to store the preconditioner data on-the-fly, taking into account the numerical properties of the individual preconditioner blocks. We implement the adaptive block-Jacobi preconditioner as production-ready functionality in the Ginkgo linear algebra library, considering not only the precision formats that are part of the IEEE standard, but also customized formats which optimize the length of the exponent and significand to the characteristics of the preconditioner blocks. Experiments run on a state-of-the-art GPU accelerator show that our implementation offers attractive runtime savings.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

ACM Transactions on Mathematical Software (TOMS)

自引率

0.00%

发文量