Mixed-precision block Gram-Schmidt orthogonalization

I. Yamazaki, S. Tomov, J. Kurzak, J. Dongarra, J. Barlow
DOI: 10.1145/2832080.2832082 (https://doi.org/10.1145/2832080.2832082)
Published: 2015-11-15
Venue: ACM SIGPLAN Symposium on Scala
Citations: 9

Abstract

The mixed-precision Cholesky QR (CholQR) can orthogonalize the columns of a dense matrix with the minimum communication cost. Moreover, its orthogonality error depends only linearly on the condition number of the input matrix. However, when the desired higher precision is not supported by the hardware, software-emulated arithmetic is needed, which can significantly increase the computational cost. When there are a large number of columns to orthogonalize, this overhead can have a dramatic impact on the orthogonalization time, and the mixed-precision CholQR can be much slower than the standard CholQR. In this paper, we examine several block variants of the algorithm that reduce the computational overhead of the software-emulated arithmetic while maintaining the same orthogonality error bound as the mixed-precision CholQR. Our numerical and performance results on multicore CPUs with a GPU, as well as on a hybrid CPU/GPU cluster, demonstrate that, compared to the mixed-precision CholQR, such a block variant can obtain speedups of up to 7.1× while keeping the numerical errors at about the same order.
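
To make the CholQR idea in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes float32 as the working precision and uses hardware float64 as a stand-in for the higher precision, whereas the abstract's setting is one where the higher precision has to be emulated in software. The function name `cholqr`, its `gram_dtype` parameter, and the test matrix (condition number about 1e2) are all illustrative assumptions. The sketch covers only the non-block algorithm, since the abstract does not describe the block variants in detail.

```python
import numpy as np

def cholqr(V, gram_dtype=None):
    """Cholesky QR sketch: V = Q R with R upper triangular.

    If gram_dtype is given, the Gram matrix and its Cholesky factor are
    computed in that (higher) precision; the solve for Q stays in V's dtype.
    """
    work = V.dtype
    hi = work if gram_dtype is None else np.dtype(gram_dtype)
    Vh = V.astype(hi)
    G = Vh.T @ Vh                       # Gram matrix G = V^T V
    R = np.linalg.cholesky(G).T         # G = R^T R, R upper triangular
    R = R.astype(work)                  # round R back to working precision
    # Q = V R^{-1}, obtained by solving R^T Q^T = V^T.
    # A general solver is used here for brevity; a triangular solve would
    # be the natural choice in a real implementation.
    Q = np.linalg.solve(R.T, V.T).T
    return Q, R

def orth_error(Q):
    """Orthogonality error ||Q^T Q - I||_2, evaluated in float64."""
    Qd = Q.astype(np.float64)
    return np.linalg.norm(Qd.T @ Qd - np.eye(Qd.shape[1]), 2)

# Illustrative test matrix with singular values from 1 down to 1e-2.
rng = np.random.default_rng(0)
n, m = 200, 10
U, _ = np.linalg.qr(rng.standard_normal((n, m)))
W, _ = np.linalg.qr(rng.standard_normal((m, m)))
V = ((U * np.logspace(0, -2, m)) @ W.T).astype(np.float32)

Q_std, R_std = cholqr(V)                           # all in float32
Q_mixed, R_mixed = cholqr(V, gram_dtype=np.float64)  # Gram + Cholesky in float64

err_std = orth_error(Q_std)
err_mixed = orth_error(Q_mixed)
print(f"standard CholQR orthogonality error: {err_std:.2e}")
print(f"mixed-precision orthogonality error: {err_mixed:.2e}")
```

In this sketch the only communication-heavy step is forming the small m-by-m Gram matrix, which is why CholQR is attractive on distributed or hybrid hardware. Forming that matrix is also the step that would run in emulated arithmetic when the higher precision is not available natively, which is the overhead the block variants aim to reduce.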