Parallel multiple precision division by a single precision divisor

Niall Emmart, C. Weems
2011 18th International Conference on High Performance Computing. Published 2011-12-18. DOI: 10.1109/HiPC.2011.6152712 (https://doi.org/10.1109/HiPC.2011.6152712)
Citations: 4

Abstract

We report an algorithm for division of a multi-precision integer by a single-precision value using a graphics processing unit (GPU). Our algorithm combines a parallel version of Jebelean's exact division algorithm with a left-to-right algorithm for computing the borrow chain, which relaxes the requirement of exactness. We also employ Takahashi's recently reported cyclic reduction technique [10] for GPU division to further enhance performance. The result is that our algorithm, at O(n/p + log p), is asymptotically faster than Takahashi's algorithm, at O((n/p) log p). We report results for dividends with precisions of 1024, 2048, and 4096 bits running on an NVIDIA GTX 480. For non-constant divisors, our algorithm is 20% slower at 1024 bits (due to startup overhead), 40% faster at 2048 bits, and 2.5 times faster at 4096 bits. For division by constants, with precomputed tables, our algorithm is faster at all sizes, with speedups ranging from 2.3 to 6 times.
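To make the two ingredients named in the abstract concrete, here is a minimal sequential sketch in Python. It is not the authors' GPU implementation, and it does not reproduce their parallel borrow-chain computation or the cyclic reduction step. It assumes 32-bit limbs stored least-significant first, and all function names (`to_limbs`, `remainder_left_to_right`, `exact_div_right_to_left`, `divide`) are hypothetical. The sketch shows a right-to-left exact division in the style of Jebelean, which multiplies by the inverse of the divisor modulo 2^32, and one simple way to lift the exactness requirement: compute the remainder left to right first and subtract it from the dividend.

```python
LIMB_BITS = 32                 # assumed limb width (single precision)
BASE = 1 << LIMB_BITS
MASK = BASE - 1


def to_limbs(x, n):
    """Split a non-negative integer into n little-endian limbs."""
    return [(x >> (LIMB_BITS * i)) & MASK for i in range(n)]


def from_limbs(limbs):
    """Reassemble an integer from little-endian limbs."""
    return sum(limb << (LIMB_BITS * i) for i, limb in enumerate(limbs))


def remainder_left_to_right(limbs, d):
    """Remainder of the multi-limb value modulo d, most significant limb first."""
    r = 0
    for limb in reversed(limbs):
        r = ((r << LIMB_BITS) | limb) % d
    return r


def exact_div_right_to_left(limbs, d):
    """Jebelean-style exact division: d must be odd and divide the value."""
    assert d & 1, "divisor must be odd to be invertible modulo 2**32"
    inv = pow(d, -1, BASE)     # modular inverse (Python 3.8+)
    carry = 0                  # borrow plus high word carried to the next limb
    quotient = []
    for limb in limbs:
        t = limb - carry
        borrow = 1 if t < 0 else 0
        t &= MASK              # wrap around modulo 2**32
        q = (t * inv) & MASK   # quotient limb: q * d == t (mod 2**32)
        quotient.append(q)
        carry = borrow + ((q * d) >> LIMB_BITS)
    assert carry == 0, "dividend was not an exact multiple of d"
    return quotient


def divide(a, d, n):
    """General division of an n-limb value a by a single-limb d (1 <= d < 2**32)."""
    r = remainder_left_to_right(to_limbs(a, n), d)
    shift = (d & -d).bit_length() - 1      # trailing zero bits of d
    exact = (a - r) >> shift               # now divisible by the odd part of d
    q = exact_div_right_to_left(to_limbs(exact, n), d >> shift)
    return from_limbs(q), r
```

For any non-negative `a` that fits in `n` limbs, `divide(a, d, n)` should agree with Python's built-in `divmod(a, d)`. The point of the right-to-left form is that each quotient limb depends on earlier limbs only through the small `carry` value, which is the dependency a parallel algorithm has to resolve across threads.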