Parallel multiple precision division by a single precision divisor
Niall Emmart, C. Weems
2011 18th International Conference on High Performance Computing (HiPC), 2011-12-18
DOI: 10.1109/HiPC.2011.6152712
Citation count: 4
Abstract
We report an algorithm for dividing a multiple-precision integer by a single-precision value on a graphics processing unit (GPU). Our algorithm combines a parallel version of Jebelean's exact division algorithm with a left-to-right algorithm for computing the borrow chain, which relaxes the requirement that the division be exact. We also employ Takahashi's recently reported cyclic reduction technique [10] for GPU division to further improve performance. As a result, our algorithm is asymptotically faster, at O(n/p + log p), than Takahashi's algorithm, at O((n/p) log p). We report results for dividends with precisions of 1024, 2048, and 4096 bits on an NVIDIA GTX 480. For non-constant divisors, our algorithm is 20% slower at 1024 bits (due to startup overhead), 40% faster at 2048 bits, and 2.5 times faster at 4096 bits. For division by constants, using precomputed tables, our algorithm is faster at all sizes, with speedups ranging from 2.3 to 6 times.
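The abstract does not spell out the underlying recurrence. As a rough illustration, the following is a minimal sequential Python sketch of the right-to-left exact-division recurrence that Jebelean-style exact division is built on. It assumes 32-bit limbs stored least significant first and an odd divisor. Exactness is relaxed here in the simplest possible way: a left-to-right pass computes the remainder r, and the borrow chain is seeded with r so that the right-to-left pass divides a - r exactly. This is not the paper's parallel GPU algorithm, its left-to-right borrow-chain computation, or Takahashi's cyclic reduction. The names `W`, `inverse_mod_2w`, `remainder_left_to_right`, `divmod_single`, `to_limbs`, and `from_limbs` are hypothetical.

```python
# Minimal sequential sketch (hypothetical helper names); not the paper's GPU algorithm.
W = 32                      # assumed limb width in bits
MASK = (1 << W) - 1


def inverse_mod_2w(d: int) -> int:
    """Return d^-1 mod 2**W for odd d, using Newton (Hensel) iteration."""
    assert d & 1, "this sketch assumes an odd divisor"
    inv, bits = d, 3        # d*d == 1 (mod 8) for odd d, so d is its own inverse to 3 bits
    while bits < W:
        inv = (inv * (2 - d * inv)) & MASK   # doubles the number of correct low bits
        bits *= 2
    return inv


def remainder_left_to_right(a: list[int], d: int) -> int:
    """a mod d, scanning limbs from most to least significant (Horner's rule)."""
    r = 0
    for limb in reversed(a):
        r = ((r << W) | limb) % d
    return r


def divmod_single(a: list[int], d: int) -> tuple[list[int], int]:
    """Divide little-endian limbs a by an odd single-limb d; return (quotient limbs, remainder)."""
    assert 0 < d <= MASK
    dinv = inverse_mod_2w(d)
    r = remainder_left_to_right(a, d)
    q = []
    borrow = r              # seeding the borrow chain with r makes the pass divide a - r exactly
    for limb in a:          # right to left: least significant limb first
        x = limb - borrow
        under = 1 if x < 0 else 0         # borrow out of this limb's subtraction
        x &= MASK
        qi = (x * dinv) & MASK            # qi*d == x (mod 2**W), as exactness requires
        q.append(qi)
        borrow = ((qi * d) >> W) + under  # high word of qi*d plus the underflow
    assert borrow == 0      # d divides a - r, so no borrow escapes the top limb
    return q, r


def to_limbs(n: int, count: int) -> list[int]:
    return [(n >> (W * i)) & MASK for i in range(count)]


def from_limbs(limbs: list[int]) -> int:
    return sum(limb << (W * i) for i, limb in enumerate(limbs))


# Usage: a 1024-bit dividend (32 limbs) divided by an odd single-limb divisor.
n = 3**600 + 12345
q, r = divmod_single(to_limbs(n, 32), 1_000_003)
assert (from_limbs(q), r) == divmod(n, 1_000_003)
```

The right-to-left loop shows the sequential dependency that any parallel version has to break, because each quotient limb depends on the borrow coming from the limb below it. Even divisors can be reduced to the odd case by shifting, since floor(a / (2^k d')) = floor(floor(a / 2^k) / d').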