Montgomery multiplication using CUDA

Jeffrey A. Robinson, B. Dixon, Jeffrey M. Galloway
Proceedings of the 2014 ACM Southeast Regional Conference, published 2014-03-28. DOI: 10.1145/2638404.2638485 (https://doi.org/10.1145/2638404.2638485)

Abstract

Modular multiplication is useful in many areas of number theory, the best known being cryptography. Encrypting or decrypting a document with either the RSA or ECC encryption algorithms requires long chains of multiplications modulo N. Many algorithms for factoring large numbers likewise perform long multiplication chains modulo N, where N is a prime number. In this paper, we implement a highly optimized systolic Montgomery multiplication algorithm to provide high-performance modular multiplication. We develop our algorithm using CUDA (Compute Unified Device Architecture), NVIDIA's general-purpose parallel programming model for NVIDIA GPUs (Graphics Processing Units). With 256-bit numbers, our implementation performs up to 338.15 million multiplications per second on a GTX 660 and 475.66 million on a GTX 670. With 1024-bit numbers, the GTX 660 performs 20.15 million multiplications per second and the GTX 670 performs 27.89 million. With 2048-bit numbers, the GTX 660 performs 4.96 million multiplications per second and the GTX 670 performs 6.78 million. We also show that our version is faster than previously implemented multiprecision Montgomery multiplication algorithms, while also providing an intuitive data representation.
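For readers unfamiliar with the technique, the following is a minimal Python sketch of textbook Montgomery multiplication applied to a chain of multiplications modulo N. It is not the systolic CUDA kernel described in the abstract: it relies on Python's arbitrary-precision integers instead of a multiword GPU representation, and the function names (`montgomery_setup`, `redc`, `mont_mul`, `mont_pow`) and the choice R = 2^k are assumptions made for illustration only.

```python
def montgomery_setup(n: int, k: int):
    """Precompute constants for an odd modulus n < 2**k, with R = 2**k."""
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R); needs Python 3.8+
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return n_prime, r2


def redc(t: int, n: int, k: int, n_prime: int) -> int:
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> k               # exact shift: low k bits of t + m*n are zero
    return u - n if u >= n else u      # u < 2n, so one subtraction suffices


def mont_mul(a_bar: int, b_bar: int, n: int, k: int, n_prime: int) -> int:
    """Multiply two values that are already in Montgomery form."""
    return redc(a_bar * b_bar, n, k, n_prime)


def mont_pow(base: int, exp: int, n: int, k: int) -> int:
    """Square-and-multiply exponentiation: a long chain of Montgomery products."""
    n_prime, r2 = montgomery_setup(n, k)
    x_bar = mont_mul(base % n, r2, n, k, n_prime)  # base * R mod n
    acc = redc(r2, n, k, n_prime)                  # R mod n, i.e. 1 in Montgomery form
    while exp > 0:
        if exp & 1:
            acc = mont_mul(acc, x_bar, n, k, n_prime)
        x_bar = mont_mul(x_bar, x_bar, n, k, n_prime)
        exp >>= 1
    return redc(acc, n, k, n_prime)                # leave Montgomery form


if __name__ == "__main__":
    n = 2**255 - 19  # an odd 255-bit modulus, chosen only as an example
    print(mont_pow(3, 65537, n, 256) == pow(3, 65537, n))
```

The conversions into and out of Montgomery form happen once per chain, while every intermediate product uses only multiplications, a mask, and a shift in place of a division by N. That is why the method suits workloads dominated by long multiplication chains.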