Jeffrey A. Robinson, B. Dixon, Jeffrey M. Galloway
{"title":"使用CUDA进行蒙哥马利乘法","authors":"Jeffrey A. Robinson, B. Dixon, Jeffrey M. Galloway","doi":"10.1145/2638404.2638485","DOIUrl":null,"url":null,"abstract":"Modular multiplication is useful in many areas of number theory; the most well-known being cryptography. In order to encrypt or decrypt a document using either RSA or ECC encryption algorithms which perform long chains of multiplication modulo N. To factor large numbers many factoring algorithms have long multiplication chains modulo N, where N is a prime number. In our paper, we implement a highly optimized systolic Montgomery multiplication algorithm in order to provide high performance modular multiplications. We develop our algorithm using NVIDIAs general-purpose parallel programming model called CUDA (Compute Unified Device Architecture) for NVIDIA GPUs (Graphics Processing Units). Our implementation can perform up to 338.15 million multiplications per second using a GTX 660 and 475.66 using a GTX 670 with 256 bit numbers. While using 1024-bit numbers the GTX 660 can perform 20.15 million multiplications per second and the GTX 670 can perform 27.89 million multiplications per second. When using 2048-bit numbers, the GTX 660 can perform 4.96 million multiplications per second and the GTX 670 can perform 6.78 million multiplications per second. We also show that our version is faster than previous implemented multiprecision Montgomery multiplication algorithms, while also providing an intuitive data representation.","PeriodicalId":91384,"journal":{"name":"Proceedings of the 2014 ACM Southeast Regional Conference","volume":"280 3 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2014-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Montgomery multiplication using CUDA\",\"authors\":\"Jeffrey A. Robinson, B. Dixon, Jeffrey M. Galloway\",\"doi\":\"10.1145/2638404.2638485\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Modular multiplication is useful in many areas of number theory; the most well-known being cryptography. In order to encrypt or decrypt a document using either RSA or ECC encryption algorithms which perform long chains of multiplication modulo N. To factor large numbers many factoring algorithms have long multiplication chains modulo N, where N is a prime number. In our paper, we implement a highly optimized systolic Montgomery multiplication algorithm in order to provide high performance modular multiplications. We develop our algorithm using NVIDIAs general-purpose parallel programming model called CUDA (Compute Unified Device Architecture) for NVIDIA GPUs (Graphics Processing Units). Our implementation can perform up to 338.15 million multiplications per second using a GTX 660 and 475.66 using a GTX 670 with 256 bit numbers. While using 1024-bit numbers the GTX 660 can perform 20.15 million multiplications per second and the GTX 670 can perform 27.89 million multiplications per second. When using 2048-bit numbers, the GTX 660 can perform 4.96 million multiplications per second and the GTX 670 can perform 6.78 million multiplications per second. We also show that our version is faster than previous implemented multiprecision Montgomery multiplication algorithms, while also providing an intuitive data representation.\",\"PeriodicalId\":91384,\"journal\":{\"name\":\"Proceedings of the 2014 ACM Southeast Regional Conference\",\"volume\":\"280 3 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-03-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2014 ACM Southeast Regional Conference\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2638404.2638485\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2014 ACM Southeast Regional Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2638404.2638485","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Modular multiplication is useful in many areas of number theory; the most well-known being cryptography. In order to encrypt or decrypt a document using either RSA or ECC encryption algorithms which perform long chains of multiplication modulo N. To factor large numbers many factoring algorithms have long multiplication chains modulo N, where N is a prime number. In our paper, we implement a highly optimized systolic Montgomery multiplication algorithm in order to provide high performance modular multiplications. We develop our algorithm using NVIDIAs general-purpose parallel programming model called CUDA (Compute Unified Device Architecture) for NVIDIA GPUs (Graphics Processing Units). Our implementation can perform up to 338.15 million multiplications per second using a GTX 660 and 475.66 using a GTX 670 with 256 bit numbers. While using 1024-bit numbers the GTX 660 can perform 20.15 million multiplications per second and the GTX 670 can perform 27.89 million multiplications per second. When using 2048-bit numbers, the GTX 660 can perform 4.96 million multiplications per second and the GTX 670 can perform 6.78 million multiplications per second. We also show that our version is faster than previous implemented multiprecision Montgomery multiplication algorithms, while also providing an intuitive data representation.