DPF-ECC: Accelerating Elliptic Curve Cryptography with Floating-Point Computing Power of GPUs
Lili Gao, Fangyu Zheng, Niall Emmart, Jiankuo Dong, Jingqiang Lin, C. Weems
2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS), pp. 494-504, May 2020
DOI: 10.1109/IPDPS47924.2020.00058
Citations: 16
Abstract
Driven by the artificial intelligence (AI) and computer vision industries, Graphics Processing Units (GPUs) are rapidly gaining extraordinary computing power. Floating-point throughput in particular, on which graphics rendering and AI workloads rely heavily, is growing especially fast on GPUs. Meanwhile, in fields such as e-commerce and online finance, demand for the cryptographic operations that underpin secure communication and authentication keeps expanding. In this contribution, targeting important cryptographic primitives widely used in TLS 1.3 and other protocols, we implement Curve25519 and Edwards25519 using the floating-point computing power of GPUs, with several performance optimizations customized for the target platform: novel big-number representations combined with a new floating-point-based computing algorithm, efficient merged reduction strategies, and curve-level acceleration. This paper reports record-setting performance for elliptic-curve cryptography: on the TITAN V, we achieve 7.21 million and 77.30 million operations per second for unknown-point (variable-base) and known-point (fixed-base) scalar multiplication on Edwards25519, respectively, and 13.55 million operations per second for point multiplication on Curve25519. To the best of our knowledge, this is the first work to show that floating-point-based ECC implementations can outperform integer-based ones by a large margin. On the Tesla P100, our implementation achieves more than twice the performance of the fastest existing integer-based work on the same platform, and on the TITAN V it sets a throughput record 4.43 times that of the next-best result.
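The abstract does not spell out its big-number representation, so what follows is only a minimal Python sketch, under assumed parameters, of the general idea it builds on: an IEEE 754 double holds every integer below 2^53 exactly, so if limbs are small enough, each limb product and column sum in a schoolbook multiplication is computed with no rounding at all. The 12-limb radix-2^22 layout, the helper names `fp_mul`, `clamp`, and `x25519_ladder`, and the simple 2^255 ≡ 19 folding reduction are illustrative assumptions, not the paper's representation, computing algorithm, or merged reduction strategy. Curve25519 point multiplication is then built on `fp_mul` using the standard RFC 7748 Montgomery ladder.

```python
import random

P = 2**255 - 19      # Curve25519 field prime
RADIX_BITS = 22      # assumed limb width (hypothetical, not the paper's choice)
NLIMBS = 12          # 12 * 22 = 264 bits >= 255
MASK = (1 << RADIX_BITS) - 1
A24 = 121665         # (486662 - 2) / 4, the ladder constant from RFC 7748


def to_limbs(x):
    """Split an integer in [0, P) into NLIMBS 22-bit limbs stored as doubles."""
    return [float((x >> (RADIX_BITS * i)) & MASK) for i in range(NLIMBS)]


def fp_mul(a, b):
    """Return a * b mod P, computing every limb product in double precision."""
    fa, fb = to_limbs(a), to_limbs(b)
    cols = [0.0] * (2 * NLIMBS - 1)
    for i in range(NLIMBS):
        for j in range(NLIMBS):
            # Each product is < 2**44 and a column sums at most 12 of them,
            # so every partial sum is an integer < 2**48 < 2**53: no rounding.
            cols[i + j] += fa[i] * fb[j]
    # Recombine the exact float columns using arbitrary-precision integer carries.
    x = sum(int(c) << (RADIX_BITS * k) for k, c in enumerate(cols))
    # Pseudo-Mersenne folding: 2**255 = 19 (mod P).
    while x >> 255:
        x = (x & ((1 << 255) - 1)) + 19 * (x >> 255)
    return x - P if x >= P else x


def clamp(k):
    """RFC 7748 scalar clamping: clear bits 0-2 and 255, set bit 254."""
    return (k & ((1 << 255) - 8)) | (1 << 254)


def x25519_ladder(k, u):
    """RFC 7748 Montgomery ladder: u-coordinate of k times the point with u-coordinate u."""
    x1, x2, z2, x3, z3 = u, 1, 0, u, 1
    swap = 0
    for t in reversed(range(255)):
        k_t = (k >> t) & 1
        swap ^= k_t
        if swap:  # branchy conditional swap: NOT constant-time
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = k_t
        a = (x2 + z2) % P
        aa = fp_mul(a, a)
        b = (x2 - z2) % P
        bb = fp_mul(b, b)
        e = (aa - bb) % P
        c = (x3 + z3) % P
        d = (x3 - z3) % P
        da = fp_mul(d, a)
        cb = fp_mul(c, b)
        plus, minus = (da + cb) % P, (da - cb) % P
        x3 = fp_mul(plus, plus)
        z3 = fp_mul(x1, fp_mul(minus, minus))
        x2 = fp_mul(aa, bb)
        z2 = fp_mul(e, (aa + fp_mul(A24, e)) % P)
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    # Convert projective (X : Z) to affine u = X / Z via Fermat inversion.
    return fp_mul(x2, pow(z2, P - 2, P))


if __name__ == "__main__":
    rng = random.Random(2020)
    ka, kb = clamp(rng.getrandbits(256)), clamp(rng.getrandbits(256))
    pub_a, pub_b = x25519_ladder(ka, 9), x25519_ladder(kb, 9)  # base point u = 9
    assert x25519_ladder(ka, pub_b) == x25519_ladder(kb, pub_a)
    print("shared secret:", hex(x25519_ladder(ka, pub_b)))
```

The demo checks the Diffie-Hellman property (both parties derive the same shared u-coordinate). Because the conditional swap is a Python branch, the sketch is not constant-time; it illustrates exact floating-point limb arithmetic only and should not be used to protect real keys.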