GMC-crypto：针对 GF(p) 上通用蒙哥马利曲线的 ECC 点乘法的低延迟实现

IF 3.4 3区计算机科学 Q1 COMPUTER SCIENCE, THEORY & METHODS

Journal of Parallel and Distributed Computing Pub Date : 2024-07-03 DOI:10.1016/j.jpdc.2024.104946

Khalid Javeed , Yasir Ali Shah , David Gregg

{"title":"GMC-crypto：针对 GF(p) 上通用蒙哥马利曲线的 ECC 点乘法的低延迟实现","authors":"Khalid Javeed , Yasir Ali Shah , David Gregg","doi":"10.1016/j.jpdc.2024.104946","DOIUrl":null,"url":null,"abstract":"<div><p>Elliptic Curve Cryptography (ECC) is the front-runner among available public key cryptography (PKC) schemes due to its potential to offer higher security per key bit. All ECC-based cryptosystems heavily rely on point multiplication operation where its efficient realization has attained notable focus in the research community. Low latency implementation of the point multiplication operation is frequently required in high-speed applications such as online authentication and web server certification. This paper presents a low latency ECC point multiplication architecture for Montgomery curves over generic prime filed <span><math><mi>G</mi><mi>F</mi><mo>(</mo><mi>p</mi><mo>)</mo></math></span>. The proposed architecture is able to operate for a general prime modulus without any constraints on its structure. It is based on a new novel pipelined modular multiplier developed using the Montgomery multiplication and the Karatsuba-Offman technique with a four-part splitting methodology. The Montgomery ladder approach is adopted on a system level, where a high-speed scheduling strategy to efficiently execute <span><math><mi>G</mi><mi>F</mi><mo>(</mo><mi>p</mi><mo>)</mo></math></span> operations is also presented. Due to these circuit and system-level optimizations, the proposed design delivers low-latency results without a significant increase in resource consumption. The proposed architecture is described in Verilog-HDL for 256-bit key lengths and implemented on Virtex-7 and Virtex-6 FPGA platforms using Xilinx ISE Design Suite. On the Virtex-7 FPGA platform, it performs a 256-bit point multiplication operation in just 110.9 <em>u</em>s with a throughput of almost 9017 operations per second. The implementation results demonstrate that despite its generic nature, it produces low latency as compared to the state-of-the-art. Therefore, it has prominent prospects to be used in high-speed authentication and certification applications.</p></div>","PeriodicalId":54775,"journal":{"name":"Journal of Parallel and Distributed Computing","volume":"193 ","pages":"Article 104946"},"PeriodicalIF":3.4000,"publicationDate":"2024-07-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"GMC-crypto: Low latency implementation of ECC point multiplication for generic Montgomery curves over GF(p)\",\"authors\":\"Khalid Javeed , Yasir Ali Shah , David Gregg\",\"doi\":\"10.1016/j.jpdc.2024.104946\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><p>Elliptic Curve Cryptography (ECC) is the front-runner among available public key cryptography (PKC) schemes due to its potential to offer higher security per key bit. All ECC-based cryptosystems heavily rely on point multiplication operation where its efficient realization has attained notable focus in the research community. Low latency implementation of the point multiplication operation is frequently required in high-speed applications such as online authentication and web server certification. This paper presents a low latency ECC point multiplication architecture for Montgomery curves over generic prime filed <span><math><mi>G</mi><mi>F</mi><mo>(</mo><mi>p</mi><mo>)</mo></math></span>. The proposed architecture is able to operate for a general prime modulus without any constraints on its structure. It is based on a new novel pipelined modular multiplier developed using the Montgomery multiplication and the Karatsuba-Offman technique with a four-part splitting methodology. The Montgomery ladder approach is adopted on a system level, where a high-speed scheduling strategy to efficiently execute <span><math><mi>G</mi><mi>F</mi><mo>(</mo><mi>p</mi><mo>)</mo></math></span> operations is also presented. Due to these circuit and system-level optimizations, the proposed design delivers low-latency results without a significant increase in resource consumption. The proposed architecture is described in Verilog-HDL for 256-bit key lengths and implemented on Virtex-7 and Virtex-6 FPGA platforms using Xilinx ISE Design Suite. On the Virtex-7 FPGA platform, it performs a 256-bit point multiplication operation in just 110.9 <em>u</em>s with a throughput of almost 9017 operations per second. The implementation results demonstrate that despite its generic nature, it produces low latency as compared to the state-of-the-art. Therefore, it has prominent prospects to be used in high-speed authentication and certification applications.</p></div>\",\"PeriodicalId\":54775,\"journal\":{\"name\":\"Journal of Parallel and Distributed Computing\",\"volume\":\"193 \",\"pages\":\"Article 104946\"},\"PeriodicalIF\":3.4000,\"publicationDate\":\"2024-07-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Journal of Parallel and Distributed Computing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0743731524001102\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Parallel and Distributed Computing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0743731524001102","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}

引用次数: 0

摘要

椭圆曲线加密算法（ECC）是现有公开密钥加密算法（PKC）方案中的佼佼者，因为它有可能为每个密钥比特提供更高的安全性。所有基于 ECC 的密码系统都严重依赖于点乘法运算，而高效实现点乘法运算已成为研究界关注的焦点。在在线身份验证和网络服务器认证等高速应用中，经常需要低延迟地实现点乘操作。本文针对通用素数 GF(p) 上的蒙哥马利曲线提出了一种低延迟 ECC 点乘法架构。所提出的架构能够在不受其结构限制的情况下对一般素数进行运算。它基于一种新颖的流水线模块化乘法器，采用蒙哥马利乘法和卡拉祖巴-奥夫曼技术以及四部分分割方法。系统级采用了蒙哥马利梯形图方法，并提出了高效执行 GF(p) 运算的高速调度策略。由于进行了这些电路和系统级优化，所提出的设计可在不显著增加资源消耗的情况下实现低延迟结果。该架构采用 Verilog-HDL 对 256 位密钥长度进行了描述，并使用 Xilinx ISE Design Suite 在 Virtex-7 和 Virtex-6 FPGA 平台上实现。在 Virtex-7 FPGA 平台上，执行 256 位点乘法运算仅需 110.9 秒，吞吐量接近每秒 9017 次运算。实现结果表明，尽管它具有通用性，但与最先进的技术相比，它产生的延迟很低。因此，它在高速身份验证和认证应用中有着广阔的应用前景。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

GMC-crypto: Low latency implementation of ECC point multiplication for generic Montgomery curves over GF(p)

Elliptic Curve Cryptography (ECC) is the front-runner among available public key cryptography (PKC) schemes due to its potential to offer higher security per key bit. All ECC-based cryptosystems heavily rely on point multiplication operation where its efficient realization has attained notable focus in the research community. Low latency implementation of the point multiplication operation is frequently required in high-speed applications such as online authentication and web server certification. This paper presents a low latency ECC point multiplication architecture for Montgomery curves over generic prime filed $G F (p)$ . The proposed architecture is able to operate for a general prime modulus without any constraints on its structure. It is based on a new novel pipelined modular multiplier developed using the Montgomery multiplication and the Karatsuba-Offman technique with a four-part splitting methodology. The Montgomery ladder approach is adopted on a system level, where a high-speed scheduling strategy to efficiently execute $G F (p)$ operations is also presented. Due to these circuit and system-level optimizations, the proposed design delivers low-latency results without a significant increase in resource consumption. The proposed architecture is described in Verilog-HDL for 256-bit key lengths and implemented on Virtex-7 and Virtex-6 FPGA platforms using Xilinx ISE Design Suite. On the Virtex-7 FPGA platform, it performs a 256-bit point multiplication operation in just 110.9 us with a throughput of almost 9017 operations per second. The implementation results demonstrate that despite its generic nature, it produces low latency as compared to the state-of-the-art. Therefore, it has prominent prospects to be used in high-speed authentication and certification applications.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Journal of Parallel and Distributed Computing 工程技术-计算机：理论方法

CiteScore

10.30

自引率

2.60%

发文量

172

审稿时长

12 months

期刊介绍： This international journal is directed to researchers, engineers, educators, managers, programmers, and users of computers who have particular interests in parallel processing and/or distributed computing. The Journal of Parallel and Distributed Computing publishes original research papers and timely review articles on the theory, design, evaluation, and use of parallel and/or distributed computing systems. The journal also features special issues on these topics; again covering the full range from the design to the use of our targeted systems.