Jiaxuan Wang, Xiaofeng Wang, Wenzheng Liu, Qianqian Xing, Xiaoyong Tang, Tan Deng, Ronghui Cao, Mingfeng Huang
Computer Networks, Volume 265, Article 111282 (Journal Article, published 2025-04-20). DOI: 10.1016/j.comnet.2025.111282. https://www.sciencedirect.com/science/article/pii/S1389128625002506
A parallel and pipelined high speed Montgomery modular multiplier for IoT devices
Lightweight authentication of Internet of Things (IoT) devices requires efficient implementations of the cryptographic Montgomery modular multiplier. However, current Montgomery modular multipliers offer limited optimization of data dependencies, which leaves the key arithmetic units idle for more clock cycles. In this paper, we propose a parallel and pipelined Montgomery multiplier (PPMM). The design reduces the overall number of clock cycles by leveraging the core unit and an algorithm scheduling structure with fewer data dependencies. Moreover, we design a multiplier-adder unit with an appropriate bit width and pipeline to raise the overall clock frequency, and we derive formulas relating the number of clock cycles to the radix. Finally, we present an optimized implementation on a Xilinx Virtex-7 FPGA over general GF(p) for field sizes of 256, 384, and 512 bits. Experiments show that the PPMM takes only 0.123 μs, 0.150 μs, and 0.190 μs to perform high-radix modular multiplication of 256, 384, and 512 bits, respectively. Compared with other FPGA implementations, our design offers superior speed and throughput, and it reaches higher throughput at larger operand sizes, making it well suited to security-intensive applications. Specifically, the 384-bit PPMM, with a reasonable area-time product (ATP), is nearly 2.6 times faster than the reference design.
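As a point of reference for the high-radix modular multiplication the abstract refers to, the following is a minimal Python sketch of the textbook digit-serial radix-2^k Montgomery algorithm. It is an illustration under stated assumptions, not the PPMM architecture: the function names `digit_count` and `montgomery_mul`, the digit size k = 64, and the use of the NIST P-256 prime as a test modulus are all hypothetical choices, and the paper's own formulas relating clock cycles to radix are not reproduced. What it does show is why the radix matters: an n-bit modulus needs ceil(n/k) iterations, each consisting of one digit-by-operand multiply-accumulate and one reduction by a quotient digit.

```python
"""Generic radix-2^k Montgomery multiplication (digit-serial sketch).

Assumption: this is the textbook high-radix Montgomery algorithm,
not the PPMM datapath or schedule described in the paper.
"""


def digit_count(n_bits: int, k: int) -> int:
    """Number of radix-2^k digits (= loop iterations) for an n-bit modulus."""
    return -(-n_bits // k)  # ceil(n_bits / k)


def montgomery_mul(a: int, b: int, m: int, k: int) -> int:
    """Return a * b * R^-1 mod m, with R = 2^(k * digit_count(bits(m), k))."""
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    if not (0 <= a < m and 0 <= b < m):
        raise ValueError("operands must be reduced modulo m")
    r = 1 << k
    mask = r - 1
    m_prime = (-pow(m, -1, r)) % r  # -m^-1 mod 2^k, precomputed once per modulus
    t = 0
    for i in range(digit_count(m.bit_length(), k)):
        a_i = (a >> (k * i)) & mask        # i-th k-bit digit of a
        t += a_i * b                        # multiply-accumulate step
        q = ((t & mask) * m_prime) & mask   # quotient digit: makes t + q*m divisible by 2^k
        t = (t + q * m) >> k                # reduction step: exact division by 2^k
    # The invariant t < 2m holds after every iteration,
    # so a single conditional subtraction finishes the reduction.
    return t - m if t >= m else t


if __name__ == "__main__":
    # NIST P-256 prime, used here only as a convenient odd 256-bit modulus.
    p = (1 << 256) - (1 << 224) + (1 << 192) + (1 << 96) - 1
    k = 64
    R = 1 << (k * digit_count(p.bit_length(), k))
    x, y = 12345, 67890
    x_m, y_m = x * R % p, y * R % p        # map operands into the Montgomery domain
    z_m = montgomery_mul(x_m, y_m, p, k)   # product stays in the Montgomery domain
    z = montgomery_mul(z_m, 1, p, k)       # multiply by 1 to map back out
    print(z == x * y % p)                  # prints True
```

With k = 64, a 256-bit modulus needs 4 iterations and a 384-bit modulus needs 6. In a hardware datapath, the two k-bit-by-n-bit products per iteration (a_i·b and q·m) are the kind of work a multiplier-adder unit performs; how the PPMM pipelines and overlaps them to cut idle cycles is not shown by this sketch.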
About the journal:
Computer Networks is an international, archival journal providing a publication vehicle for complete coverage of all topics of interest to those involved in the computer communications networking area. The audience includes researchers, managers and operators of networks as well as designers and implementors. The Editorial Board will consider any material for publication that is of interest to those groups.