fpga的折叠整数乘法

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays Pub Date : 2021-02-17 DOI:10.1145/3431920.3439299

M. Langhammer, B. Pasca

{"title":"fpga的折叠整数乘法","authors":"M. Langhammer, B. Pasca","doi":"10.1145/3431920.3439299","DOIUrl":null,"url":null,"abstract":"Encryption - especially the key exchange algorithms such as RSA - is an increasing use-model for FPGAs, driven by the adoption of the FPGA as a SmartNIC in the datacenter. While bulk encryption such as AES maps well to generic FPGA features, the very large multipliers required for RSA are a much more difficult problem. Although FPGAs contain thousands of small integer multipliers in DSP Blocks, aggregating them into very large multipliers is very challenging because of the large amount of soft logic required - especially in the form of long adders, and the high embedded multiplier count. In this paper, we describe a large multiplier architecture that operates in a multi-cycle format and which has a linear area/throughput ratio. We show results for a 2048-bit multiplier that has a latency of 118 cycles, inputs data every 9th cycle and closes timing at 377MHz in an Intel Arria 10 FPGA, and over 400MHz in a Stratix 10. The proposed multiplier uses 1/9 of the DSP resources typically used in a 2048-bit Karatsuba implementation, showing a perfectly linear throughput to DSP-count ratio. Our proposed solution outperforms recently reported results, in either arithmetic complexity - by making use of the Karatsuba techniques, or in scheduling efficiency - embedded DSP resources are fully utilized.","PeriodicalId":386071,"journal":{"name":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","volume":"9 9","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Folded Integer Multiplication for FPGAs\",\"authors\":\"M. Langhammer, B. Pasca\",\"doi\":\"10.1145/3431920.3439299\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Encryption - especially the key exchange algorithms such as RSA - is an increasing use-model for FPGAs, driven by the adoption of the FPGA as a SmartNIC in the datacenter. While bulk encryption such as AES maps well to generic FPGA features, the very large multipliers required for RSA are a much more difficult problem. Although FPGAs contain thousands of small integer multipliers in DSP Blocks, aggregating them into very large multipliers is very challenging because of the large amount of soft logic required - especially in the form of long adders, and the high embedded multiplier count. In this paper, we describe a large multiplier architecture that operates in a multi-cycle format and which has a linear area/throughput ratio. We show results for a 2048-bit multiplier that has a latency of 118 cycles, inputs data every 9th cycle and closes timing at 377MHz in an Intel Arria 10 FPGA, and over 400MHz in a Stratix 10. The proposed multiplier uses 1/9 of the DSP resources typically used in a 2048-bit Karatsuba implementation, showing a perfectly linear throughput to DSP-count ratio. Our proposed solution outperforms recently reported results, in either arithmetic complexity - by making use of the Karatsuba techniques, or in scheduling efficiency - embedded DSP resources are fully utilized.\",\"PeriodicalId\":386071,\"journal\":{\"name\":\"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"volume\":\"9 9\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-02-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3431920.3439299\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3431920.3439299","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 7

摘要

加密——尤其是像RSA这样的密钥交换算法——是FPGA越来越多的使用模型，这是由于在数据中心采用FPGA作为SmartNIC的驱动。虽然像AES这样的批量加密很好地映射到通用的FPGA特性，但RSA所需的非常大的乘法器是一个更困难的问题。尽管fpga在DSP块中包含数千个小整数乘法器，但由于需要大量的软逻辑，特别是长加法器和高嵌入式乘法器计数，因此将它们聚合成非常大的乘法器非常具有挑战性。在本文中，我们描述了一种大型乘法器架构，该架构以多周期格式运行，具有线性面积/吞吐量比。我们展示了2048位乘法器的结果，该乘法器具有118个周期的延迟，每9个周期输入数据，并在Intel Arria 10 FPGA中以377MHz关闭时序，在Stratix 10中超过400MHz。所提出的乘法器使用2048位Karatsuba实现中通常使用的DSP资源的1/9，显示出完美的线性吞吐量与DSP计数比。我们提出的解决方案优于最近报道的结果，无论是在算术复杂度上-通过使用Karatsuba技术，还是在调度效率上-嵌入式DSP资源得到充分利用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Folded Integer Multiplication for FPGAs

Encryption - especially the key exchange algorithms such as RSA - is an increasing use-model for FPGAs, driven by the adoption of the FPGA as a SmartNIC in the datacenter. While bulk encryption such as AES maps well to generic FPGA features, the very large multipliers required for RSA are a much more difficult problem. Although FPGAs contain thousands of small integer multipliers in DSP Blocks, aggregating them into very large multipliers is very challenging because of the large amount of soft logic required - especially in the form of long adders, and the high embedded multiplier count. In this paper, we describe a large multiplier architecture that operates in a multi-cycle format and which has a linear area/throughput ratio. We show results for a 2048-bit multiplier that has a latency of 118 cycles, inputs data every 9th cycle and closes timing at 377MHz in an Intel Arria 10 FPGA, and over 400MHz in a Stratix 10. The proposed multiplier uses 1/9 of the DSP resources typically used in a 2048-bit Karatsuba implementation, showing a perfectly linear throughput to DSP-count ratio. Our proposed solution outperforms recently reported results, in either arithmetic complexity - by making use of the Karatsuba techniques, or in scheduling efficiency - embedded DSP resources are fully utilized.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

The 2021 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

自引率

0.00%

发文量