fpga上的高效运算:一种多项式向量处理器

2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC) Pub Date : 2020-01-01 DOI:10.1109/ASP-DAC47756.2020.9045459

Zhaohui Chen, Yuan Ma, Tianyu Chen, Jingqiang Lin, Jiwu Jing

{"title":"fpga上的高效运算:一种多项式向量处理器","authors":"Zhaohui Chen, Yuan Ma, Tianyu Chen, Jingqiang Lin, Jiwu Jing","doi":"10.1109/ASP-DAC47756.2020.9045459","DOIUrl":null,"url":null,"abstract":"Kyber is a promising candidate in post-quantum cryptography standardization process. In this paper, we propose a targeted optimization strategy and implement a processor for Kyber on FPGAs. By merging the operations, we cut off 29.4% clock cycles for Kyber512 and 33.3% for Kyber1024 compared with the textbook implementations. We utilize Gentlemen-Sande (GS) butterfly to optimize the Number-Theoretic Transform (NTT) implementation. The bottleneck of memory access is broken taking advantage of a dual-column sequential scheme. We further propose a pipeline architecture for better performance. The optimizations help the processor achieve 31684 NTT operations per second using only 477 LUTs, 237 FFs and 1 DSP. Our strategy is at least 3x more efficient than the state-of-the-art module for NTT with a similar security level.","PeriodicalId":125112,"journal":{"name":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":"{\"title\":\"Towards Efficient Kyber on FPGAs: A Processor for Vector of Polynomials\",\"authors\":\"Zhaohui Chen, Yuan Ma, Tianyu Chen, Jingqiang Lin, Jiwu Jing\",\"doi\":\"10.1109/ASP-DAC47756.2020.9045459\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Kyber is a promising candidate in post-quantum cryptography standardization process. In this paper, we propose a targeted optimization strategy and implement a processor for Kyber on FPGAs. By merging the operations, we cut off 29.4% clock cycles for Kyber512 and 33.3% for Kyber1024 compared with the textbook implementations. We utilize Gentlemen-Sande (GS) butterfly to optimize the Number-Theoretic Transform (NTT) implementation. The bottleneck of memory access is broken taking advantage of a dual-column sequential scheme. We further propose a pipeline architecture for better performance. The optimizations help the processor achieve 31684 NTT operations per second using only 477 LUTs, 237 FFs and 1 DSP. Our strategy is at least 3x more efficient than the state-of-the-art module for NTT with a similar security level.\",\"PeriodicalId\":125112,\"journal\":{\"name\":\"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-01-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"19\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ASP-DAC47756.2020.9045459\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ASP-DAC47756.2020.9045459","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 19

摘要

Kyber是后量子密码学标准化过程中一个很有前途的候选者。本文提出了一种有针对性的优化策略，并在fpga上实现了Kyber处理器。通过合并操作，与教科书实现相比，Kyber512和Kyber1024的时钟周期分别减少了29.4%和33.3%。我们利用绅士-桑德(GS)蝴蝶来优化数论变换(NTT)的实现。利用双列顺序方案打破了内存访问的瓶颈。我们进一步提出了一个管道架构，以获得更好的性能。优化帮助处理器实现每秒31684次NTT操作，仅使用477个lut, 237个ff和1个DSP。我们的策略至少比NTT最先进的模块高效3倍，具有相似的安全级别。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Towards Efficient Kyber on FPGAs: A Processor for Vector of Polynomials

Kyber is a promising candidate in post-quantum cryptography standardization process. In this paper, we propose a targeted optimization strategy and implement a processor for Kyber on FPGAs. By merging the operations, we cut off 29.4% clock cycles for Kyber512 and 33.3% for Kyber1024 compared with the textbook implementations. We utilize Gentlemen-Sande (GS) butterfly to optimize the Number-Theoretic Transform (NTT) implementation. The bottleneck of memory access is broken taking advantage of a dual-column sequential scheme. We further propose a pipeline architecture for better performance. The optimizations help the processor achieve 31684 NTT operations per second using only 477 LUTs, 237 FFs and 1 DSP. Our strategy is at least 3x more efficient than the state-of-the-art module for NTT with a similar security level.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC)

自引率

0.00%

发文量