Kyber在移动设备上的高效实现

2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS) Pub Date : 2021-12-01 DOI:10.1109/ICPADS53394.2021.00069

Lirui Zhao, Jipeng Zhang, Junhao Huang, Zhe Liu, G. Hancke

{"title":"Kyber在移动设备上的高效实现","authors":"Lirui Zhao, Jipeng Zhang, Junhao Huang, Zhe Liu, G. Hancke","doi":"10.1109/ICPADS53394.2021.00069","DOIUrl":null,"url":null,"abstract":"Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the MLWE problem, has been shortlisted for the third round evaluation of the NIST Post-Quantum Cryptography Standardization. In this paper, we explored the optimizations of Kyber in high-performance processors from the ARM Cortex-A series, which are widely used in mainstream mobile phones. To improve the performance of Kyber, we utilized the powerful SIMD instruction set NEON in an ARMv8-A to parallelize the core modules of Kyber, i.e., modular reduction and NTT. Specifically, we specially designed the optimized implementation based on the characteristic of the NEON instruction set for the Barrett and Montgomery reduction algorithms. To make full use of the computing power of NEON instructions, we proposed a novel strategy for computing the 16-bit Barrett reduction without handling the 32-bit intermediate result. Our Barrett and Montgomery reduction showed 8.52 and 8.89 times faster than the reference implementation. As for NTT/INTT, we adopted the 2+5 layer merging strategy on an ARMv8-A to implement NTT/INTT after carefully analyzing the register occupancy of various layer merging techniques. Thanks to the selected layer merging strategy, our NTT and INTT achieved 11.89 and 13.45 times speedups compared with the reference implementation. Our optimized software achieved 1.77×, 1.85×, and 2.16× speedups for key generation, encapsulation, and decapsulation compared with Kyber's reference implementation.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Efficient Implementation of Kyber on Mobile Devices\",\"authors\":\"Lirui Zhao, Jipeng Zhang, Junhao Huang, Zhe Liu, G. Hancke\",\"doi\":\"10.1109/ICPADS53394.2021.00069\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the MLWE problem, has been shortlisted for the third round evaluation of the NIST Post-Quantum Cryptography Standardization. In this paper, we explored the optimizations of Kyber in high-performance processors from the ARM Cortex-A series, which are widely used in mainstream mobile phones. To improve the performance of Kyber, we utilized the powerful SIMD instruction set NEON in an ARMv8-A to parallelize the core modules of Kyber, i.e., modular reduction and NTT. Specifically, we specially designed the optimized implementation based on the characteristic of the NEON instruction set for the Barrett and Montgomery reduction algorithms. To make full use of the computing power of NEON instructions, we proposed a novel strategy for computing the 16-bit Barrett reduction without handling the 32-bit intermediate result. Our Barrett and Montgomery reduction showed 8.52 and 8.89 times faster than the reference implementation. As for NTT/INTT, we adopted the 2+5 layer merging strategy on an ARMv8-A to implement NTT/INTT after carefully analyzing the register occupancy of various layer merging techniques. Thanks to the selected layer merging strategy, our NTT and INTT achieved 11.89 and 13.45 times speedups compared with the reference implementation. Our optimized software achieved 1.77×, 1.85×, and 2.16× speedups for key generation, encapsulation, and decapsulation compared with Kyber's reference implementation.\",\"PeriodicalId\":309508,\"journal\":{\"name\":\"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPADS53394.2021.00069\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS53394.2021.00069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 2

摘要

Kyber是一种基于MLWE问题的ind - cca安全密钥封装机制(KEM)，已入围NIST后量子加密标准化第三轮评估。在本文中，我们探索了Kyber在主流手机中广泛使用的ARM Cortex-A系列高性能处理器中的优化。为了提高Kyber的性能，我们利用ARMv8-A中强大的SIMD指令集NEON来并行化Kyber的核心模块，即模块化约简和NTT。具体来说，我们根据NEON指令集的特点为Barrett和Montgomery约简算法专门设计了优化实现。为了充分利用NEON指令的计算能力，我们提出了一种计算16位Barrett约简而不处理32位中间结果的新策略。我们的Barrett和Montgomery还原比参考实现分别快8.52和8.89倍。对于NTT/INTT，在仔细分析了各种层合并技术的寄存器占用情况后，我们在ARMv8-A上采用了2+5层合并策略来实现NTT/INTT。由于选择了层合并策略，我们的NTT和INTT的速度比参考实现分别提高了11.89倍和13.45倍。与Kyber的参考实现相比，我们优化的软件在密钥生成、封装和解封装方面的速度分别提高了1.77倍、1.85倍和2.16倍。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Efficient Implementation of Kyber on Mobile Devices

Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the MLWE problem, has been shortlisted for the third round evaluation of the NIST Post-Quantum Cryptography Standardization. In this paper, we explored the optimizations of Kyber in high-performance processors from the ARM Cortex-A series, which are widely used in mainstream mobile phones. To improve the performance of Kyber, we utilized the powerful SIMD instruction set NEON in an ARMv8-A to parallelize the core modules of Kyber, i.e., modular reduction and NTT. Specifically, we specially designed the optimized implementation based on the characteristic of the NEON instruction set for the Barrett and Montgomery reduction algorithms. To make full use of the computing power of NEON instructions, we proposed a novel strategy for computing the 16-bit Barrett reduction without handling the 32-bit intermediate result. Our Barrett and Montgomery reduction showed 8.52 and 8.89 times faster than the reference implementation. As for NTT/INTT, we adopted the 2+5 layer merging strategy on an ARMv8-A to implement NTT/INTT after carefully analyzing the register occupancy of various layer merging techniques. Thanks to the selected layer merging strategy, our NTT and INTT achieved 11.89 and 13.45 times speedups compared with the reference implementation. Our optimized software achieved 1.77×, 1.85×, and 2.16× speedups for key generation, encapsulation, and decapsulation compared with Kyber's reference implementation.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)

自引率

0.00%

发文量