Lirui Zhao, Jipeng Zhang, Junhao Huang, Zhe Liu, G. Hancke
{"title":"Kyber在移动设备上的高效实现","authors":"Lirui Zhao, Jipeng Zhang, Junhao Huang, Zhe Liu, G. Hancke","doi":"10.1109/ICPADS53394.2021.00069","DOIUrl":null,"url":null,"abstract":"Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the MLWE problem, has been shortlisted for the third round evaluation of the NIST Post-Quantum Cryptography Standardization. In this paper, we explored the optimizations of Kyber in high-performance processors from the ARM Cortex-A series, which are widely used in mainstream mobile phones. To improve the performance of Kyber, we utilized the powerful SIMD instruction set NEON in an ARMv8-A to parallelize the core modules of Kyber, i.e., modular reduction and NTT. Specifically, we specially designed the optimized implementation based on the characteristic of the NEON instruction set for the Barrett and Montgomery reduction algorithms. To make full use of the computing power of NEON instructions, we proposed a novel strategy for computing the 16-bit Barrett reduction without handling the 32-bit intermediate result. Our Barrett and Montgomery reduction showed 8.52 and 8.89 times faster than the reference implementation. As for NTT/INTT, we adopted the 2+5 layer merging strategy on an ARMv8-A to implement NTT/INTT after carefully analyzing the register occupancy of various layer merging techniques. Thanks to the selected layer merging strategy, our NTT and INTT achieved 11.89 and 13.45 times speedups compared with the reference implementation. Our optimized software achieved 1.77×, 1.85×, and 2.16× speedups for key generation, encapsulation, and decapsulation compared with Kyber's reference implementation.","PeriodicalId":309508,"journal":{"name":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","volume":"50 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":"{\"title\":\"Efficient Implementation of Kyber on Mobile Devices\",\"authors\":\"Lirui Zhao, Jipeng Zhang, Junhao Huang, Zhe Liu, G. Hancke\",\"doi\":\"10.1109/ICPADS53394.2021.00069\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the MLWE problem, has been shortlisted for the third round evaluation of the NIST Post-Quantum Cryptography Standardization. In this paper, we explored the optimizations of Kyber in high-performance processors from the ARM Cortex-A series, which are widely used in mainstream mobile phones. To improve the performance of Kyber, we utilized the powerful SIMD instruction set NEON in an ARMv8-A to parallelize the core modules of Kyber, i.e., modular reduction and NTT. Specifically, we specially designed the optimized implementation based on the characteristic of the NEON instruction set for the Barrett and Montgomery reduction algorithms. To make full use of the computing power of NEON instructions, we proposed a novel strategy for computing the 16-bit Barrett reduction without handling the 32-bit intermediate result. Our Barrett and Montgomery reduction showed 8.52 and 8.89 times faster than the reference implementation. As for NTT/INTT, we adopted the 2+5 layer merging strategy on an ARMv8-A to implement NTT/INTT after carefully analyzing the register occupancy of various layer merging techniques. Thanks to the selected layer merging strategy, our NTT and INTT achieved 11.89 and 13.45 times speedups compared with the reference implementation. Our optimized software achieved 1.77×, 1.85×, and 2.16× speedups for key generation, encapsulation, and decapsulation compared with Kyber's reference implementation.\",\"PeriodicalId\":309508,\"journal\":{\"name\":\"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)\",\"volume\":\"50 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-12-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"2\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICPADS53394.2021.00069\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE 27th International Conference on Parallel and Distributed Systems (ICPADS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICPADS53394.2021.00069","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Efficient Implementation of Kyber on Mobile Devices
Kyber, an IND-CCA-secure key encapsulation mechanism (KEM) based on the MLWE problem, has been shortlisted for the third round evaluation of the NIST Post-Quantum Cryptography Standardization. In this paper, we explored the optimizations of Kyber in high-performance processors from the ARM Cortex-A series, which are widely used in mainstream mobile phones. To improve the performance of Kyber, we utilized the powerful SIMD instruction set NEON in an ARMv8-A to parallelize the core modules of Kyber, i.e., modular reduction and NTT. Specifically, we specially designed the optimized implementation based on the characteristic of the NEON instruction set for the Barrett and Montgomery reduction algorithms. To make full use of the computing power of NEON instructions, we proposed a novel strategy for computing the 16-bit Barrett reduction without handling the 32-bit intermediate result. Our Barrett and Montgomery reduction showed 8.52 and 8.89 times faster than the reference implementation. As for NTT/INTT, we adopted the 2+5 layer merging strategy on an ARMv8-A to implement NTT/INTT after carefully analyzing the register occupancy of various layer merging techniques. Thanks to the selected layer merging strategy, our NTT and INTT achieved 11.89 and 13.45 times speedups compared with the reference implementation. Our optimized software achieved 1.77×, 1.85×, and 2.16× speedups for key generation, encapsulation, and decapsulation compared with Kyber's reference implementation.