Optimizing Dilithium Implementation with AVX2/-512
Runqing Xu, Debiao He, Min Luo, Cong Peng, Xiangyong Zeng
ACM Transactions on Embedded Computing Systems, published 2024-08-10. DOI: 10.1145/3687309
Abstract
Dilithium is a signature scheme that is currently being standardized by NIST as the Module-Lattice-Based Digital Signature Standard. Because its security rests on lattice problems, it is believed to be secure even against attacks by large-scale quantum computers. Implementation efficiency is important for promoting the migration from current cryptographic algorithms to post-quantum ones. In this paper, we propose several new approaches to optimize the implementation of Dilithium. First, we improve the efficiency of parallel NTT implementations: our implementations reduce the overhead of shuffling operations and invoke fewer load instructions for the precomputed values. Then, we optimize the sampling and bit-packing of polynomial coefficients in Dilithium. With a new approach to sampling secret-key polynomials, we can handle twice as many coefficients within one register. The proposed approaches apply to implementations using both the AVX2 and AVX-512 instruction sets. Taking Dilithium2 as an example, our AVX2 implementation improves KeyGen, Sign, and Verify by 22.7%, 16.9%, and 13.5%, respectively, compared to the previous implementation.
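The abstract's first optimization concerns the NTT. For readers unfamiliar with that transform, here is a minimal scalar sketch in Python of the negacyclic NTT that Dilithium uses. It is not the authors' AVX2/AVX-512 method. The parameters come from the Dilithium specification: q = 8380417, n = 256, and ζ = 1753 as a primitive 512-th root of unity. The loop structure follows the public round-3 reference code, but the sketch uses plain `% Q` arithmetic instead of Montgomery reduction. The names `ntt`, `invntt`, `poly_mul_ntt`, and `poly_mul_schoolbook` are illustrative. The comment inside `ntt` points out where butterfly partners fall inside a single 256-bit register. That comment assumes contiguous storage in 32-bit lanes, and those in-register layers are where vectorized code typically needs shuffles.

```python
# Scalar sketch of Dilithium's negacyclic NTT (plain modular arithmetic,
# no Montgomery form); loop structure mirrors the round-3 reference code.
Q = 8380417   # Dilithium modulus, q = 2^23 - 2^13 + 1
N = 256       # ring Z_q[X]/(X^256 + 1)
ROOT = 1753   # primitive 512-th root of unity mod q

def brv8(k: int) -> int:
    """Reverse the 8 low bits of k."""
    return int(f"{k:08b}"[::-1], 2)

# Twiddle table: ZETAS[k] = ROOT^brv8(k) mod q; index 0 is never used.
ZETAS = [pow(ROOT, brv8(k), Q) for k in range(N)]

def ntt(a):
    """Forward NTT with Cooley-Tukey butterflies; output in bit-reversed order."""
    a = list(a)
    k = 0
    length = 128
    while length > 0:
        # Butterfly partners are `length` apart. Assuming contiguous 32-bit
        # lanes, an AVX2 register holds 8 coefficients, so the layers with
        # length 4, 2, 1 pair coefficients within one register.
        for start in range(0, N, 2 * length):
            k += 1
            zeta = ZETAS[k]
            for j in range(start, start + length):
                t = zeta * a[j + length] % Q
                a[j + length] = (a[j] - t) % Q
                a[j] = (a[j] + t) % Q
        length >>= 1
    return a

def invntt(a):
    """Inverse NTT with Gentleman-Sande butterflies, then scaling by 1/256."""
    a = list(a)
    k = N
    length = 1
    while length < N:
        for start in range(0, N, 2 * length):
            k -= 1
            # -ZETAS[k] equals the inverse of the matching forward twiddle.
            zeta = -ZETAS[k] % Q
            for j in range(start, start + length):
                t = a[j]
                a[j] = (t + a[j + length]) % Q
                a[j + length] = (t - a[j + length]) * zeta % Q
        length <<= 1
    n_inv = pow(N, -1, Q)  # modular inverse of 256 (Python 3.8+)
    return [x * n_inv % Q for x in a]

def poly_mul_ntt(a, b):
    """Multiply in Z_q[X]/(X^256 + 1): forward NTTs, pointwise product, inverse NTT."""
    ah, bh = ntt(a), ntt(b)
    return invntt([x * y % Q for x, y in zip(ah, bh)])

def poly_mul_schoolbook(a, b):
    """Reference negacyclic product using X^256 = -1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] += a[i] * b[j]
            else:
                c[i + j - N] -= a[i] * b[j]
    return [x % Q for x in c]
```

For any two length-256 coefficient lists with entries in [0, q), `poly_mul_ntt(a, b)` should equal `poly_mul_schoolbook(a, b)`. Optimized implementations replace each `% Q` with Montgomery or Barrett reduction and load the twiddle factors from a precomputed table. How those precomputed values are loaded is one of the costs the abstract reports reducing. Under the same contiguous-lane assumption, a 512-bit AVX-512 register holds 16 coefficients, so the in-register layers would be those with length 8, 4, 2, and 1.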
About the journal:
The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.