Area-Time Efficient Hardware Implementation for Binary Ring-LWE Based Post-Quantum Cryptography
Authors: Shao-I Chu; Syuan-An Ke
Journal: IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 3, pp. 724-738 (impact factor 5.4, JCR Q1, Computer Science, Information Systems)
Publication date: 2024-10-23
DOI: 10.1109/TETC.2024.3482324
URL: https://ieeexplore.ieee.org/document/10733832/
Citations: 0
Abstract
Post-quantum cryptography (PQC) has recently attracted intense attention because existing public-key cryptosystems are vulnerable to quantum attacks. Ring-learning-with-errors (RLWE)-based PQC is a promising class of lattice-based schemes. A lightweight variant, binary RLWE (BRLWE), was developed for Internet-of-Things (IoT) and edge-computing applications. However, deploying the number theoretic transform (NTT) is not beneficial under the parameter settings of the BRLWE-based scheme. This article presents three high-speed decryption architectures for the BRLWE-based scheme with low area-time complexity. The first is a modified and corrected version of the low-latency design from previous work. The second and third use a multiplexer-based design for multiplication and exploit the property of the skew-circulant matrix to reduce computational latency. The third additionally applies the Karatsuba algorithm to reduce the number of multiplications. However, the results show that Karatsuba does not benefit the design, because each multiplication involves an integer and a binary number rather than two integers. Let the lengths of the secret and public keys be $n$ and $n\log_{2} q$ bits, respectively. The synthesis results show that the second and third architectures outperform the lookup table (LUT)-based and linear-feedback shift register (LFSR)-based designs of previous works in area-time complexity. The FPGA implementation results indicate that the second design outperforms the Karatsuba and Toeplitz matrix-vector product (TMVP)-initiated accelerators in the literature, reducing area-time complexity by 62.4% and 51.7%, respectively, for $(n, q) = (256, 256)$. For $(n, q) = (512, 256)$, the improvements are 44.3% and 28.3%. The third architecture also outperforms these high-speed designs. The proposed implementations are efficient in area-time complexity and are suitable for high-performance applications.
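As a rough software illustration of the multiplication the abstract refers to, the sketch below assumes the usual RLWE ring $\mathbb{Z}_q[x]/(x^n+1)$, where multiplying by a polynomial is the same as multiplying by a skew-circulant matrix. It is not the authors' hardware architecture. The function names (`negacyclic_mul_binary`, `skew_circulant`, `matvec`) and the toy parameters are hypothetical, chosen only for this example. The point it shows is that when one operand is binary, each partial product reduces to a select-and-accumulate step, with no integer-by-integer multiplier needed.

```python
def negacyclic_mul_binary(a, r, q):
    """Multiply integer polynomial a by binary polynomial r in Z_q[x]/(x^n + 1).

    Each bit of r acts like a multiplexer select: a shifted copy of a is
    either accumulated (bit = 1) or skipped (bit = 0).
    """
    n = len(a)
    out = [0] * n
    for j, bit in enumerate(r):
        if bit == 0:
            continue  # select zero: nothing to accumulate
        for i in range(n):
            k = i + j
            if k < n:
                out[k] = (out[k] + a[i]) % q
            else:
                # Wrap-around term changes sign because x^n = -1 in this ring.
                out[k - n] = (out[k - n] - a[i]) % q
    return out


def skew_circulant(a, q):
    """Build the n x n skew-circulant matrix of a (entries reduced mod q)."""
    n = len(a)
    return [
        [a[i - j] % q if i >= j else (-a[n + i - j]) % q for j in range(n)]
        for i in range(n)
    ]


def matvec(m, v, q):
    """Plain matrix-vector product mod q, used only as a reference check."""
    return [sum(x * y for x, y in zip(row, v)) % q for row in m]


if __name__ == "__main__":
    q = 256
    a = [1, 2, 3, 4]   # integer coefficients (toy size, n = 4)
    r = [1, 0, 1, 0]   # binary coefficients
    print(negacyclic_mul_binary(a, r, q))
    print(matvec(skew_circulant(a, q), r, q))
```

Both printed vectors should agree, since the shift-and-accumulate loop and the skew-circulant matrix-vector product compute the same negacyclic convolution. For the abstract's parameter set $(n, q) = (256, 256)$, the stated key lengths work out to $n = 256$ bits for the secret key and $n\log_{2} q = 256 \times 8 = 2048$ bits for the public key.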
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, Synthetic and organic computing structures and systems, Advanced analytics, Social/occupational computing, Location-based/client computer systems, Morphic computer design, Electronic game systems, & Health-care IT.