A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications

Zewen Ye, Ruibing Song, Hao Zhang, Donglong Chen, Ray C. C. Cheung, Kejie Huang

IACR Transactions on Cryptographic Hardware and Embedded Systems. Published 2024-03-12.
DOI: 10.46586/tches.v2024.i2.130-153 (https://doi.org/10.46586/tches.v2024.i2.130-153)

Abstract

Lattice-Based Cryptography (LBC) schemes, such as CRYSTALS-Kyber and CRYSTALS-Dilithium, have been selected for standardization in the NIST Post-Quantum Cryptography process. However, implementing these schemes on resource-constrained Internet-of-Things (IoT) devices is challenging, given the competing demands of efficiency, power consumption, area overhead, and the flexibility to support various operations and parameter settings. Some existing ASIC designs that prioritize low power and area cannot achieve optimal performance efficiency, which makes them impractical for battery-powered devices. Custom hardware accelerators in prior co-processor and processor designs have limited applicability and flexibility, and they incur significant area and power overheads for IoT devices. To address these challenges, this paper presents an efficient lattice-based cryptography processor with customized Single-Instruction-Multiple-Data (SIMD) instructions. First, our proposed SIMD architecture supports efficient parallel execution of various polynomial operations in 256-bit mode and acceleration of Keccak in 320-bit mode, with both modes sharing efficiently reused resources. Additionally, we introduce data shuffling hardware units to resolve data dependencies within SIMD data. To further enhance performance, we design a dual-issue path for memory accesses, together with corresponding software design methodologies, to reduce the impact of data load/store blocking. Through a hardware/software co-design approach, our proposed processor achieves high efficiency in supporting all operations in lattice-based cryptography schemes. Evaluations of Kyber and Dilithium show that our proposed processor achieves over 10x speedup compared with the baseline RISC-V processor and over 5x speedup versus ARM Cortex M4 implementations, making it a promising solution for securing IoT communications and storage. Moreover, silicon synthesis results show that our design can run at 200 MHz with 2.01 mW for Kyber KEM 512 and 2.13 mW for Dilithium 2, outperforming state-of-the-art works in terms of PPAP (Performance x Power x Area).
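
The two vector widths named in the abstract can be related to the underlying data with a small Python sketch. It is only an illustration and not the paper's design: the lane widths (16-bit lanes for Kyber coefficients, 32-bit lanes for Dilithium coefficients, 64-bit Keccak lanes) are assumptions, and the helper names `lanes`, `simd_butterfly`, and `theta_column_parities` are hypothetical. Only the moduli and the Keccak state layout come from the public scheme specifications.

```python
# Illustrative sketch only. Lane widths are assumptions, not taken from the paper.
KYBER_Q = 3329          # Kyber modulus (public parameter)
DILITHIUM_Q = 8380417   # Dilithium modulus (public parameter)


def lanes(vector_bits, lane_bits):
    """Number of equal-width lanes that fit in one SIMD vector."""
    return vector_bits // lane_bits


def simd_butterfly(a, b, zetas, q):
    """Lane-wise Cooley-Tukey NTT butterfly: (a, b) -> (a + z*b, a - z*b) mod q.

    Each list index stands for one SIMD lane; all lanes are independent.
    """
    t = [(z * y) % q for z, y in zip(zetas, b)]
    top = [(x + u) % q for x, u in zip(a, t)]
    bottom = [(x - u) % q for x, u in zip(a, t)]
    return top, bottom


def theta_column_parities(state):
    """First step of Keccak's theta: XOR the five planes into one plane.

    state[y][x] is a 64-bit lane, so each plane state[y] holds
    5 * 64 = 320 bits, which matches the 320-bit mode width.
    """
    parity = [0] * 5
    for plane in state:
        parity = [p ^ lane for p, lane in zip(parity, plane)]
    return parity


if __name__ == "__main__":
    print(lanes(256, 16), lanes(256, 32), lanes(320, 64))
    # 17 is the root of unity used by Kyber; the inputs are arbitrary.
    print(simd_butterfly([1, 2], [3, 4], [17, 17], KYBER_Q))
```

Under these assumptions, one 256-bit vector would hold 16 Kyber or 8 Dilithium coefficients, and one 320-bit vector would hold a full five-lane Keccak plane, which is one plausible reading of why the two modes exist.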