Compact Instruction Set Extensions for Dilithium

IF 2.8 3区 计算机科学 Q2 COMPUTER SCIENCE, HARDWARE & ARCHITECTURE
Lu Li, Qi Tian, Guofeng Qin, Shuaiyu Chen, Weijia Wang
{"title":"Compact Instruction Set Extensions for Dilithium","authors":"Lu Li, Qi Tian, Guofeng Qin, Shuaiyu Chen, Weijia Wang","doi":"10.1145/3643826","DOIUrl":null,"url":null,"abstract":"<p>Post-quantum cryptography is considered to provide security against both traditional and quantum computer attacks. Dilithium is a digital signature algorithm that derives its security from the challenge of finding short vectors in lattices. It has been selected as one of the standardizations in the NIST post-quantum cryptography project. Hardware-software co-design is a commonly adopted implementation strategy to address various implementation challenges, including limited resources, high performance, and flexibility requirements. In this study, we investigate using compact instruction set extensions (ISEs) for Dilithium, aiming to improve software efficiency with low hardware overheads. To begin with, we propose tightly coupled accelerators that are deeply integrated into the RISC-V processor. These accelerators target the most computationally demanding components in resource-constrained processors, such as polynomial generation, Number Theoretic Transform (NTT), and modular arithmetic. Next, we design a set of custom instructions that seamlessly integrate with the RISC-V base instruction formats, completing the accelerators in a compact manner. Subsequently, we implement our ISEs in a chip design for the Hummingbird E203 core and conduct performance benchmarks for Dilithium utilizing these ISEs. Additionally, we evaluate the resource consumption of the ISEs on FPGA and ASIC technologies. Compared to the reference software implementation on the RISC-V core, our co-design demonstrates a remarkable speedup factor ranging from 6.95 to 9.96. This significant improvement in performance is achieved by incorporating additional hardware resources, specifically, a \\(35\\% \\) increase in LUTs, a \\(14\\% \\) increase in FFs, 7 additional DSPs, and no additional RAM. Furthermore, compared to the state-of-the-art approach, our work achieves faster speed performance with a reduced circuit cost. Specifically, the usage of additional LUTs, FFs, and RAMs is reduced by \\(47.53\\% \\), \\(50.43\\% \\), and \\(100\\% \\), respectively. On ASIC technology, our approach demonstrates 12 412 cell counts. Our co-design provides a better trade-off implementation on speed performance and circuit overheads.</p>","PeriodicalId":50914,"journal":{"name":"ACM Transactions on Embedded Computing Systems","volume":"1 1","pages":""},"PeriodicalIF":2.8000,"publicationDate":"2024-02-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Embedded Computing Systems","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3643826","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"COMPUTER SCIENCE, HARDWARE & ARCHITECTURE","Score":null,"Total":0}
引用次数: 0

Abstract

Post-quantum cryptography is considered to provide security against both traditional and quantum computer attacks. Dilithium is a digital signature algorithm that derives its security from the challenge of finding short vectors in lattices. It has been selected as one of the standardizations in the NIST post-quantum cryptography project. Hardware-software co-design is a commonly adopted implementation strategy to address various implementation challenges, including limited resources, high performance, and flexibility requirements. In this study, we investigate using compact instruction set extensions (ISEs) for Dilithium, aiming to improve software efficiency with low hardware overheads. To begin with, we propose tightly coupled accelerators that are deeply integrated into the RISC-V processor. These accelerators target the most computationally demanding components in resource-constrained processors, such as polynomial generation, Number Theoretic Transform (NTT), and modular arithmetic. Next, we design a set of custom instructions that seamlessly integrate with the RISC-V base instruction formats, completing the accelerators in a compact manner. Subsequently, we implement our ISEs in a chip design for the Hummingbird E203 core and conduct performance benchmarks for Dilithium utilizing these ISEs. Additionally, we evaluate the resource consumption of the ISEs on FPGA and ASIC technologies. Compared to the reference software implementation on the RISC-V core, our co-design demonstrates a remarkable speedup factor ranging from 6.95 to 9.96. This significant improvement in performance is achieved by incorporating additional hardware resources, specifically, a \(35\% \) increase in LUTs, a \(14\% \) increase in FFs, 7 additional DSPs, and no additional RAM. Furthermore, compared to the state-of-the-art approach, our work achieves faster speed performance with a reduced circuit cost. Specifically, the usage of additional LUTs, FFs, and RAMs is reduced by \(47.53\% \), \(50.43\% \), and \(100\% \), respectively. On ASIC technology, our approach demonstrates 12 412 cell counts. Our co-design provides a better trade-off implementation on speed performance and circuit overheads.

Dilithium 紧凑型指令集扩展
后量子加密算法被认为能提供安全防护,既能抵御传统计算机攻击,也能抵御量子计算机攻击。Dilithium 是一种数字签名算法,其安全性来自于在网格中寻找短向量的挑战。它已被选为 NIST 后量子加密项目的标准化算法之一。硬件-软件协同设计是一种普遍采用的实现策略,以应对各种实现挑战,包括有限的资源、高性能和灵活性要求。在本研究中,我们研究了如何为 Dilithium 使用紧凑型指令集扩展(ISE),旨在以较低的硬件开销提高软件效率。首先,我们提出了与 RISC-V 处理器深度集成的紧密耦合加速器。这些加速器针对资源受限处理器中计算要求最高的组件,如多项式生成、数论变换(NTT)和模块化算术。接下来,我们设计了一套自定义指令,与 RISC-V 基本指令格式无缝集成,以紧凑的方式完成加速器。随后,我们在蜂鸟 E203 内核的芯片设计中实现了 ISE,并利用这些 ISE 对 Dilithium 进行了性能基准测试。此外,我们还评估了 ISE 在 FPGA 和 ASIC 技术上的资源消耗。与 RISC-V 内核上的参考软件实现相比,我们的协同设计实现了 6.95 到 9.96 的显著提速。性能的大幅提升是通过增加硬件资源实现的,具体来说,LUT增加了35%,FF增加了14%,增加了7个DSP,但没有增加RAM。此外,与最先进的方法相比,我们的工作实现了更快的速度性能,同时降低了电路成本。具体来说,额外的LUT、FF和RAM的使用分别减少了47.53%、50.43%和100%。在 ASIC 技术上,我们的方法展示了 12 412 个单元数。我们的协同设计在速度性能和电路开销之间实现了更好的权衡。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
ACM Transactions on Embedded Computing Systems
ACM Transactions on Embedded Computing Systems 工程技术-计算机:软件工程
CiteScore
3.70
自引率
0.00%
发文量
138
审稿时长
6 months
期刊介绍: The design of embedded computing systems, both the software and hardware, increasingly relies on sophisticated algorithms, analytical models, and methodologies. ACM Transactions on Embedded Computing Systems (TECS) aims to present the leading work relating to the analysis, design, behavior, and experience with embedded computing systems.
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信