A High-Speed NTT-Based Polynomial Multiplication Accelerator with Vector Extension of RISC-V for Saber Algorithm
Honglin Kuang, Yifan Zhao, Jun Han
2022 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), 2022-11-11
DOI: 10.1109/APCCAS55924.2022.10090293
Saber is a post-quantum cryptography (PQC) key encapsulation mechanism (KEM) based on the module learning with rounding (Module-LWR) problem. It uses power-of-two moduli, which makes all modular reductions free in hardware. However, this choice prevents direct use of the number theoretic transform (NTT), the asymptotically fastest method, for Saber's time-consuming polynomial multiplication. Previous work has therefore multiplied polynomials with schoolbook, Toom-Cook, or Karatsuba algorithms. Although these approaches achieve decent speed at moderate area cost, they are a poor fit when the system must be extended to support multiple PQC protocols. To enable NTT for Saber, we choose an appropriate prime and use the sign-magnitude format for computation. We propose a concise and efficient vectorized NTT algorithm and, based on it, design a configurable vector NTT unit that performs NTT and other arithmetic operations. The accelerator is pipelined for high speed and is driven by a custom vector instruction extension of RISC-V. We implement the proposed architecture with 32 and 16 vector lanes on a Xilinx UltraScale+ ZCU111. Results show that, compared with state-of-the-art Saber polynomial multipliers, our design achieves up to $5\times$ and $3\times$ improvement in computation time and area-time product (ATP), respectively, for degree-256 polynomial multiplication.
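The idea of enabling NTT for a power-of-two modulus can be illustrated with a minimal Python sketch. It is not the paper's algorithm or its chosen prime: the prime search, the coefficient bound `S_MAX`, and all function names are assumptions made for illustration. The sketch treats operands as signed values (the role the sign-magnitude format plays in the abstract), picks a prime large enough that the exact integer product never wraps, multiplies with a negacyclic NTT modulo that prime, and only then reduces modulo $q = 2^{13}$.

```python
import random

N = 256        # Saber ring degree: Z_q[x] / (x^N + 1)
Q = 1 << 13    # Saber's power-of-two modulus q = 2^13
S_MAX = 5      # assumed bound on secret coefficients (LightSaber: [-5, 5])

def is_prime(m):
    """Trial division; adequate for the ~24-bit values searched here."""
    if m < 2:
        return False
    i = 2
    while i * i <= m:
        if m % i == 0:
            return False
        i += 1
    return True

def find_prime(bound, n):
    """Smallest prime p > bound with p = 1 (mod 2n), so a 2n-th root exists."""
    p = (bound // (2 * n) + 1) * (2 * n) + 1
    while not is_prime(p):
        p += 2 * n
    return p

def find_psi(p, n):
    """Return psi with psi^n = -1 (mod p), i.e. a primitive 2n-th root of unity."""
    for g in range(2, p):
        psi = pow(g, (p - 1) // (2 * n), p)
        if pow(psi, n, p) == p - 1:
            return psi
    raise ValueError("no primitive 2n-th root found")

# Signed product coefficients satisfy |c| <= N * (Q/2) * S_MAX, so any
# prime above twice that bound represents them without wrap-around.
P = find_prime(2 * N * (Q // 2) * S_MAX, N)
PSI = find_psi(P, N)

def ntt(a, omega, p):
    """Recursive radix-2 cyclic NTT; omega is a primitive len(a)-th root."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % p, p)
    odd = ntt(a[1::2], omega * omega % p, p)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % p
        out[k] = (even[k] + t) % p
        out[k + n // 2] = (even[k] - t) % p
        w = w * omega % p
    return out

def negacyclic_mul(a, s, p, psi):
    """a * s mod (x^n + 1, p): twist by psi^i, cyclic NTT, untwist."""
    n = len(a)
    omega = psi * psi % p
    tw = [pow(psi, i, p) for i in range(n)]
    fa = ntt([x * tw[i] % p for i, x in enumerate(a)], omega, p)
    fs = ntt([x * tw[i] % p for i, x in enumerate(s)], omega, p)
    c = ntt([x * y % p for x, y in zip(fa, fs)], pow(omega, -1, p), p)
    n_inv, psi_inv = pow(n, -1, p), pow(psi, -1, p)
    return [x * n_inv * pow(psi_inv, i, p) % p for i, x in enumerate(c)]

def saber_mul(a, s):
    """Multiply a (coefficients in [0, Q)) by small signed s, result mod Q."""
    a_signed = [x - Q if x >= Q // 2 else x for x in a]   # centre mod Q
    c = negacyclic_mul(a_signed, s, P, PSI)
    c = [x - P if x > P // 2 else x for x in c]           # signed lift mod P
    return [x % Q for x in c]

def schoolbook_mul(a, s):
    """Reference O(n^2) negacyclic product mod Q, used only for checking."""
    c = [0] * len(a)
    for i, ai in enumerate(a):
        for j, sj in enumerate(s):
            if i + j < len(a):
                c[i + j] += ai * sj
            else:
                c[i + j - len(a)] -= ai * sj
    return [x % Q for x in c]

random.seed(0)
a = [random.randrange(Q) for _ in range(N)]
s = [random.randint(-S_MAX, S_MAX) for _ in range(N)]
assert saber_mul(a, s) == schoolbook_mul(a, s)
```

The final assertion compares the NTT path against a schoolbook reference. A hardware design would replace the recursive transform with an iterative, vectorized one and a reduction tailored to the specific prime, neither of which this sketch attempts to model.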