Hardware acceleration of number theoretic transform for zk‐SNARK

IF 1.8 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

Engineering reports : open access Pub Date : 2023-02-16 DOI:10.1002/eng2.12639

H. Zhao, Dong Ding, Feng Wang, Pengcheng Hua, Ning Wang, Qi Wu, Zhilei Chai

Citations: 2

Abstract

zk-SNARK unlocks the potential of zero-knowledge proofs (ZKP) in blockchain, distributed storage, and similar applications. However, proof generation in zk-SNARK is excessively time-intensive, which makes it challenging to deploy a high-performance zk-SNARK in most real applications. The Number Theoretic Transform (NTT), one of the most time-consuming parts of proof generation, therefore needs to be accelerated significantly. To address this issue, we propose a novel and efficient “data reordering” technique that enables a highly pipelined architecture, on which an FPGA-based hardware accelerator is designed to support the large-bitwidth, large-scale NTT tasks in zk-SNARK. Our architecture achieves a two-level pipeline: 1) the top-level pipeline runs across smaller NTT sub-tasks, which are decomposed from a large-scale NTT task; 2) the bottom-level pipeline runs within each sub-task, across butterfly operations with different step sizes. This architecture effectively reduces data dependencies and memory access requirements, and it scales flexibly to FPGAs of different sizes. To balance computing efficiency and flexibility, OpenCL with HLS is used to implement the heterogeneous acceleration system. We prototype the accelerator on the AMD-Xilinx Alveo U50 card (UltraScale+ XCU50 FPGA). The evaluation results show that 1) our accelerator scales well across FPGAs of different sizes with a stable performance improvement; 2) it performs 1.95× faster than the one in PipeZK; and 3) it achieves speedups of 27.98× and 1.74× and energy-efficiency improvements of 6.9× and 6× over a single core and 12 cores of the AMD Ryzen 9 5900X, respectively, when integrated into the well-known ZKP open-source project,
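The "butterfly operations with different step sizes" mentioned in the abstract can be illustrated with a minimal software sketch of a textbook iterative radix-2 NTT. This is not the authors' FPGA architecture or their data reordering technique. The function names (`bit_reverse_permute`, `ntt`, `naive_ntt`) and the toy parameters (modulus 17, root 2) are assumptions chosen for illustration, whereas zk-SNARK uses far larger prime fields. Each pass of the outer `while` loop is one butterfly stage whose step size `half` doubles; these stages are the kind of units a hardware design would pipeline.

```python
def bit_reverse_permute(values):
    """Return a copy of values reordered by bit-reversed index."""
    n = len(values)
    bits = n.bit_length() - 1
    out = list(values)
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        if i < j:
            out[i], out[j] = out[j], out[i]
    return out


def ntt(values, modulus, root):
    """Iterative radix-2 NTT (textbook Cooley-Tukey, illustrative only).

    Assumes len(values) is a power of two and that root is a primitive
    len(values)-th root of unity modulo the prime modulus.
    """
    n = len(values)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    a = bit_reverse_permute(values)
    half = 1  # butterfly step size for the current stage
    while half < n:
        # Twiddle-factor generator for butterflies spanning 2 * half elements.
        step_root = pow(root, n // (2 * half), modulus)
        for start in range(0, n, 2 * half):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % modulus
                a[start + k] = (u + v) % modulus
                a[start + k + half] = (u - v) % modulus
                w = w * step_root % modulus
        half *= 2  # the next stage pairs elements twice as far apart
    return a


def naive_ntt(values, modulus, root):
    """O(n^2) reference: X[i] = sum_j values[j] * root^(i*j) mod modulus."""
    n = len(values)
    return [
        sum(values[j] * pow(root, i * j, modulus) for j in range(n)) % modulus
        for i in range(n)
    ]


if __name__ == "__main__":
    data = [1, 2, 3, 4, 5, 6, 7, 8]
    # 2 is a primitive 8th root of unity mod 17 (2^4 = -1, 2^8 = 1 mod 17).
    assert ntt(data, 17, 2) == naive_ntt(data, 17, 2)
    print(ntt(data, 17, 2))
```

In this sketch every stage reads the full output of the previous one, which is the data dependency the abstract says the proposed architecture reduces.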
