A High-Speed Architecture for the Reduction in VDF Based on a Class Group

2020 IEEE 33rd International System-on-Chip Conference (SOCC). Pub Date: 2020-09-08. DOI: 10.1109/socc49529.2020.9524783

Yifeng Song, Danyang Zhu, Jing Tian, Zhongfeng Wang



Abstract

Because the proof-of-work (PoW) process consumes an enormous amount of energy, there is an urgent need for more resource-efficient blockchain systems. The verifiable delay function (VDF), which is slow to compute but easy to verify, is believed to be the kernel function of next-generation blockchain systems. The reduction over a class group involves many complex operations, such as large-number division and multiplication, and accounts for a large portion of the VDF computation. In this paper, we propose, for the first time, a high-speed architecture for the reduction that combines algorithmic transformations and architectural optimizations. First, starting from the fastest reduction algorithm, we present a modified, more hardware-friendly version by introducing a novel transformation method that efficiently removes the large-number divisions. Second, highly parallelized and pipelined architectures are devised for the large-number multiplication and addition operations, respectively, to reduce the latency and shorten the critical path. Third, a compact state machine is developed to maximize the overlap of computations in time. Experimental results show that, when computing 209715 reduction steps with an input width of 2048 bits, the proposed design takes only 137.652 ms on an Altera Stratix-10 FPGA at 100 MHz, whereas the original algorithm needs 3278 ms on an i7-6850K CPU at 3.6 GHz. We thus obtain a speedup of nearly 24x (3278 ms / 137.652 ms ≈ 23.8) over an advanced CPU.
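For orientation, the kind of operation being accelerated can be sketched in software. The abstract does not spell out the representation or the algorithm, so the sketch below assumes that class group elements are positive definite binary quadratic forms (a, b, c) with discriminant b^2 - 4ac < 0, and it uses the textbook normalize-and-reduce loop rather than the paper's modified, division-free version. The function names (`discriminant`, `normalize`, `reduce_form`) are hypothetical. The floor divisions on large integers in each step are the type of large-number division the abstract refers to.

```python
# Textbook reduction of a positive definite binary quadratic form (a, b, c).
# Assumption: this is NOT the paper's hardware-friendly algorithm; it only
# illustrates where large-number divisions and multiplications arise.

def discriminant(a: int, b: int, c: int) -> int:
    """Return b^2 - 4ac, which every step below leaves unchanged."""
    return b * b - 4 * a * c


def normalize(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Shift b into the range -a < b <= a without changing the discriminant."""
    r = (a - b) // (2 * a)          # large-number floor division
    return a, b + 2 * r * a, a * r * r + b * r + c


def reduce_form(a: int, b: int, c: int) -> tuple[int, int, int]:
    """Return the reduced form: |b| <= a <= c, with b >= 0 when a == c."""
    a, b, c = normalize(a, b, c)
    while a > c or (a == c and b < 0):
        s = (c + b) // (2 * c)      # one division per reduction step
        # Swap the outer coefficients and renormalize the middle one.
        a, b, c = c, -b + 2 * s * c, c * s * s - b * s + a
    return a, b, c


if __name__ == "__main__":
    # Small illustrative input with discriminant 5^2 - 4*3*4 = -23.
    print(reduce_form(3, 5, 4))     # (2, 1, 3)
```

Python integers are arbitrary precision, so the same code runs unchanged on 2048-bit inputs; a hardware design has to replace each `//` with something cheaper, which is the motivation the abstract gives for removing the divisions.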

