Changxu Liu;Hao Zhou;Lan Yang;Zheng Wu;Patrick Dai;Yinlong Li;Shiyong Wu;Fan Yang
DOI: 10.1109/TCAD.2024.3524364
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 7, pp. 2738-2750
Published: 2024-12-31
URL: https://ieeexplore.ieee.org/document/10818748/
Myosotis: An Efficiently Pipelined and Parameterized Multiscalar Multiplication Architecture via Data Sharing
Zero-knowledge proof (ZKP) is a widely used privacy-preserving technology in which multiscalar multiplication (MSM) accounts for over 70% of the computational workload. Accelerating MSM improves the overall performance of ZKP and has therefore become a focus of the community. However, in practical deployments involving multiple MSM accelerators, existing designs often overlook strategies for optimizing bandwidth and area efficiency. To address this, we propose Myosotis, an efficiently pipelined and parameterized MSM architecture. By sharing input data and allocating cache effectively, it reduces the average transmission bandwidth at runtime. Myosotis also supports multiple point addition (PADD) units to achieve performance gains, balancing area overhead and latency for improved area efficiency. Different parameter selections enable a tradeoff among the performance, area, and bandwidth of the MSM accelerator. When benchmarked with MSM degrees between $2^{18}$ and $2^{26}$, our proposed baseline design achieves speedups of up to $3.32\times$ and $6.72\times$ over state-of-the-art FPGA and ASIC designs, respectively. Compared to the baseline, Myosotis with two window MSMs and one PADD unit reduces bandwidth demand by 43% while maintaining similar area and latency. Myosotis with three window MSMs and two PADD units, on the other hand, decreases latency by 43% and bandwidth by 17%, with only a 9% area increase.
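The abstract does not spell out the algorithm behind a "window MSM", so the following is only a minimal sketch under stated assumptions. It assumes the common windowed bucket (Pippenger-style) method, in which each scalar is split into c-bit windows and every window is processed independently over the same stream of points; that shared stream is what makes input data sharing across window MSMs plausible. Integers modulo a toy prime stand in for elliptic-curve points, and the names `padd`, `window_msm`, `msm`, and `P_MOD` are hypothetical, not taken from the paper.

```python
from typing import List

# Assumption: a toy additive group (integers mod a prime) replaces curve points.
P_MOD = 2**61 - 1


def padd(a: int, b: int) -> int:
    """Stand-in for an elliptic-curve point addition (PADD)."""
    return (a + b) % P_MOD


def window_msm(scalars: List[int], points: List[int], window: int, c: int) -> int:
    """Bucket method for one c-bit window: returns sum(digit_i * point_i)."""
    buckets = [0] * (1 << c)  # 0 is the group identity
    mask = (1 << c) - 1
    for k, pt in zip(scalars, points):
        digit = (k >> (window * c)) & mask
        if digit:  # bucket 0 contributes nothing
            buckets[digit] = padd(buckets[digit], pt)
    # Running-sum trick: computes sum(d * buckets[d]) using additions only.
    running, total = 0, 0
    for d in range(len(buckets) - 1, 0, -1):
        running = padd(running, buckets[d])
        total = padd(total, running)
    return total


def msm(scalars: List[int], points: List[int], scalar_bits: int, c: int) -> int:
    """Combine per-window results, most significant window first."""
    num_windows = (scalar_bits + c - 1) // c
    result = 0
    for w in range(num_windows - 1, -1, -1):
        for _ in range(c):  # multiply the accumulator by 2^c via doublings
            result = padd(result, result)
        # Every window reads the same (scalars, points) stream.
        result = padd(result, window_msm(scalars, points, w, c))
    return result


def naive_msm(scalars: List[int], points: List[int]) -> int:
    """Reference result: direct sum of scalar * point in the toy group."""
    return sum(k * p for k, p in zip(scalars, points)) % P_MOD
```

In this sketch the calls to `window_msm` for different windows are independent, so several of them could consume one fetched batch of points at the same time. How Myosotis actually schedules windows, caches buckets, and assigns PADD units is not described in the abstract.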
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.