RM-NTT: An RRAM-Based Compute-in-Memory Number Theoretic Transform Accelerator

Impact factor: 2.0 | JCR: Q3 | Category: Computer Science, Hardware & Architecture

IEEE Journal on Exploratory Solid-State Computational Devices and Circuits | Pub Date: 2022-08-30 | DOI: 10.1109/JXCDC.2022.3202517

Yongmo Park;Ziyu Wang;Sangmin Yoo;Wei D. Lu

Volume 8 (2), pp. 93-101. Full text: https://ieeexplore.ieee.org/document/9870678/

Citations: 3

Abstract

As more cloud computing resources are used for machine learning training and inference, privacy-preserving techniques that keep data from being revealed on cloud platforms are attracting increasing interest. Homomorphic encryption (HE) is one of the most promising of these techniques because it allows data to be evaluated in encrypted form. However, deep neural network (DNN) implementations using HE are orders of magnitude slower than plaintext implementations. The main bottleneck for practical HE is the use of very long polynomials and the associated number theoretic transform (NTT) operations needed for polynomial multiplication. This article introduces RRAM number theoretic transform (RM-NTT), a resistive random access memory (RRAM)-based compute-in-memory (CIM) system that accelerates NTT and inverse NTT (INTT) operations. Instead of running fast Fourier transform (FFT)-like algorithms, RM-NTT uses a vector-matrix multiplication (VMM) approach to achieve maximal parallelism during NTT and INTT operations. To improve efficiency, RM-NTT stores modified forms of the twiddle factors in the RRAM arrays so that NTT and INTT can be processed in the same RRAM array, and employs a Montgomery reduction algorithm to convert the VMM results. These optimizations allow RM-NTT to significantly reduce NTT operation latency compared with other NTT accelerators, including both CIM and non-CIM designs. The effects of different RM-NTT design parameters and device nonidealities are also discussed.
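To make the VMM formulation concrete, the following is a minimal software sketch, not the RM-NTT hardware design. It computes an NTT as one vector-matrix product against a matrix of twiddle factors and converts each accumulated sum with a Montgomery reduction. The parameters (`Q = 17`, `N = 8`, `OMEGA = 2`, `R = 2**12`) and all function names are illustrative assumptions. Storing the twiddles premultiplied by `R` (and by `1/N` for the inverse) is only one plausible reading of "modified forms"; the abstract does not specify the encoding. The sketch also uses two separate matrices for clarity and does not reproduce the paper's shared-array scheme.

```python
# Illustrative parameters (assumptions, not from the paper); HE uses far larger N and Q.
# Requires Python 3.8+ for pow(x, -1, m) modular inverses.
Q = 17                      # prime modulus with N dividing Q - 1
N = 8                       # transform length
OMEGA = 2                   # primitive N-th root of unity mod Q (2**4 % 17 == 16, 2**8 % 17 == 1)
R_BITS = 12                 # Montgomery radix R = 2**12; R * Q must exceed the largest VMM sum
R = 1 << R_BITS
Q_NEG_INV = (-pow(Q, -1, R)) % R   # -Q^{-1} mod R, precomputed once


def montgomery_reduce(t):
    """Return t * R^{-1} mod Q for 0 <= t < R * Q, using masks and shifts
    instead of a division by Q."""
    m = ((t & (R - 1)) * Q_NEG_INV) & (R - 1)
    u = (t + m * Q) >> R_BITS      # exact: t + m*Q is a multiple of R
    return u - Q if u >= Q else u


def twiddle_matrix(omega, scale):
    """N x N matrix with entries omega^(i*j) * scale * R mod Q.
    The extra factor R cancels in montgomery_reduce."""
    return [[pow(omega, i * j, Q) * scale * R % Q for j in range(N)]
            for i in range(N)]


W_FWD = twiddle_matrix(OMEGA, 1)
# Inverse transform: inverse root, with 1/N folded into the stored values.
W_INV = twiddle_matrix(pow(OMEGA, -1, Q), pow(N, -1, Q))


def vmm(vec, mat):
    """Plain integer vector-matrix product; each output column is independent,
    which is what lets an array evaluate all of them in parallel."""
    return [sum(v * row[j] for v, row in zip(vec, mat)) for j in range(N)]


def ntt(a):
    return [montgomery_reduce(s) for s in vmm(a, W_FWD)]


def intt(a_hat):
    return [montgomery_reduce(s) for s in vmm(a_hat, W_INV)]


def cyclic_mul(a, b):
    """Multiply two polynomials modulo (x^N - 1, Q) via pointwise products
    in the NTT domain."""
    return intt([x * y % Q for x, y in zip(ntt(a), ntt(b))])


a = [1, 2, 3, 4, 0, 0, 0, 0]
b = [5, 6, 7, 0, 0, 0, 0, 0]
print(intt(ntt(a)) == a)    # round trip recovers the input
print(cyclic_mul(a, b))     # coefficients of a(x) * b(x) mod (x^8 - 1, 17)
```

The direct matrix form costs O(N^2) multiplications versus O(N log N) for an FFT-like butterfly network, but every column sum is independent, so an array that evaluates all columns at once trades arithmetic count for latency. That is the parallelism argument the abstract makes for the VMM approach.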

Source journal

IEEE Journal on Exploratory Solid-State Computational Devices and Circuits (Computer Science, Hardware & Architecture)

CiteScore: 5.00

Self-citation rate: 4.20%

Review time: 13 weeks