IEEE Journal on Exploratory Solid-State Computational Devices and Circuits: Latest Publications

Time-Based Compute-in-Memory for Cryogenic Neural Network With Successive Approximation Register Time-to-Digital Converter
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-11-29 DOI: 10.1109/JXCDC.2022.3225243
Dong Suk Kang;Shimeng Yu
Abstract: This article explores a new application of the compute-in-memory (CIM) paradigm for cryogenic neural networks. Using a 28-nm cryogenic transistor model calibrated at 4 K, a time-based CIM macro is proposed, comprising 1) an area-efficient unit delay cell design for cryogenic operation and 2) an area- and power-efficient successive approximation register (SAR) time-to-digital converter (TDC) capable of high resolution. Benchmark simulations first show that the proposed macro achieves better latency than its current-based CIM counterpart, and further show that it scales better to larger decoder sizes and to process technology optimization. (Vol. 8, No. 2, pp. 128-133; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09966349.pdf)
Citations: 1
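The SAR time-to-digital converter in the article above digitizes a time interval by binary search against fractions of a reference delay. As a rough behavioral illustration only (the 6-bit resolution, the full-scale delay, and the ideal time comparator below are assumptions, not parameters from the paper), a few lines of Python capture the conversion loop:

```python
# Minimal behavioral sketch of successive-approximation time-to-digital
# conversion (not the authors' circuit): binary-search an input delay
# against programmable fractions of a full-scale reference delay.
def sar_tdc(t_in, t_full_scale, n_bits=6):
    """Return the n_bits SAR code approximating t_in / t_full_scale."""
    code = 0
    for bit in reversed(range(n_bits)):
        trial = code | (1 << bit)                       # tentatively set this bit
        t_trial = t_full_scale * trial / (1 << n_bits)  # delay produced by the trial code
        if t_in >= t_trial:                             # "time comparator" decision
            code = trial                                # keep the bit
    return code

# Example: 6-bit conversion of a 3.7 ns interval with a 10 ns full scale.
print(sar_tdc(3.7e-9, 10e-9))   # -> 23, i.e. ~23/64 of full scale
```

Each iteration keeps or discards one bit depending on whether the input interval exceeds the trial delay, so an N-bit code needs only N comparisons.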
IMAGIN: Library of IMPLY and MAGIC NOR-Based Approximate Adders for In-Memory Computing
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-11-14 DOI: 10.1109/JXCDC.2022.3222015
Chandan Kumar Jha;Phrangboklang Lyngton Thangkhiew;Kamalika Datta;Rolf Drechsler
Abstract: In-memory computing (IMC) has attracted significant interest in recent years as it aims to bridge the memory bottleneck in von Neumann architectures; it also improves their energy efficiency. Another technique explored to reduce energy consumption is the use of approximate circuits, targeted toward error-resilient applications, for which addition is one of the most frequently used operations. In the literature, CMOS-based approximate adder libraries have been implemented to help designers choose from a variety of designs depending on the output quality requirements, but the same is not true for memristor-based approximate adders targeted at IMC architectures. Hence, in this work, we developed a framework to generate approximate adder designs with varying output errors for 8-, 12-, and 16-bit adders. We implemented a state-of-the-art scheduling algorithm to obtain the best mapping of these approximate adder designs for IMC, and performed an exhaustive design space exploration to obtain the Pareto-optimal approximate adder designs for various design and error metrics. We then proposed IMAGIN, a library of approximate adders compatible with memristor-based IMC architectures and based on the IMPLY and MAGIC design styles. We also performed mean filtering on the Kodak image dataset using the approximate adders from the IMAGIN library. IMAGIN can help designers select from a wide variety of approximate adders depending on the output quality requirements and can serve as a benchmark for future research in this direction. All Pareto-optimal designs will be made available at https://github.com/agra-uni-bremen/JxCDC2022-imagin-add. (Vol. 8, No. 2, pp. 68-76; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09950064.pdf)
Citations: 7
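A common way to trade accuracy for area and energy in adders, and one way to picture the kind of design points such a library spans, is to approximate the low-order bits and quantify the resulting error. The sketch below is purely illustrative and is not taken from the IMAGIN library: it OR-approximates the k least-significant bits of an 8-bit addition and reports the mean error distance over all input pairs.

```python
# Illustrative sketch (not a design from the IMAGIN library): approximate an
# n-bit adder by replacing the k least-significant full adders with bitwise OR
# (a common approximation style), then measure mean error distance (MED).
from itertools import product

def approx_add(a, b, n_bits=8, k=3):
    lo_mask = (1 << k) - 1
    lo = (a | b) & lo_mask                 # approximate low part: carries dropped
    hi = ((a >> k) + (b >> k)) << k        # exact high part, carry-in from low part ignored
    return (hi | lo) & ((1 << (n_bits + 1)) - 1)

def mean_error_distance(n_bits=8, k=3):
    errs = [abs(approx_add(a, b, n_bits, k) - (a + b))
            for a, b in product(range(1 << n_bits), repeat=2)]
    return sum(errs) / len(errs)

print(mean_error_distance(8, 3))   # error grows with k; k trades accuracy for cost
```

Sweeping k (and the approximation style) over many such designs and plotting error metrics against hardware cost is what produces a Pareto front of the kind the library exposes.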
Stateful Logic Using Phase Change Memory
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-11-04 DOI: 10.1109/JXCDC.2022.3219731
Barak Hoffer;Nicolás Wainstein;Christopher M. Neumann;Eric Pop;Eilam Yalon;Shahar Kvatinsky
Abstract: Stateful logic is a digital processing-in-memory (PIM) technique that could address von Neumann memory bottleneck challenges while maintaining backward compatibility with standard von Neumann architectures. In stateful logic, memory cells are used to perform logic operations without reading or moving any data outside the memory array. Stateful logic has previously been demonstrated using several resistive memory types, mostly resistive RAM (RRAM). Here, we present a new method to design stateful logic using a different resistive memory: phase change memory (PCM). We propose and experimentally demonstrate four logic gate types (NOR, IMPLY, OR, NIMP) using commonly used PCM materials. Our stateful logic circuits differ from previously proposed circuits because of the different switching mechanisms and functionality of PCM compared with RRAM. Since the proposed stateful logic forms a functionally complete set, these gates enable the sequential execution of any logic function within the memory, paving the way to PCM-based digital PIM systems. (Vol. 8, No. 2, pp. 77-83; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09938984.pdf)
Citations: 4
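The practical significance of a functionally complete gate set is that any Boolean function can be built by sequencing those gates inside the array. The short logic-level check below (no device physics; Python stand-ins for the in-memory gates) composes NOT, OR, AND, and XOR from NOR alone:

```python
# Logic-level sketch: NOR, one of the gates demonstrated in the paper, is by
# itself functionally complete, so any logic function can be sequenced from it.
def NOR(a, b):  return not (a or b)
def NOT(a):     return NOR(a, a)
def OR(a, b):   return NOT(NOR(a, b))
def AND(a, b):  return NOR(NOT(a), NOT(b))
def XOR(a, b):  return AND(OR(a, b), NOT(AND(a, b)))

for a in (False, True):
    for b in (False, True):
        assert NOT(a) == (not a)
        assert OR(a, b) == (a or b)
        assert AND(a, b) == (a and b)
        assert XOR(a, b) == (a != b)
print("NOR alone reproduces NOT, OR, AND, and XOR: the set is functionally complete")
```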
CRUS: A Hardware-Efficient Algorithm Mitigating Highly Nonlinear Weight Update in CIM Crossbar Arrays for Artificial Neural Networks
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-11-04 DOI: 10.1109/JXCDC.2022.3220032
Junmo Lee;Joon Hwang;Youngwoon Cho;Min-Kyu Park;Woo Young Choi;Sangbum Kim;Jong-Ho Lee
Abstract: Mitigating the nonlinear weight update of synaptic devices is one of the main challenges in designing compute-in-memory (CIM) crossbar arrays for artificial neural networks (ANNs). While various nonlinearity mitigation schemes have been proposed, only a few deal with highly nonlinear weight updates. This article presents a hardware-efficient on-chip weight update scheme named the conditional reverse update scheme (CRUS), which algorithmically mitigates highly nonlinear weight changes in synaptic devices. For hardware efficiency, CRUS is implemented on-chip using low-precision (1-bit) and infrequent circuit operations. To gain algorithmic insight, the impact of the nonlinear weight update on training is investigated. We first introduce a metric called update noise (UN), which quantifies the deviation of the actual weight update in synaptic devices from the expected weight update calculated by the stochastic gradient descent (SGD) algorithm. Based on UN analysis, we aim to reduce the average UN (AUN) over the entire training process. The key principle for reducing AUN is to conditionally skip long-term depression (LTD) pulses during training. The trends of AUN and accuracy under various LTD skip conditions are investigated to find maximum-accuracy conditions. By properly tuning the LTD skip conditions, CRUS achieves >90% accuracy on the Modified National Institute of Standards and Technology (MNIST) dataset even under highly nonlinear weight updates. Furthermore, it shows better accuracy than previous nonlinearity mitigation techniques under similar hardware conditions and exhibits robustness to cycle-to-cycle variations (CCVs) in conductance updates. The results suggest that CRUS can be an effective solution to relieve the algorithm-hardware tradeoff in CIM crossbar array design. (Vol. 8, No. 2, pp. 145-154; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09940271.pdf)
Citations: 0
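To make the update-noise idea concrete, the sketch below models a synaptic device whose step size shrinks near its conductance bounds, computes UN as the gap between the realized update and the SGD-prescribed one, and skips small depression pulses. The device model, learning rate, and skip threshold are illustrative assumptions, not the conditions tuned in the paper.

```python
# Conceptual sketch of the update-noise (UN) metric and a conditional LTD skip.
# The exponential device model and the magnitude-based skip condition below are
# illustrative assumptions, not the parameters used in the paper.
import numpy as np

def device_update(w, dw_ideal, nl=5.0):
    """Nonlinear conductance update: step size shrinks as w nears its bound."""
    if dw_ideal >= 0:                                    # potentiation (LTP)
        return w + dw_ideal * np.exp(-nl * w)
    return w + dw_ideal * np.exp(-nl * (1.0 - w))        # depression (LTD)

def train_step(w, grad, lr=0.1, skip_ltd_below=0.02):
    dw_ideal = -lr * grad                                # SGD-prescribed update
    if dw_ideal < 0 and abs(dw_ideal) < skip_ltd_below:
        dw_ideal = 0.0                                   # conditionally skip small LTD pulses
    w_new = device_update(w, dw_ideal)
    update_noise = (w_new - w) - dw_ideal                # deviation from the ideal update
    return w_new, update_noise

w, un = train_step(w=0.8, grad=0.5)
print(w, un)   # the realized LTD step undershoots the ideal one, giving nonzero UN
```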
Memristive Devices for Time Domain Compute-in-Memory
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-10-25 DOI: 10.1109/JXCDC.2022.3217098
Florian Freye;Jie Lou;Christopher Bengel;Stephan Menzel;Stefan Wiefels;Tobias Gemmeke
Abstract: Analog compute schemes and compute-in-memory (CIM) have emerged in an effort to reduce the increasing power hunger of convolutional neural networks (CNNs), which exceeds the constraints of edge devices. Memristive device types are a relatively new offering with interesting opportunities for unexplored circuit concepts. In this work, the use of memristive devices in cascaded time-domain CIM (TDCIM) is introduced with the primary goal of reducing the size of fully unrolled architectures. The different effects influencing the determinism of memristive devices are outlined together with reliability concerns. Architectures for binary as well as multibit multiply-and-accumulate (MAC) cells are presented and evaluated. As more involved circuits offer more accurate compute results, a tradeoff between design effort and accuracy comes into the picture; to evaluate this tradeoff further, the impact of variations on overall compute accuracy is discussed. The presented cells reach an energy/OP of 0.23 fJ at a size of 1.2 μm² for binary MAC operations and 6.04 fJ at 3.2 μm² for 4×4-bit MAC operations. (Vol. 8, No. 2, pp. 119-127; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09930136.pdf)
Citations: 2
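The core of a time-domain MAC is that each cell adds a delay proportional to its stored weight times its input, so the pulse arrival time at the end of a cascaded chain encodes the dot product. A purely behavioral sketch follows (ideal delays, no variation, an assumed 10 ps unit delay; not the cells evaluated in the paper):

```python
# Behavioral sketch of a cascaded time-domain MAC (no circuit detail): each
# stage adds a delay proportional to weight*input, so the arrival time of a
# pulse at the end of the chain encodes the dot product.
def time_domain_mac(inputs, weights, unit_delay=10e-12):
    """inputs, weights: small non-negative integers; returns (dot, total delay)."""
    t = 0.0
    for x, w in zip(inputs, weights):
        t += x * w * unit_delay          # per-cell delay set by the stored conductance state
    dot = round(t / unit_delay)          # a TDC would digitize this arrival time
    return dot, t

dot, t = time_domain_mac([1, 0, 3, 2], [2, 3, 1, 1])
print(dot, t)    # dot product 7, encoded as 70 ps of accumulated delay
```

Device variations would perturb each per-cell delay, which is exactly why the paper weighs circuit effort against compute accuracy.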
Leveraging Ferroelectric Stochasticity and In-Memory Computing for DNN IP Obfuscation
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-10-25 DOI: 10.1109/JXCDC.2022.3217043
Likhitha Mankali;Nikhil Rangarajan;Swetaki Chatterjee;Shubham Kumar;Yogesh Singh Chauhan;Ozgur Sinanoglu;Hussam Amrouch
Abstract: With the emergence of the Internet of Things (IoT), deep neural networks (DNNs) are widely used in different domains, such as computer vision, healthcare, social media, and defense. The hardware-level architecture of a DNN can be built using an in-memory computing-based design loaded with the weights of a well-trained DNN model. However, such hardware-based DNN systems are vulnerable to model stealing attacks, in which an attacker reverse-engineers (REs) and extracts the weights of the DNN model. In this work, we propose an energy-efficient defense technique that combines a ferroelectric field-effect transistor (FeFET)-based reconfigurable physically unclonable function (PUF) with an in-memory FeFET XNOR to thwart model stealing attacks. We leverage the inherent stochasticity of the FE domains to build a PUF that helps corrupt the neural network's (NN's) weights when an adversarial attack is detected. We showcase the efficacy of the proposed defense scheme through experiments on graph neural networks (GNNs), a particular type of DNN; the proposed scheme is the first of its kind to evaluate the security of GNNs. We investigate how corrupting the weights of different layers of the GNN degrades the accuracy of a graph classification application, for two specific error models of corrupting the FeFET-based PUFs and five different bioinformatics datasets. We demonstrate that our approach successfully degrades the inference accuracy of graph classification by corrupting any layer of the GNN after a small rewrite pulse. (Vol. 8, No. 2, pp. 102-110; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09930133.pdf)
Citations: 1
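The obfuscation idea can be pictured at the bit level: weights are stored combined with a per-chip PUF response, so only hardware that regenerates the same response recovers the functional weights. The sketch below is a conceptual illustration, with XOR standing in for the in-memory XNOR path and random bits standing in for the ferroelectric-domain stochasticity; it is not a model of the FeFET devices.

```python
# Conceptual sketch of PUF-based weight obfuscation (device physics omitted):
# binary weights are stored XORed with a per-chip PUF response, so weights read
# out without the correct response are corrupted.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.integers(0, 2, size=16)            # binarized NN weights {0, 1}
puf_key = rng.integers(0, 2, size=16)            # per-chip PUF response (stochastic domains)

stored   = weights ^ puf_key                     # what actually sits in the array
unlocked = stored ^ puf_key                      # in-memory XNOR/XOR path with the right key
attacker = stored ^ rng.integers(0, 2, size=16)  # wrong key -> corrupted weights

print((unlocked == weights).all())               # True: functionality preserved on-chip
print((attacker == weights).mean())              # ~0.5 on average: essentially random weights
```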
MR-PIPA: An Integrated Multilevel RRAM (HfOx)-Based Processing-In-Pixel Accelerator
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-09-28 DOI: 10.1109/JXCDC.2022.3210509
Minhaz Abedin;Arman Roohi;Maximilian Liehr;Nathaniel Cady;Shaahin Angizi
Abstract: This work paves the way to realize a processing-in-pixel (PIP) accelerator based on a multilevel HfOx resistive random access memory (RRAM) as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing at edge devices. The proposed design intrinsically implements and supports a coarse-grained convolution operation in low-bit-width neural networks (NNs), leveraging a novel compute-pixel with nonvolatile weight storage at the sensor side. Our evaluations show that such a design can remarkably reduce the power consumption of data conversion and transmission to an off-chip processor while maintaining accuracy, compared with recent in-sensor computing designs. The proposed design, namely an integrated multilevel RRAM (HfOx)-based processing-in-pixel accelerator (MR-PIPA), achieves a frame rate of 1000 frames/s and an efficiency of ~1.89 TOp/s/W, while substantially reducing data conversion and transmission energy by ~84% compared with a baseline, at the cost of minor accuracy degradation. (Vol. 8, No. 2, pp. 59-67; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09905572.pdf)
Citations: 10
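Conceptually, processing-in-pixel means the multiply-accumulate of a low-bit-width convolution happens next to the photodiodes, and only coarse per-patch sums are digitized and transmitted. The toy sketch below (random image, assumed 2-bit weights and a stride-3 patch grid, no circuit or noise model) illustrates that data-reduction step:

```python
# Illustrative sketch of in-pixel, low-bit-width convolution (not the MR-PIPA
# circuit): multilevel (2-bit) weights stored next to each pixel scale the
# photodiode reading, and per-patch sums leave the sensor instead of raw pixels.
import numpy as np

rng = np.random.default_rng(1)
image  = rng.integers(0, 256, size=(6, 6))         # pixel intensities
kernel = rng.integers(0, 4, size=(3, 3))           # 2-bit multilevel weight levels

def in_pixel_conv(img, k, stride=3):
    kh, kw = k.shape
    out = []
    for r in range(0, img.shape[0] - kh + 1, stride):
        row = []
        for c in range(0, img.shape[1] - kw + 1, stride):
            patch = img[r:r + kh, c:c + kw]
            row.append(int((patch * k).sum()))      # MAC done "inside" the pixel array
        out.append(row)
    return np.array(out)

print(in_pixel_conv(image, kernel))   # only these coarse sums are digitized and sent off-chip
```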
Scalable 2T2R Logic Computation Structure: Design From Digital Logic Circuits to 3-D Stacked Memory Arrays
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-09-15 DOI: 10.1109/JXCDC.2022.3206778
Zongxian Yang;Kangqiang Pan;Norman Y. Zhou;Lan Wei
Abstract: In the post-Moore era, post-complementary metal-oxide-semiconductor (CMOS) technologies have received intense interest for possible future digital logic applications beyond the CMOS scaling limits. In the meantime, from the system perspective, non-von Neumann architectures such as processing-in-memory (PIM) are extensively explored to overcome the bottleneck of modern computers, known as the memory wall, for high-performance, energy-efficient integrated circuits. In this article, we propose functionally complete nonvolatile logic gates based on a two-transistor-two-resistive random access memory (RRAM) (2T2R) unit structure, which is then used to form a reconfigurable three-transistor-two-RRAM (3T2R) chain with programmable interconnects for complex combinational logic circuits, and a dense 3-D stacked memory array architecture. The design has a highly regular and symmetric structure, while operations are flexible yet simple, without the need for complicated peripheral circuitry or a third resistive state. Implementations of an XNOR gate and a full adder using the 3T2R chain without extra routing/control gates or resistors are shown as demonstration examples of arithmetic unit design. The proposed computing scheme is intrinsically efficient, with superior performance in speed and area. Easily integrated as a 3-D stacked array, the proposed memory architecture not only serves as a regular 3-D memory array but also performs logic computation within the same layer and between the stacked layers. Concurrent computations under multiple computation modes for flexible in-memory operations are presented. Bias schemes for selected/half-selected/unselected cells are also explained and verified. (Vol. 8, No. 2, pp. 84-92; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09893161.pdf)
Citations: 0
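At the logic level, complementary 2T2R storage keeps each bit as a pair of opposite resistance states, so sensing the branch selected by an input directly yields XNOR against the stored bit; XNOR plus simple gating then composes into a full adder. The sketch below is a behavioral stand-in with made-up resistance roles, not a model of the proposed 3T2R chain or its bias schemes:

```python
# Logic-level sketch (no device model): a bit stored as a complementary pair
# reads out as XNOR against the applied input, and that XNOR primitive is
# enough (with plain majority gating) to build a correct 1-bit full adder.
def xnor_2t2r(stored_bit, x):
    # Stored as (LRS, HRS) if 1, (HRS, LRS) if 0; the input selects the branch,
    # and a low-resistance read (logic 1) occurs exactly when x == stored_bit.
    true_cell, comp_cell = (0, 1) if stored_bit else (1, 0)   # 0 = LRS, 1 = HRS
    return (true_cell if x else comp_cell) == 0

def full_adder(a, b, cin):
    s = xnor_2t2r(int(xnor_2t2r(a, b)), cin)     # sum = XNOR(XNOR(a, b), cin)
    cout = (a and b) or (cin and (a != b))       # majority, modeled with plain gating here
    return int(s), int(bool(cout))

for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            assert full_adder(a, b, cin) == ((a + b + cin) & 1, (a + b + cin) >> 1)
print("XNOR primitive composes into a correct 1-bit full adder")
```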
An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-09-15 DOI: 10.1109/JXCDC.2022.3206879
Rui Xiao;Wenyu Jiang;Piew Yoong Chee
Abstract: The growing data volume and the complexity of deep neural networks (DNNs) require new architectures that surpass the limitation of the von Neumann bottleneck, with computing-in-memory (CIM) a promising direction for implementing energy-efficient neural networks. However, CIM's peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing that shares the peripheral circuits and processes one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, the digital-to-analog converter (DAC), which turns out to incur an even greater power and energy overhead than the analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. For a 256×256 crossbar array with a typical setting, TM-CIM saves 18.4× in energy with 0.136 pJ/MAC efficiency, and 19.9× in area for the 1T1R case and 15.9× for the 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over 16× in area. A tradeoff between chip area, peak power, and latency is also presented, together with a proposed scheme that further reduces the latency on VGG-16 without significantly increasing chip area or peak power. (Vol. 8, No. 2, pp. 111-118; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09893208.pdf)
Citations: 1
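The time-multiplexing idea is column-serial readout: one shared set of converters digitizes the crossbar one column per cycle, so peripheral area and power are paid once rather than per column, at the cost of latency. A behavioral sketch with placeholder sizes (a 256×256 random conductance matrix, ideal analog dot products; not the paper's measured figures) shows the dataflow:

```python
# Conceptual sketch of column-serial (time-multiplexed) CIM readout: one shared
# peripheral block digitizes the crossbar one column per cycle instead of
# instantiating a converter per column. Sizes and values are placeholders.
import numpy as np

rng = np.random.default_rng(2)
G = rng.random((256, 256))          # conductance (weight) matrix of the crossbar
v = rng.random(256)                 # input activations applied as DAC voltages

def column_serial_vmm(G, v):
    out = np.empty(G.shape[1])
    for col in range(G.shape[1]):   # one column selected and digitized per cycle
        out[col] = v @ G[:, col]    # idealized analog dot product along that column
    return out

assert np.allclose(column_serial_vmm(G, v), v @ G)
# Latency grows with the column count, but the peripheral cost is paid once and
# unselected columns draw no read current: the tradeoff TM-CIM exploits.
```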
RM-NTT: An RRAM-Based Compute-in-Memory Number Theoretic Transform Accelerator
IF 2.4
IEEE Journal on Exploratory Solid-State Computational Devices and Circuits Pub Date: 2022-08-30 DOI: 10.1109/JXCDC.2022.3202517
Yongmo Park;Ziyu Wang;Sangmin Yoo;Wei D. Lu
Abstract: As more cloud computing resources are used for machine learning training and inference, privacy-preserving techniques that prevent data from being revealed at the cloud platform attract increasing interest. Homomorphic encryption (HE) is one of the most promising techniques enabling privacy-preserving machine learning because HE allows data to be evaluated in encrypted form. However, deep neural network (DNN) implementations using HE are orders of magnitude slower than plaintext implementations; the use of very long polynomials and the associated number theoretic transform (NTT) operations for polynomial multiplication is the main bottleneck of HE implementations for practical use. This article introduces the RRAM number theoretic transform (RM-NTT): a resistive random access memory (RRAM)-based compute-in-memory (CIM) system to accelerate NTT and inverse NTT (INTT) operations. Instead of running fast Fourier transform (FFT)-like algorithms, RM-NTT uses a vector-matrix multiplication (VMM) approach to achieve maximal parallelism during NTT and INTT operations. To improve efficiency, RM-NTT stores modified forms of the twiddle factors in the RRAM arrays so that NTT/INTT can be processed in the same RRAM array, and it employs a Montgomery reduction algorithm to convert the VMM results. The proposed optimization methods allow RM-NTT to significantly reduce NTT operation latency compared with other NTT accelerators, including both CIM- and non-CIM-based designs. The effects of different RM-NTT design parameters and device nonidealities are also discussed. (Vol. 8, No. 2, pp. 93-101; open-access PDF: https://ieeexplore.ieee.org/iel7/6570653/9969523/09870678.pdf)
Citations: 3
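The key observation that maps the NTT onto a crossbar is that an N-point NTT over Z_q is just a matrix-vector product with the twiddle matrix W[i][j] = w^(ij) mod q. The toy sketch below uses small assumed parameters (q = 17, N = 8, w = 2, a primitive 8th root of unity mod 17) and plain modular arithmetic in place of Montgomery reduction; it illustrates the VMM formulation, not the RM-NTT datapath:

```python
# Sketch of NTT/INTT as vector-matrix multiplications over Z_q, the formulation
# that lets a crossbar compute them; Montgomery reduction and device mapping
# are omitted, and the parameters are toy values (requires Python 3.8+ for
# modular inverses via pow(x, -1, q)).
import numpy as np

q, N, w = 17, 8, 2                        # w has multiplicative order 8 mod 17
W     = np.array([[pow(w, i * j, q) for j in range(N)] for i in range(N)])
w_inv = pow(w, -1, q)
n_inv = pow(N, -1, q)
W_inv = np.array([[pow(w_inv, i * j, q) * n_inv % q for j in range(N)] for i in range(N)])

x  = np.arange(N)                         # toy polynomial coefficients
X  = (W @ x) % q                          # forward NTT as one VMM
x2 = (W_inv @ X) % q                      # inverse NTT as another VMM
assert (x2 == x % q).all()                # roundtrip recovers the coefficients
print(X)
```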