{"title":"CRUS: A Hardware-Efficient Algorithm Mitigating Highly Nonlinear Weight Update in CIM Crossbar Arrays for Artificial Neural Networks","authors":"Junmo Lee;Joon Hwang;Youngwoon Cho;Min-Kyu Park;Woo Young Choi;Sangbum Kim;Jong-Ho Lee","doi":"10.1109/JXCDC.2022.3220032","DOIUrl":"10.1109/JXCDC.2022.3220032","url":null,"abstract":"Mitigating the nonlinear weight update of synaptic devices is one of the main challenges in designing compute-in-memory (CIM) crossbar arrays for artificial neural networks (ANNs). While various nonlinearity mitigation schemes have been proposed so far, only a few of them have dealt with high-weight update nonlinearity. This article presents a hardware-efficient on-chip weight update scheme named the conditional reverse update scheme (CRUS), which algorithmically mitigates highly nonlinear weight change in synaptic devices. For hardware efficiency, CRUS is implemented on-chip using low precision (1-bit) and infrequent circuit operations. To utilize algorithmic insights, the impact of the nonlinear weight update on training is investigated. We first introduce a metric called update noise (UN), which quantifies the deviation of the actual weight update in synaptic devices from the expected weight update calculated from the stochastic gradient descent (SGD) algorithm. Based on UN analysis, we aim to reduce AUN, the UN average over the entire training process. The key principle to reducing average UN (AUN) is to conditionally skip long-term depression (LTD) pulses during training. The trends of AUN and accuracy under various LTD skip conditions are investigated to find maximum accuracy conditions. By properly tuning LTD skip conditions, CRUS achieves >90% accuracy on the Modified National Institute of Standards and Technology (MNIST) dataset even under high-weight update nonlinearity. Furthermore, it shows better accuracy than previous nonlinearity mitigation techniques under similar hardware conditions. 
It also exhibits robustness to cycle-to-cycle variations (CCVs) in conductance updates. The results suggest that CRUS can be an effective solution to relieve the algorithm-hardware tradeoff in CIM crossbar array design.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"145-154"},"PeriodicalIF":2.4,"publicationDate":"2022-11-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09940271.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42642779","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Memristive Devices for Time Domain Compute-in-Memory","authors":"Florian Freye;Jie Lou;Christopher Bengel;Stephan Menzel;Stefan Wiefels;Tobias Gemmeke","doi":"10.1109/JXCDC.2022.3217098","DOIUrl":"10.1109/JXCDC.2022.3217098","url":null,"abstract":"Analog compute schemes and compute-in-memory (CIM) have emerged in an effort to reduce the increasing power hunger of convolutional neural networks (CNNs), which exceeds the constraints of edge devices. Memristive device types are a relatively new offering with interesting opportunities for unexplored circuit concepts. In this work, the use of memristive devices in cascaded time-domain CIM (TDCIM) is introduced with the primary goal of reducing the size of fully unrolled architectures. The different effects influencing the determinism in memristive devices are outlined together with reliability concerns. Architectures for binary as well as multibit multiply and accumulate (MAC) cells are presented and evaluated. As more involved circuits offer more accurate compute result, a tradeoff between design effort and accuracy comes into the picture. To further evaluate this tradeoff, the impact of variations on overall compute accuracy is discussed. 
The presented cells reach an energy/OP of 0.23 fJ at a size of $1.2~\\mu\\text{m}^{2}$ for binary and 6.04 fJ at $3.2~\\mu\\text{m}^{2}$ for $4\\times 4$ bit MAC operations.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"119-127"},"PeriodicalIF":2.4,"publicationDate":"2022-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09930136.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44685222","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Leveraging Ferroelectric Stochasticity and In-Memory Computing for DNN IP Obfuscation","authors":"Likhitha Mankali;Nikhil Rangarajan;Swetaki Chatterjee;Shubham Kumar;Yogesh Singh Chauhan;Ozgur Sinanoglu;Hussam Amrouch","doi":"10.1109/JXCDC.2022.3217043","DOIUrl":"10.1109/JXCDC.2022.3217043","url":null,"abstract":"With the emergence of the Internet of Things (IoT), deep neural networks (DNNs) are widely used in different domains, such as computer vision, healthcare, social media, and defense. The hardware-level architecture of a DNN can be built using an in-memory computing-based design, which is loaded with the weights of a well-trained DNN model. However, such hardware-based DNN systems are vulnerable to model stealing attacks where an attacker reverse-engineers (REs) and extracts the weights of the DNN model. In this work, we propose an energy-efficient defense technique that combines a ferroelectric field effect transistor (FeFET)-based reconfigurable physically unclonable function (PUF) with an in-memory FeFET XNOR to thwart model stealing attacks. We leverage the inherent stochasticity in the FE domains to build a PUF that helps to corrupt the neural network’s (NN) weights when an adversarial attack is detected. We showcase the efficacy of the proposed defense scheme by performing experiments on graph-NNs (GNNs), a particular type of DNN. The proposed defense scheme is a first of its kind that evaluates the security of GNNs. We investigate the effect of corrupting the weights on different layers of the GNN on the accuracy degradation of the graph classification application for two specific error models of corrupting the FeFET-based PUFs and five different bioinformatics datasets. 
We demonstrate that our approach successfully degrades the inference accuracy of the graph classification by corrupting any layer of the GNN after a small rewrite pulse.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"102-110"},"PeriodicalIF":2.4,"publicationDate":"2022-10-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09930133.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"43155261","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MR-PIPA: An Integrated Multilevel RRAM (HfOx)-Based Processing-In-Pixel Accelerator","authors":"Minhaz Abedin;Arman Roohi;Maximilian Liehr;Nathaniel Cady;Shaahin Angizi","doi":"10.1109/JXCDC.2022.3210509","DOIUrl":"10.1109/JXCDC.2022.3210509","url":null,"abstract":"This work paves the way to realize a processing-in-pixel (PIP) accelerator based on a multilevel HfOx resistive random access memory (RRAM) as a flexible, energy-efficient, and high-performance solution for real-time and smart image processing at edge devices. The proposed design intrinsically implements and supports a coarse-grained convolution operation in low-bit-width neural networks (NNs) leveraging a novel compute-pixel with nonvolatile weight storage at the sensor side. Our evaluations show that such a design can remarkably reduce the power consumption of data conversion and transmission to an off-chip processor maintaining accuracy compared with the recent in-sensor computing designs. Our proposed design, namely an integrated multilevel RRAM (HfOx)-based processing-in-pixel accelerator (MR-PIPA), achieves a frame rate of 1000 and efficiency of ~1.89 TOp/s/W, while it substantially reduces data conversion and transmission energy by ~84% compared to a baseline at the cost of minor accuracy degradation.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"59-67"},"PeriodicalIF":2.4,"publicationDate":"2022-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09905572.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47970835","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scalable 2T2R Logic Computation Structure: Design From Digital Logic Circuits to 3-D Stacked Memory Arrays","authors":"Zongxian Yang;Kangqiang Pan;Norman Y. Zhou;Lan Wei","doi":"10.1109/JXCDC.2022.3206778","DOIUrl":"10.1109/JXCDC.2022.3206778","url":null,"abstract":"In the post Moore era, post-complementary metal–oxide–semiconductor (CMOS) technologies have received intense interests for possible future digital logic applications beyond the CMOS scaling limits. In the meantime, from the system perspective, non-von Neumann architectures, such as processing-in-memory (PIM), are extensively explored to overcome the bottleneck of modern computers, known as the memory wall, for high-performance energy-efficient integrated circuits. In this article, we propose functionally complete nonvolatile logic gates based on a two-transistor-two-resistive random access memory (RRAM) (2T2R) unit structure, which is then used to form a reconfigurable three-transistor-two-RRAM (3T2R) chain with programmable interconnects for complex combinational logic circuits, and a dense 3-D stacked memory array architecture. The design has a highly regular and symmetric structure, while operations are flexible yet simple, without the need of complicated peripheral circuitry or a third resistive state. Implementations of XNOR gate and full adder using 3T2R chain without extra routing/control gates or resistors are shown as demonstration examples of arithmetic unit design. The proposed computing scheme is intrinsic, efficient with superior performance in speed and area. Easily integrated as 3-D stacked array, the proposed memory architecture not only serves as regular 3-D memory array but also performs logic computation within the same layer and between the stacked layers. Concurrent computations under multiple computation modes for flexible operations in the memory are presented. 
Bias schemes for selected/half-selected/unselected cells are also explained and verified.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"84-92"},"PeriodicalIF":2.4,"publicationDate":"2022-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09893161.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46809461","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An Energy Efficient Time-Multiplexing Computing-in-Memory Architecture for Edge Intelligence","authors":"Rui Xiao;Wenyu Jiang;Piew Yoong Chee","doi":"10.1109/JXCDC.2022.3206879","DOIUrl":"10.1109/JXCDC.2022.3206879","url":null,"abstract":"The growing data volume and complexity of deep neural networks (DNNs) require new architectures to surpass the limitation of the von-Neumann bottleneck, with computing-in-memory (CIM) as a promising direction for implementing energy-efficient neural networks. However, CIM’s peripheral sensing circuits are usually power- and area-hungry components. We propose a time-multiplexing CIM architecture (TM-CIM) based on memristive analog computing to share the peripheral circuits and process one column at a time. The memristor array is arranged in a column-wise manner that avoids wasting power/energy on unselected columns. In addition, digital-to-analog converter (DAC) power and energy efficiency, which turns out to be an even greater overhead than analog-to-digital converter (ADC), can be fine-tuned in TM-CIM for significant improvement. For a 256*256 crossbar array with a typical setting, TM-CIM saves $18.4\\times$ in energy with 0.136 pJ/MAC efficiency, and $19.9\\times$ area for 1T1R case and $15.9\\times$ for 2T2R case. Performance estimation on VGG-16 indicates that TM-CIM can save over $16\\times$ area. 
A tradeoff between the chip area, peak power, and latency is also presented, with a proposed scheme to further reduce the latency on VGG-16, without significantly increasing chip area and peak power.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"111-118"},"PeriodicalIF":2.4,"publicationDate":"2022-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09893208.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44009321","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"RM-NTT: An RRAM-Based Compute-in-Memory Number Theoretic Transform Accelerator","authors":"Yongmo Park;Ziyu Wang;Sangmin Yoo;Wei D. Lu","doi":"10.1109/JXCDC.2022.3202517","DOIUrl":"10.1109/JXCDC.2022.3202517","url":null,"abstract":"As more cloud computing resources are used for machine learning training and inference processes, privacy-preserving techniques that protect data from revealing at the cloud platforms attract increasing interest. Homomorphic encryption (HE) is one of the most promising techniques that enable privacy-preserving machine learning because HE allows data to be evaluated under encrypted forms. However, deep neural network (DNN) implementations using HE are orders of magnitude slower than plaintext implementations. The use of very long polynomials and associated number theoretic transform (NTT) operations for polynomial multiplications is the main bottlenecks of HE implementation for practical uses. This article introduces RRAM number theoretic transform (RM-NTT): a resistive random access memory (RRAM)-based compute-in-memory (CIM) system to accelerate NTT and inverse NTT (INTT) operations. Instead of running fast Fourier transform (FFT)-like algorithms, RM-NTT uses a vector-matrix multiplication (VMM) approach to achieve maximal parallelism during NTT and INTT operations. To improve the efficiency, RM-NTT stores modified forms of the twiddle factors in the RRAM arrays to process NTT/INTT in the same RRAM array and employs a Montgomery reduction algorithm to convert the VMM results. The proposed optimization methods allow RM-NTT to significantly reduce NTT operation latency compared with other NTT accelerators, including both CIM and non-CIM-based designs. 
The effects of different RM-NTT design parameters and device nonidealities are also discussed.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 2","pages":"93-101"},"PeriodicalIF":2.4,"publicationDate":"2022-08-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9969523/09870678.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44468639","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Modeling and Design for Magnetoelectric Ternary Content Addressable Memory (TCAM)","authors":"Siri Narla;Piyush Kumar;Ann Franchesca Laguna;Dayane Reis;X. Sharon Hu;Michael Niemier;Azad Naeemi","doi":"10.1109/JXCDC.2022.3181925","DOIUrl":"https://doi.org/10.1109/JXCDC.2022.3181925","url":null,"abstract":"This article proposes a novel magnetoelectric (ME) effect-based ternary content addressable memory (TCAM). The potential array-level write and search performances of the proposed ME-TCAM are studied using experimentally calibrated compact physical models and SPICE simulations. The voltage-controlled operation of the ME devices eliminates the large joule heating present in the current-controlled magnetic devices and their low-voltage write operation makes them more energy-efficient compared to static random access memory-based TCAMs (SRAM-TCAMs). The proposed compact TCAM outperforms its SRAM counterpart with $1.35\\times$ and $14.4\\times$ improvements in search and write energy, respectively, and its nonvolatility eliminates the standby leakage. We project an error rate below $10^{-4}$ while considering various sources of variation in magnetic and CMOS devices. 
At the application level, using memory-augmented neural networks (MANNs), we project a $2\\times$ energy-delay–area-product (EDAP) improvement over an SRAM-TCAM.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 1","pages":"44-52"},"PeriodicalIF":2.4,"publicationDate":"2022-06-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9903013/09792464.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49963534","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Yttrium Iron Garnet-Based Combinatorial Logic and Memory Devices","authors":"Michael Balinskiy;Alexander Khitun","doi":"10.1109/JXCDC.2022.3202180","DOIUrl":"10.1109/JXCDC.2022.3202180","url":null,"abstract":"Yttrium iron garnet Y3Fe2(FeO4)3 (YIG) has a uniquely low magnetic damping for spin waves, which makes it a perfect material for magnonic devices. Spin waves typically exist in the microwave frequency range, and their wavelength can be decreased to the nanoscale. Their dispersion in YIG waveguides depends on the strength and orientation of the bias magnetic field. It may be possible to exploit YIG waveguides as field-controlled filters and delay lines. In this work, we describe combinatorial logic and memory devices to benefit YIG properties. An act of computation in the combinatorial device is associated with finding a route connecting the input and output ports. We present experimental data demonstrating the pathfinding in the active ring circuit with YIG waveguide. The ability to search in parallel through multiple paths is the most appealing property of combinatorial devices. Potentially, they may compete with quantum computers in functional throughput.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 1","pages":"53-58"},"PeriodicalIF":2.4,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9903013/09868767.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42869408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits—Vol. 8, No. 1","authors":"Azad Naeemi","doi":"10.1109/JXCDC.2022.3204198","DOIUrl":"10.1109/JXCDC.2022.3204198","url":null,"abstract":"Welcome to the seventh volume, second semiannual issue of the IEEE Journal on Exploratory Solid-State Computational Devices and Circuits (JXCDC), a multidisciplinary, open access IEEE journal that is focused on publishing seminal research in the exploration for energy-efficient computing based on physics and materials to enable new devices, circuits, and architecture that will be of great interest to integrated circuit researchers and those working in the information technology (IT) industry. The articles in the journal are selectively chosen to provide insight into the architectural, circuit, and device implications of emerging quantum nanoelectronic and nanomagnetic device technologies. Discovery of new materials, devices, and circuits for energy-efficient computational circuits will be needed to enable Moore’s law to continue for computing beyond the end of the roadmap for CMOS technologies, with significant improvement in energy efficiency and cost per function.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"8 1","pages":"ii-iii"},"PeriodicalIF":2.4,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/6570653/9684158/09903016.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"44882687","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}