{"title":"Programming Weights to Analog In-Memory Computing Cores by Direct Minimization of the Matrix-Vector Multiplication Error","authors":"Julian Büchel;Athanasios Vasilopoulos;Benedikt Kersting;Corey Lammie;Kevin Brew;Timothy Philip;Nicole Saulnier;Vijay Narayanan;Manuel Le Gallo;Abu Sebastian","doi":"10.1109/JETCAS.2023.3329449","DOIUrl":"10.1109/JETCAS.2023.3329449","url":null,"abstract":"Accurate programming of non-volatile memory (NVM) devices in analog in-memory computing (AIMC) cores is critical to achieve high matrix-vector multiplication (MVM) accuracy during deep learning inference workloads. In this paper, we propose a novel programming approach that directly minimizes the MVM error by performing stochastic gradient descent optimization with synthetic random input data. The MVM error is significantly reduced compared to the conventional unit-cell by unit-cell iterative programming. We demonstrate that the optimal hyperparameters in our method are agnostic to the weights being programmed, enabling large-scale deployment across multiple AIMC cores without further fine tuning. It also eliminates the need for high-resolution analog to digital converters (ADCs) to decipher the small unit-cell conductance during programming. We experimentally validate this approach by demonstrating an inference accuracy increase of 1.26% on ResNet-9. The experiments were performed using phase change memory (PCM)-based AIMC cores fabricated in 14nm CMOS technology.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1052-1061"},"PeriodicalIF":4.6,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135362614","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic-HDC: A Two-Stage Dynamic Inference Framework for Brain-Inspired Hyperdimensional Computing","authors":"Yu-Chuan Chuang;Cheng-Yang Chang;An-Yeu Wu","doi":"10.1109/JETCAS.2023.3328857","DOIUrl":"10.1109/JETCAS.2023.3328857","url":null,"abstract":"Brain-inspired hyperdimensional computing (HDC) has attracted attention due to its energy efficiency and noise resilience in various IoT applications. However, striking the right balance between accuracy and efficiency in HDC remains a challenge. Specifically, HDC represents data as high-dimensional vectors known as hypervectors (HVs), where each component of HVs can be a high-precision integer or a low-cost bipolar number (+1/−1). However, this choice presents HDC with a significant trade-off between accuracy and efficiency. To address this challenge, we propose a two-stage dynamic inference framework called Dynamic-HDC that offers IoT applications a more flexible solution rather than limiting them to choose between the two extreme options. Dynamic-HDC leverages the strategies of early exit and model parameter adaptation. Unlike prior works that use a single HDC model to classify all data, Dynamic-HDC employs a cascade of models for two-stage inference. The first stage involves a low-cost, low-precision bipolar model, while the second stage utilizes a high-cost, high-precision integer model. By doing so, Dynamic-HDC can save computational resources for easy samples by performing an early exit when the low-cost bipolar model exhibits high confidence in its classification. For difficult samples, the high-precision integer model is conditionally activated to achieve more accurate predictions. To further enhance the efficiency of Dynamic-HDC, we introduce dynamic dimension selection (DDS) and dynamic class selection (DCS). These techniques enable the framework to dynamically adapt the dimensions and the number of classes in the HDC model, further optimizing performance. We evaluate the effectiveness of Dynamic-HDC on three commonly used benchmarks in HDC research, namely MNIST, ISOLET, and UCIHAR. Our simulation results demonstrate that Dynamic-HDC with different configurations can reduce energy consumption by 19.8-51.1% and execution time by 22.5-49.9% with negligible 0.02-0.36 % accuracy degradation compared to a single integer model. Compared to a single bipolar model, Dynamic-HDC improves 3.1% accuracy with a slight 10% energy and 14% execution time overhead.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1125-1136"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GEBA: Gradient-Error-Based Approximation of Activation Functions","authors":"Changmin Ye;Doo Seok Jeong","doi":"10.1109/JETCAS.2023.3328890","DOIUrl":"10.1109/JETCAS.2023.3328890","url":null,"abstract":"Computing-in-memory (CIM) macros aiming at accelerating deep learning operations at low power need activation function (AF) units on the same die to reduce their host-dependency. Versatile CIM macros need to include reconfigurable AF units at high precision and high efficiency in hardware usage. To this end, we propose the gradient-error-based approximation (GEBA) of AFs, which approximates various types of AFs in discrete input domains at high precision. GEBA reduces the approximation error by ca. 49.7%, 67.3%, 81.4%, 60.1% (for sigmoid, tanh, GELU, swish in FP32), compared with the uniform input-based approximation using the same memory as GEBA.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1106-1113"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Operating Coupled VO₂-Based Oscillators for Solving Ising Models","authors":"Maria J. Avedillo;Manuel Jiménez Través;Corentin Delacour;Aida Todri-Sanial;Bernabé Linares-Barranco;Juan Núñez","doi":"10.1109/JETCAS.2023.3328887","DOIUrl":"10.1109/JETCAS.2023.3328887","url":null,"abstract":"Coupled nano-oscillators are attracting increasing interest because of their potential to perform computation efficiently, enabling new applications in computing and information processing. The potential of phase transition devices for such dynamical systems has recently been recognized. This paper investigates the implementation of coupled VO2-based oscillator networks to solve combinatorial optimization problems. The target problem is mapped to an Ising model, which is solved by the synchronization dynamics of the system. Different factors that impact the probability of the system reaching the ground state of the Ising Hamiltonian and, therefore, the optimum solution to the corresponding optimization problem, are analyzed. The simulation-based analysis has led to the proposal of a novel Second-Harmonic Injection Locking (SHIL) schedule. Its main feature is that SHIL signal amplitude is repeatedly smoothly increased and decreased. Reducing SHIL strength is the mechanism that enables escaping from local minimum energy states. Our experiments show better results in terms of success probability than previously reported approaches. An experimental Oscillatory Ising Machine (OIM) has been built to validate our proposal.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"901-913"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ESSM: Extended Synaptic Sampling Machine With Stochastic Echo State Neuro-Memristive Circuits","authors":"Vineeta V. Nair;Chithra Reghuvaran;Deepu John;Bhaskar Choubey;Alex James","doi":"10.1109/JETCAS.2023.3328875","DOIUrl":"10.1109/JETCAS.2023.3328875","url":null,"abstract":"Synaptic stochasticity is an important feature of biological neural networks that is not widely explored in analog memristor networks. Synaptic Sampling Machine (SSM) is one of the recent models of the neural network that explores the importance of the synaptic stochasticity. In this paper, we present a memristive Echo State Network (ESN) with Extended-SSM (ESSM). The circuit-level design of the single synaptic sampling cell that can introduce stochasticity to the neural network is presented. The architecture of synaptic sampling cells is proposed that have the ability to adaptively reprogram the arrays and respond to stimuli of various strengths. The effect of stochasticity is achieved by randomly blocking the input with the probability that follows Bernoulli distribution, and can lead to the reduction of the memory capacity requirements. The blocking signals are randomly generated using Circular Shift Registers (CSRs). The network processing is handled in analog domain and the training is performed offline. The performance of the neural network is analyzed with a view to benchmark for hardware performance without compromising the system performance. The neural system was tested on ECG, MNIST, Fashion MNIST and CIFAR10 dataset for classification problem. The advantage of memristive CSR in comparison with conventional CMOS based CSR is presented. The ESSM-ESN performance is evaluated with the effect of device variations like resistance variations, noise and quantization. The advantage of ESSM-ESN is demonstrated in terms of performance and power requirements in comparison with other neural architectures.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"965-974"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10302278","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263567","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Spike Timing Dependent Gradient for Direct Training of Fast and Efficient Binarized Spiking Neural Networks","authors":"Zhengyu Cai;Hamid Rahimian Kalatehbali;Ben Walters;Mostafa Rahimi Azghadi;Amirali Amirsoleimani;Roman Genov","doi":"10.1109/JETCAS.2023.3328926","DOIUrl":"10.1109/JETCAS.2023.3328926","url":null,"abstract":"Spiking neural networks (SNNs) are well-suited for neuromorphic hardware due to their biological plausibility and energy efficiency. These networks utilize sparse, asynchronous spikes for communication and can be binarized. However, the training of such networks presents several challenges due to their non-differentiable activation function and binarized inter-layer data movement. The well-established backpropagation through time (BPTT) algorithm used to train SNNs encounters notable difficulties because of its substantial memory consumption and extensive computational demands. These limitations restrict its practical utility in real-world scenarios. Therefore, effective techniques are required to train such networks efficiently while preserving accuracy. In this paper, we propose Binarized Spike Timing Dependent Gradient (BSTDG), a novel method that utilizes presynaptic and postsynaptic timings to bypass the non-differentiable gradient and the need of BPTT. Additionally, we employ binarized weights with a threshold training strategy to enhance energy savings and performance. Moreover, we exploit latency/temporal-based coding and the Integrate-and-Fire (IF) model to achieve significant computational advantages. We evaluate the proposed method on Caltech101 Face/Motorcycle, MNIST, Fashion-MNIST, and Spiking Heidelberg Digits. The results demonstrate that the accuracy attained surpasses that of existing BSNNs and single-spike networks under the same structure. Furthermore, the proposed model achieves up to 30\u0000<inline-formula> <tex-math>$times times times $ </tex-math></inline-formula>\u0000 speedup in inference and effectively reduces the number of spikes emitted in the hidden layer by 50% compared to previous works.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1083-1093"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263786","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CBP-QSNN: Spiking Neural Networks Quantized Using Constrained Backpropagation","authors":"Donghyung Yoo;Doo Seok Jeong","doi":"10.1109/JETCAS.2023.3328911","DOIUrl":"10.1109/JETCAS.2023.3328911","url":null,"abstract":"Spiking Neural Networks (SNNs) support sparse event-based data processing at high power efficiency when implemented in event-based neuromorphic processors. However, the limited on- chip memory capacity of neuromorphic processors strictly delimits the depth and width of SNNs implemented. A direct solution is the use of quantized SNNs (QSNNs) in place of SNNs with FP32 weights. To this end, we propose a method to quantize the weights using constrained backpropagation (CBP) with the Lagrangian function (conventional loss function plus well-defined weight-constraint functions) as an objective function. This work utilizes CBP as a post-training algorithm for deep SNNs pre-trained using various state-of-the-art methods including direct training (TSSL-BP, STBP, and surrogate gradient) and DNN-to-SNN conversion (SNN-Calibration), validating CBP as a general framework for QSNNs. CBP-QSNNs highlight their high accuracy insomuch as the degradation of accuracy on CIFAR-10, DVS128 Gesture, and CIFAR10-DVS in the worst case is less than 1%. Particularly, CBP-QSNNs for SNN-Calibration-pretrained SNNs on CIFAR-100 highlight an unexpected large increase in accuracy by 3.72% while using small weight-memory (3.5% of the FP32 case).","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1137-1146"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MiCE: An ANN-to-SNN Conversion Technique to Enable High Accuracy and Low Latency","authors":"Nguyen-Dong Ho;Ik-Joon Chang","doi":"10.1109/JETCAS.2023.3328863","DOIUrl":"10.1109/JETCAS.2023.3328863","url":null,"abstract":"Spiking Neural Networks (SNNs) mimic the behavior of biological neurons. Unlike traditional Artificial Neural Networks (ANNs) that operate in a continuous time domain and use activation functions to process information, SNNs operate discrete event-driven, where data is encoded and communicated through spikes or discrete events. This unique approach offers several advantages, such as efficient computation and lower power consumption, making SNNs particularly attractive for energy-constrained and neuromorphic applications. However, training SNNs poses significant challenges due to the discrete nature of spikes and the non-differentiable behavior they exhibit. As a result, converting pre-trained ANNs into SNNs has gained attention as a convenient approach. While this approach simplifies the training process, it introduces certain drawbacks, including high latency. The conversion of ANNs to SNNs typically leads to a loss of accuracy, which can be attributed to various factors, including quantization, clipping, and timing errors. Previous studies have proposed techniques to mitigate quantization and clipping errors during the conversion process. However, they do not consider timing errors, degrading SNN accuracies at low latency conditions. This work introduces the MiCE conversion method, which offers a comprehensive joint optimization strategy to simultaneously alleviate quantization, clipping, and timing errors. At a moderate latency of 8 time-steps, our converted ResNet-20 achieves classification accuracies of 79.02% and 95.74% on the CIFAR-100 and CIFAR-10 datasets, respectively.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1094-1105"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263785","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neuromorphic Computing With Address-Event-Representation Using Time-to-Event Margin Propagation","authors":"R. Madhuvanthi Srivatsav;Shantanu Chakrabartty;Chetan Singh Thakur","doi":"10.1109/JETCAS.2023.3328916","DOIUrl":"10.1109/JETCAS.2023.3328916","url":null,"abstract":"Address-Event-Representation (AER) is a spike-routing protocol that allows the scaling of neuromorphic and spiking neural network (SNN) architectures. However, in conventional neuromorphic architectures, the AER protocol and in general, any virtual interconnect plays only a passive role in computation, i.e., only for routing spikes and events. In this paper, we show how causal temporal primitives like delay, triggering, and sorting inherent in the AER protocol itself can be exploited for scalable neuromorphic computing using our proposed technique called Time-to-Event Margin Propagation (TEMP). The proposed TEMP-based AER architecture is fully asynchronous and relies on interconnect delays for memory and computing as opposed to conventional and local multiply-and-accumulate (MAC) operations. We show that the time-based encoding in the TEMP neural network produces a spatio-temporal representation that can encode a large number of discriminatory patterns. As a proof-of-concept, we show that a trained TEMP-based convolutional neural network (CNN) can demonstrate an accuracy greater than 99% on the MNIST dataset and 91.2% on the Fashion MNIST Dataset. Overall, our work is a biologically inspired computing paradigm that brings forth a new dimension of research to the field of neuromorphic computing.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1114-1124"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263348","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Benchmarking DNN Mapping Methods for the in-Memory Computing Accelerators","authors":"Yimin Wang;Xuanyao Fong","doi":"10.1109/JETCAS.2023.3328864","DOIUrl":"10.1109/JETCAS.2023.3328864","url":null,"abstract":"This paper presents a study of methods for mapping the convolutional workloads in deep neural networks (DNNs) onto the computing hardware in the in-memory computing (IMC) architecture. Specifically, we focus on categorizing and benchmarking the processing element (PE)-level mapping methods, which have not been investigated in detail for IMC-based architectures. First, we categorize the PE-level mapping methods from the loop unrolling perspective and discuss the corresponding implications on input data reuse and output data reduction. Then, a mapping-oriented architecture is proposed by considering the input and output datapaths under various mapping methods. The architecture is evaluated on the 45 nm technology showing good area-efficiency and scalability, providing a hardware substrate for further performance improvements via PE-level mappings. Furthermore, we present an evaluation framework that captures the architecture behaviors and enables extensive benchmarking of mapping methods under various neural network workloads, main memory bandwidth, and digital computing throughput. The benchmarking results demonstrate significant tradeoffs in the design space and unlock new design possibilities. We present case studies to showcase preferred mapping methods for best energy consumption and/or execution time and demonstrate that a hybrid-mapping scheme enhances minimum execution time by up to 30% for the publicly-available DNN benchmarks.","PeriodicalId":48827,"journal":{"name":"IEEE Journal on Emerging and Selected Topics in Circuits and Systems","volume":"13 4","pages":"1040-1051"},"PeriodicalIF":4.6,"publicationDate":"2023-10-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"135263023","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}