{"title":"Accuracy Improvement With Weight Mapping Strategy and Output Transformation for STT-MRAM-Based Computing-in-Memory","authors":"Xianggao Wang;Na Wei;Shifan Gao;Wenhao Wu;Yi Zhao","doi":"10.1109/JXCDC.2024.3478360","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3478360","url":null,"abstract":"This work presents an analog computing-in-memory (CiM) macro with spin-transfer torque magnetic random access memory (STT-MRAM) and 28-nm CMOS technology. The adopted CiM bitcell uses a differential scheme and balances the input resistance to minimize the nonideal factors during multiply-accumulate (MAC) operations. Specialized peripheral circuits were designed for the current-scheme CiM architecture. More importantly, strategies of accuracy improvement were innovatively proposed as follows: 1) mapping most significant bit (MSB) to the far side of the MRAM array and 2) output linear transformation based on the reference columns. Circuit-level simulation verified the functionality and performance improvement of the CiM macro based on the MNIST and CIFAR-10 datasets, realizing a 3% and 5% accuracy loss compared with the benchmark, respectively. 
The 640-GOPS (8 bit) throughput, 34.6-TOPS/mm² area compactness, and 83.3-TOPS/W energy efficiency demonstrate the advantages of STT-MRAM CiM in the coming AI era.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"75-81"},"PeriodicalIF":2.0,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10714384","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Fine-Grained Partitioning of Low-Level SRAM Caches for Emerging 3D-IC Designs","authors":"Sudipta Das;Bhawana Kumari;Siva Satyendra Sahoo;Yukai Chen;James Myers;Dragomir Milojevic;Dwaipayan Biswas;Julien Ryckaert","doi":"10.1109/JXCDC.2024.3468386","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3468386","url":null,"abstract":"Scaling on-chip memory capacity is one of the primary approaches to mitigate memory wall bottlenecks. Various 2.5-D/3-D integration schemes, leveraging novel partitioning, are being actively explored to improve system performance. However, fine-grained functional partitioning of memory macros is not widely reported. As static RAM (SRAM) scaling stagnates with emerging CMOS logic roadmap, we propose a partitioning of low-level (faster access) caches in 3-D using an array under CMOS (AuC) technology paradigm. Our study focuses on partitioning and optimization of SRAM bit-cells and peripheral circuits, enabling heterogeneous integration, achieving up to 12% higher operating frequency with 50% leakage power reduction in the memory macros. 
Applied on a 64-bit mobile system-on-chip (SoC) CPU core, we achieve up to 60% higher energy efficiency compared with 2-D baseline and 14% increase in operating frequency compared with standard memory-on-logic 3-D partitioning scheme.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"67-74"},"PeriodicalIF":2.0,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10695147","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142440855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Chisel Generator for Standardized 3-D Die-to-Die Interconnects","authors":"Harrison Liew;Farhana Sheikh;Jong-Ru Guo;Zuoguo Wu;Borivoje Nikolić","doi":"10.1109/JXCDC.2024.3461471","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3461471","url":null,"abstract":"A 3-D heterogeneous integration (3-D-HI) is poised to enable a new era of high-performance integrated circuits via a multitude of benefits, including a reduction in I/O power consumption and ability to tightly couple disparate technologies. However, a significant hurdle toward enabling a chiplet ecosystem is the standardization of 3-D die-to-die (D2D) interconnects that facilitate rapid integration. Technology-driven constraints highlighted in published works demonstrate that a unique approach to 3-D D2D interconnect design and implementation is required, while preserving the ability to customize the interconnect to accommodate future technology concerns and applications with minimal overhead. This article presents a framework to generate customized 3-D D2D interconnect physical layers (PHYs) that are simultaneously standard-compliant, physical-aware, and can be automatically integrated into all stacked chiplets. The generator framework leverages the Chisel hardware description language to allow designers to do the following: 1) compile a port list directly into a PHY; 2) automate design and physical design (PD); and 3) perform design space exploration of interconnect features (e.g., bump map pitch, clocking architecture, and others). The 3-D PHY generator framework and features detailed in this work can be used to produce a reference implementation for a standard like UCIe-3-D, representing a significant paradigm shift from current specification and design methodologies for 2.5-D D2D interconnect (e.g., UCIe) implementations. 
This work concludes with the results of a redundancy design space exploration tradeoff study, showing the benefits of a proposed spatial coding redundancy scheme in an example PHY using emulated 9-μm hybrid bonding for a 4 Tx/4 Rx module array with 4:1 coding redundancy ratio.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"58-66"},"PeriodicalIF":2.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10681023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142408973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CMOS Single-Photon Avalanche Diode Circuits for Probabilistic Computing","authors":"William Whitehead;Wonsik Oh;Luke Theogarajan","doi":"10.1109/JXCDC.2024.3452030","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3452030","url":null,"abstract":"Intrinsically random hardware devices are increasingly attracting attention for their potential use in probabilistic computing architectures. One such device is the single-photon avalanche diode (SPAD) and an associated functional unit, the variable-rate SPAD circuit (VRSC), recently proposed by us as a source of randomness for sampling and annealing Ising and Potts models. This work develops a more advanced understanding of these VRSCs by introducing several VRSC design options and studying their tradeoffs as implemented in a 65-nm CMOS process. Each VRSC is composed of a SPAD and a processing circuit. Combinations of three different SPAD designs and three different types of processing circuits were evaluated on several metrics such as area, speed, and variability. Measured results from the SPAD design space show that even extremely small SPADs are suitable for probabilistic computing purposes, and that high dark count rates are not detrimental either, so SPADs for probabilistic computing are actually easier to integrate in standard CMOS processes. Results from the circuit design space show that the time-to-analog-based designs introduced in this work can produce highly exponential and analytical transfer functions, but that the less analytically tractable output of the previously proposed filter-based designs can achieve less variability in a smaller footprint. 
Probabilistic bits (P-bits) composed of the fabricated VRSCs achieve bit flip rates of 50 MHz and allow at least one order of magnitude of control over their simulated annealing temperature.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"49-57"},"PeriodicalIF":2.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10659028","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142246389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monolithic 3-D-Based Nonvolatile Associative Processor for High-Performance Energy-Efficient Computations","authors":"Esteban Garzón;Alessandro Bedoya;Marco Lanuzza;Leonid Yavits","doi":"10.1109/JXCDC.2024.3450810","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3450810","url":null,"abstract":"This article presents a monolithic 3-D associative in-memory processor (M3D AP) that combines emerging nonvolatile (NV) magnetic tunnel junction (MTJ) technology with massively parallel associative in-memory processing and M3D integration. The proposed architecture features two monolithic layers, with CMOS logic in the first layer and an MTJ-based content-addressable memory (CAM) array in the second layer. We conduct a thorough analysis of the electrical characteristics of the MTJ-based AP and use analysis results to evaluate the performance and power consumption of the M3D AP. We build a custom cycle-accurate simulator to implement and evaluate the 3-D associative matrix multiplication algorithm, highlighting the potential of the M3D AP for accelerating artificial intelligence applications. Overall, we demonstrate the efficacy of M3D AP and show that it holds promise for high-performance and energy-efficient in-memory computing.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"40-48"},"PeriodicalIF":2.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10649641","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MEFET-Based CAM/TCAM for Memory-Augmented Neural Networks","authors":"Sai Sanjeet;Jonathan Bird;Bibhu Datta Sahoo","doi":"10.1109/JXCDC.2024.3410681","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3410681","url":null,"abstract":"Memory-augmented neural networks (MANNs) require large external memories to enable long-term memory storage and retrieval. Content-addressable memory (CAM) is a type of memory used for high-speed searching applications and is well-suited for MANNs. Recent advances in exploratory nonvolatile devices have spurred the development of nonvolatile CAMs. However, these devices suffer from poor ON-OFF ratio, large write voltages, and long write times. This work proposes a nonvolatile ternary CAM (TCAM) using magnetoelectric field effect transistors (MEFETs). The energy and delay of various operations are simulated using the ASAP 7-nm predictive technology for the transistors and a Verilog-A model of the MEFET. The proposed structure achieves orders of magnitude improvement in search energy and >45× improvement in search energy-delay product compared with prior works. The write energy and delay are also improved by 8× and 12×, respectively, compared with CAMs designed with other nonvolatile devices. A variability analysis is performed to study the effect of process variations on the CAM. The proposed CAM is then used to build a one-shot learning MANN and is benchmarked with the Modified National Institute of Standards and Technology (MNIST), extended MNIST (EMNIST), and labeled faces in the wild (LFW) datasets with binary embeddings, giving >99% accuracy on MNIST, a top-3 accuracy of 97.11% on the EMNIST dataset, and >97% accuracy on the LFW dataset, with embedding sizes of 16, 64, and 512, respectively. 
The proposed CAM is shown to be fast, energy-efficient, and scalable, making it suitable for MANNs.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"31-39"},"PeriodicalIF":2.0,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10550938","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141439499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-Accuracy Trade-Offs for Resistive In-Memory Computing Architectures","authors":"Saion K. Roy;Naresh R. Shanbhag","doi":"10.1109/JXCDC.2024.3381888","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3381888","url":null,"abstract":"Resistive in-memory computing (IMC) architectures currently lag behind SRAM IMCs and digital accelerators in both energy efficiency and compute density due to their low compute accuracy. This article proposes the use of signal-to-noise-plus-distortion ratio (SNDR) to quantify the compute accuracy of IMCs and identify the device, circuit, and architectural parameters that affect it. We further analyze the fundamental limits on the SNDR of magnetoresistive random access memory (MRAM-), resistive random access memory (ReRAM-), and ferroelectric field effect transistor (FeFET)-based IMCs employing parameter variation and noise models that were validated against measured results from a recent MRAM-based IMC prototype in a 22 nm process. At high-output signal magnitude, we can find that the maximum achievable SNDR is limited by the pre-analog-to-digital-converter (ADC) array nonidealities, such as the conductance variations (CVs), parasitic resistances, and current mirror mismatch (MM), whereas the ADC thermal (AT) noise limits the SNDR at small signal magnitudes. Furthermore, for large dot-product (DP) dimensions (N > 50), the maximum achievable SNDR is highest for FeFET, followed by ReRAM and then MRAM. Finally, the increase in conductance contrast (g_ON/g_OFF) enhances the maximum achievable SNDR only until it reaches a value of approximately 12. ReRAMs and FeFETs demonstrate high energy efficiencies while achieving high SNDR, as their low conductance values lead to lower currents and lower noise due to wire parasitics. 
In all cases, across all three device types, DP dimension, ADC precision, and conductance contrast, the maximum achievable SNDR is found to be in the range of 18–22 dB, barely meeting the minimum needed for achieving an inference accuracy close to an equivalent fixed-point digital architecture. Finally, we demonstrate a network-level accuracy of 84.5% when mapping a ResNet-20 (CIFAR-10) onto a ReRAM-based architecture at an SNDR of 22 dB, which MRAM- and FeFET-based architectures cannot realize. This result clearly implies the need for other approaches, e.g., algorithmic- and learning-based methods, to improve the inference accuracy of resistive IMC architectures.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"22-30"},"PeriodicalIF":2.4,"publicationDate":"2024-03-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10478888","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140544290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Impact of Technology Scaling and Back-End-of-the-Line Technology Solutions on Magnetic Random-Access Memories","authors":"Piyush Kumar;Da Eun Shim;Siri Narla;Azad Naeemi","doi":"10.1109/JXCDC.2024.3357625","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3357625","url":null,"abstract":"While magnetic random-access memories (MRAMs) are promising because of their nonvolatility, relatively fast speeds, and high endurance, there are major challenges in adopting them for the advanced technology nodes. One of the major challenges in scaling MRAM devices is caused by the ever-increasing resistances of interconnects. In this article, we first study the impact of shrunk interconnect dimensions on MRAM performance at various technology nodes. Then, we investigate the impact of various potential back-end-of-the-line (BEOL) technology solutions at the 7 nm node. Based on interconnect resistance values from technology computer-aided design (TCAD) simulations and MRAM device characteristics from experimentally validated/calibrated physical models, we quantify the potential array-level performance of MRAM using SPICE simulations. We project that some potential BEOL technology solutions can reduce the write energy by up to 34.6% with spin–orbit torque (SOT) MRAM and 29.0% with spin-transfer torque (STT) MRAM. 
We also observe up to 21.4% reduction in the read energy of the SOT-MRAM arrays.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"13-21"},"PeriodicalIF":2.4,"publicationDate":"2024-01-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10412202","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139732024","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Source Design of Vertical III–V Nanowire Tunnel Field-Effect Transistors","authors":"Gautham Rangasamy;Zhongyunshen Zhu;Lars-Erik Wernersson","doi":"10.1109/JXCDC.2024.3355949","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3355949","url":null,"abstract":"We systematically fabricate devices and analyze data for vertical InAs/(In)GaAsSb nanowire tunnel field-effect transistors (TFETs), to study the influence of source dopant position and level on their device performance. The results show that delaying the introduction of dopants further in the GaAsSb source segments improved the transistor metrics (subthreshold swing (SS) and the on-current performance), due to the formation of a nid-InAsSb segment. The devices display a minimum SS of 26 mV/dec and on-current of 10.2 μA/μm at V_DS of 300 mV. The performance of the devices was improved further by optimizing the doping levels, which led to a record subthermal current of 1.2 μA/μm and transconductance of 205 μS/μm at V_DS of 500 mV.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"10 ","pages":"8-12"},"PeriodicalIF":2.4,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10409158","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139727453","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits—Volume 9, No. 2","authors":"Azad Naeemi","doi":"10.1109/JXCDC.2023.3349088","DOIUrl":"https://doi.org/10.1109/JXCDC.2023.3349088","url":null,"abstract":"Welcome to the seventh volume, second semiannual issue of IEEE Journal on Exploratory Solid-State Computational Devices and Circuits (JXCDC), a multidisciplinary, open-access IEEE journal that is focused on publishing seminal research in the exploration of energy-efficient computing based on physics and materials to enable new devices, circuits, and architecture that will be of great interest to integrated circuit researchers and those working in the IT industry. The articles in the journal are selectively chosen to provide insight into the architectural, circuit, and device implications of emerging quantum nanoelectronic and nanomagnetic device technologies. The discovery of new materials, devices, and circuits for energy-efficient computational circuits will be needed to enable Moore’s law to continue for computing beyond the end of the roadmap for CMOS technologies, with significant improvement in energy efficiency and cost per function.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":"9 2","pages":"ii-iii"},"PeriodicalIF":2.4,"publicationDate":"2023-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10406188","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139494229","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}