Mohammad Adnaan;Sou-Chi Chang;Hai Li;Yu-Ching Liao;Ian A. Young;Azad Naeemi
{"title":"Design Considerations for Sub-1-V 1T1C FeRAM Memory Circuits","authors":"Mohammad Adnaan;Sou-Chi Chang;Hai Li;Yu-Ching Liao;Ian A. Young;Azad Naeemi","doi":"10.1109/JXCDC.2024.3488578","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3488578","url":null,"abstract":"We present a comprehensive benchmarking framework for one transistor-one capacitor (1T1C) low-voltage ferroelectric random access memory (FeRAM) circuits. We focus on the most promising ferroelectric materials, hafnium zirconium oxide (HZO) and barium titanate (BTO), known for their fast switching speeds and low coercive voltages. We model ferroelectric capacitors using physics-based phase-field models and calibrate the polarization switching speed and hysteresis loop versus experimental data. Ferroelectric memory cells are designed using a 28-nm process design kit (PDK), incorporating peripheral circuitry and interconnect parasitics. We set up the memory array circuit design and analyze its performance by varying the row/column size of the memory array, as well as driver and capacitor sizes. Our results are compared with other emerging memory technologies, particularly magnetic/spintronic memories, in terms of read/write latencies and energy consumption. 
We identify the critical aspects of the ferroelectric memory array performance, such as the effect of plateline driver and bitline capacitances, and provide recommendations to further optimize the performance of such low operating voltage ferroelectric memory circuits.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10738514","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600122","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Madison Manley;Ashita Victor;Hyunggyu Park;Ankit Kaul;Mohanalingam Kathaperumal;Muhannad S. Bakir
{"title":"Heterogeneous Integration Technologies for Artificial Intelligence Applications","authors":"Madison Manley;Ashita Victor;Hyunggyu Park;Ankit Kaul;Mohanalingam Kathaperumal;Muhannad S. Bakir","doi":"10.1109/JXCDC.2024.3484958","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3484958","url":null,"abstract":"The rapid advancement of artificial intelligence (AI) has been enabled by semiconductor-based electronics. However, the conventional methods of transistor scaling are not enough to meet the exponential demand for computing power driven by AI. This has led to a technological shift toward system-level scaling approaches, such as heterogeneous integration (HI). HI is becoming increasingly implemented in many AI accelerator products due to its potential to enhance overall system performance while also reducing electrical interconnect delays and energy consumption, which are critical for supporting data-intensive AI workloads. In this review, we introduce current and emerging HI technologies and their potential for high-performance systems. We then survey recent industrial and research progress in 3-D HI technologies that enable high bandwidth systems and finally present the emergence of glass core packaging for high-performance AI chip packages.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10731842","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Scaling Logic Area With Multitier Standard Cells","authors":"Florian Freye;Christian Lanius;Hossein Hashemi Shadmehri;Diana Göhringer;Tobias Gemmeke","doi":"10.1109/JXCDC.2024.3482464","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3482464","url":null,"abstract":"While the footprint of digital complementary metal-oxide–semiconductor (CMOS) circuits has continued to decrease over the years, physical limitations for further intralayer geometric scaling become apparent. To further increase the logic density, the international roadmap for devices and systems (IRDS) predicts a transition from a single layer of transistors per die to monolithically stacking transistors in multiple tiers starting in 2031. This raises the question of the extent to which these can be exploited in 3-D standard cells to improve logic density. In this work, we investigate the scaling potential of realizing standard cells employing two or three dedicated tiers. For this, specific multitier virtual physical design kits are derived based on the open ASAP7. A typical RISC-V implementation realized in a classic standard cell library is used to identify the subset of the most relevant standard cells. 
In accordance with the virtual physical design kit (PDK), 3-D derivatives of the single-tier standard cells are crafted and evaluated with respect to achievable logic density considering standard synthesis benchmarks and blocks on the architecture level.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10720813","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595061","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Energy-/Carbon-Aware Evaluation and Optimization of 3-D IC Architecture With Digital Compute-in-Memory Designs","authors":"Hyung Joon Byun;Udit Gupta;Jae-Sun Seo","doi":"10.1109/JXCDC.2024.3479100","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3479100","url":null,"abstract":"Several 2-D architectures have been presented, including systolic arrays or compute-in-memory (CIM) arrays for energy-efficient artificial intelligence (AI) inference. To increase the energy efficiency within constrained area, 3-D technologies have been actively investigated, which have the potential to decrease the data path length or increase the activation buffer size, enabling higher energy efficiency. Several works have reported the 3-D architectures using non-CIM designs, but investigations on 3-D architectures with CIM macros have not been well studied in prior works. In this article, we investigate digital CIM (DCIM) macros and various 3-D architectures to find the opportunity of increased energy efficiency compared with 2-D structures. Moreover, we also investigated the carbon footprint of 3-D architectures. We have built in-house simulators calculating energy and area given high-level hardware descriptions and DNN workloads and integrated with carbon estimation tool to analyze the embodied carbon of various hardware designs. We have investigated different types of 3-D DCIM architectures and dataflows, which have shown 42.5% energy savings compared with 2-D systolic arrays on average. 
Also, we have analyzed the tradeoff between performance and carbon footprint and their optimization opportunities.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10714410","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142600413","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Accuracy Improvement With Weight Mapping Strategy and Output Transformation for STT-MRAM-Based Computing-in-Memory","authors":"Xianggao Wang;Na Wei;Shifan Gao;Wenhao Wu;Yi Zhao","doi":"10.1109/JXCDC.2024.3478360","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3478360","url":null,"abstract":"This work presents an analog computing-in-memory (CiM) macro with spin-transfer torque magnetic random access memory (STT-MRAM) and 28-nm CMOS technology. The adopted CiM bitcell uses a differential scheme and balances the input resistance to minimize the nonideal factors during multiply-accumulate (MAC) operations. Specialized peripheral circuits were designed for the current-scheme CiM architecture. More importantly, strategies of accuracy improvement were innovatively proposed as follows: 1) mapping most significant bit (MSB) to the far side of the MRAM array and 2) output linear transformation based on the reference columns. Circuit-level simulation verified the functionality and performance improvement of the CiM macro based on the MNIST and CIFAR-10 datasets, realizing a 3% and 5% accuracy loss compared with the benchmark, respectively. 
The 640-GOPS (8 bit) throughput, 34.6-TOPS/mm2 area compactness, and 83.3-TOPS/W energy efficiency demonstrate the advantages of STT-MRAM CiM in the coming AI era.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-10-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10714384","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142587601","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward Fine-Grained Partitioning of Low-Level SRAM Caches for Emerging 3D-IC Designs","authors":"Sudipta Das;Bhawana Kumari;Siva Satyendra Sahoo;Yukai Chen;James Myers;Dragomir Milojevic;Dwaipayan Biswas;Julien Ryckaert","doi":"10.1109/JXCDC.2024.3468386","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3468386","url":null,"abstract":"Scaling on-chip memory capacity is one of the primary approaches to mitigate memory wall bottlenecks. Various 2.5-D/3-D integration schemes, leveraging novel partitioning, are being actively explored to improve system performance. However, fine-grained functional partitioning of memory macros is not widely reported. As static RAM (SRAM) scaling stagnates with emerging CMOS logic roadmap, we propose a partitioning of low-level (faster access) caches in 3-D using an array under CMOS (AuC) technology paradigm. Our study focuses on partitioning and optimization of SRAM bit-cells and peripheral circuits, enabling heterogeneous integration, achieving up to 12% higher operating frequency with 50% leakage power reduction in the memory macros. 
Applied on a 64-bit mobile system-on-chip (SoC) CPU core, we achieve up to 60% higher energy efficiency compared with 2-D baseline and 14% increase in operating frequency compared with standard memory-on-logic 3-D partitioning scheme.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-09-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10695147","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142440855","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Harrison Liew;Farhana Sheikh;Jong-Ru Guo;Zuoguo Wu;Borivoje Nikolić
{"title":"A Chisel Generator for Standardized 3-D Die-to-Die Interconnects","authors":"Harrison Liew;Farhana Sheikh;Jong-Ru Guo;Zuoguo Wu;Borivoje Nikolić","doi":"10.1109/JXCDC.2024.3461471","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3461471","url":null,"abstract":"A 3-D heterogeneous integration (3-D-HI) is poised to enable a new era of high-performance integrated circuits via a multitude of benefits, including a reduction in I/O power consumption and ability to tightly couple disparate technologies. However, a significant hurdle toward enabling a chiplet ecosystem is the standardization of 3-D die-to-die (D2D) interconnects that facilitate rapid integration. Technology-driven constraints highlighted in published works demonstrate that a unique approach to 3-D D2D interconnect design and implementation is required, while preserving the ability to customize the interconnect to accommodate future technology concerns and applications with minimal overhead. This article presents a framework to generate customized 3-D D2D interconnect physical layers (PHYs) that are simultaneously standard-compliant, physical-aware, and can be automatically integrated into all stacked chiplets. The generator framework leverages the Chisel hardware description language to allow designers to do the following: 1) compile a port list directly into a PHY; 2) automate design and physical design (PD); and 3) perform design space exploration of interconnect features (e.g., bump map pitch, clocking architecture, and others). The 3-D PHY generator framework and features detailed in this work can be used to produce a reference implementation for a standard like UCIe-3-D, representing a significant paradigm shift from current specification and design methodologies for 2.5-D D2D interconnect (e.g., UCIe) implementations. 
This work concludes with the results of a redundancy design space exploration tradeoff study, showing the benefits of a proposed spatial coding redundancy scheme in an example PHY using emulated 9-μm hybrid bonding for a 4 Tx/4 Rx module array with 4:1 coding redundancy ratio.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10681023","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142408973","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CMOS Single-Photon Avalanche Diode Circuits for Probabilistic Computing","authors":"William Whitehead;Wonsik Oh;Luke Theogarajan","doi":"10.1109/JXCDC.2024.3452030","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3452030","url":null,"abstract":"Intrinsically random hardware devices are increasingly attracting attention for their potential use in probabilistic computing architectures. One such device is the single-photon avalanche diode (SPAD) and an associated functional unit, the variable-rate SPAD circuit (VRSC), recently proposed by us as a source of randomness for sampling and annealing Ising and Potts models. This work develops a more advanced understanding of these VRSCs by introducing several VRSC design options and studying their tradeoffs as implemented in a 65-nm CMOS process. Each VRSC is composed of a SPAD and a processing circuit. Combinations of three different SPAD designs and three different types of processing circuits were evaluated on several metrics such as area, speed, and variability. Measured results from the SPAD design space show that even extremely small SPADs are suitable for probabilistic computing purposes, and that high dark count rates are not detrimental either, so SPADs for probabilistic computing are actually easier to integrate in standard CMOS processes. Results from the circuit design space show that the time-to-analog-based designs introduced in this work can produce highly exponential and analytical transfer functions, but that the less analytically tractable output of the previously proposed filter-based designs can achieve less variability in a smaller footprint. 
Probabilistic bits (P-bits) composed of the fabricated VRSCs achieve bit flip rates of 50 MHz and allow at least one order of magnitude of control over their simulated annealing temperature.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10659028","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142246389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Monolithic 3-D-Based Nonvolatile Associative Processor for High-Performance Energy-Efficient Computations","authors":"Esteban Garzón;Alessandro Bedoya;Marco Lanuzza;Leonid Yavits","doi":"10.1109/JXCDC.2024.3450810","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3450810","url":null,"abstract":"This article presents a monolithic 3-D associative in-memory processor (M3D AP) that combines emerging nonvolatile (NV) magnetic tunnel junction (MTJ) technology with massively parallel associative in-memory processing and M3D integration. The proposed architecture features two monolithic layers, with CMOS logic in the first layer and an MTJ-based content-addressable memory (CAM) array in the second layer. We conduct a thorough analysis of the electrical characteristics of the MTJ-based AP and use analysis results to evaluate the performance and power consumption of the M3D AP. We build a custom cycle-accurate simulator to implement and evaluate the 3-D associative matrix multiplication algorithm, highlighting the potential of the M3D AP for accelerating artificial intelligence applications. Overall, we demonstrate the efficacy of M3D AP and show that it holds promise for high-performance and energy-efficient in-memory computing.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-08-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10649641","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142174008","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MEFET-Based CAM/TCAM for Memory-Augmented Neural Networks","authors":"Sai Sanjeet;Jonathan Bird;Bibhu Datta Sahoo","doi":"10.1109/JXCDC.2024.3410681","DOIUrl":"https://doi.org/10.1109/JXCDC.2024.3410681","url":null,"abstract":"Memory-augmented neural networks (MANNs) require large external memories to enable long-term memory storage and retrieval. Content-addressable memory (CAM) is a type of memory used for high-speed searching applications and is well-suited for MANNs. Recent advances in exploratory nonvolatile devices have spurred the development of nonvolatile CAMs. However, these devices suffer from poor ON-OFF ratio, large write voltages, and long write times. This work proposes a nonvolatile ternary CAM (TCAM) using magnetoelectric field effect transistors (MEFETs). The energy and delay of various operations are simulated using the ASAP 7-nm predictive technology for the transistors and a Verilog-A model of the MEFET. The proposed structure achieves orders of magnitude improvement in search energy and >45× improvement in search energy-delay product compared with prior works. The write energy and delay are also improved by 8× and 12×, respectively, compared with CAMs designed with other nonvolatile devices. A variability analysis is performed to study the effect of process variations on the CAM. The proposed CAM is then used to build a one-shot learning MANN and is benchmarked with the Modified National Institute of Standards and Technology (MNIST), extended MNIST (EMNIST), and labeled faces in the wild (LFW) datasets with binary embeddings, giving >99% accuracy on MNIST, a top-3 accuracy of 97.11% on the EMNIST dataset, and >97% accuracy on the LFW dataset, with embedding sizes of 16, 64, and 512, respectively. 
The proposed CAM is shown to be fast, energy-efficient, and scalable, making it suitable for MANNs.","PeriodicalId":54149,"journal":{"name":"IEEE Journal on Exploratory Solid-State Computational Devices and Circuits","volume":null,"pages":null},"PeriodicalIF":2.0,"publicationDate":"2024-06-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10550938","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141439499","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}