{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information","authors":"","doi":"10.1109/TVLSI.2025.3568415","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3568415","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"C2-C2"},"PeriodicalIF":2.8,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010808","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2025.3568413","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3568413","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"C3-C3"},"PeriodicalIF":2.8,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010822","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S3A-NPU: A High-Performance Hardware Accelerator for Spiking Self-Supervised Learning With Dynamic Adaptive Memory Optimization","authors":"Heuijee Yun;Daejin Park","doi":"10.1109/TVLSI.2025.3566949","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3566949","url":null,"abstract":"Spiking self-supervised learning (SSL) has become prevalent for low power consumption and low-latency properties, as well as the ability to learn from large quantities of unlabeled data. However, the computational intensity and resource requirements are significant challenges to apply to accelerators. In this article, we propose the scalable, spiking self-supervised learning, streamline optimization accelerator (<inline-formula> <tex-math>$S^{3}$ </tex-math></inline-formula>A)-neural processing unit (NPU), a highly optimized accelerator for spiking SSL models. This architecture minimizes memory access by leveraging input data provided by the user and optimizes computation through the maximization of data reuse. By dynamically optimizing memory based on model characteristics and implementing specialized operations for data preprocessing, which are critical in SSL, computational efficiency can be significantly improved. The parallel processing lanes account for the two encoders in the SSL architecture, combined with a pipelined structure that considers the temporal data accumulation of spiking neural networks (SNNs) to enhance computational efficiency. We evaluate the design on field-programmable gate array (FPGA), where a 16-bit quantized spiking residual network (ResNet) model trained on the Canadian Institute for Advanced Research (CIFAR) and MNIST dataset has top 94.08% accuracy. <inline-formula> <tex-math>$S^{3}$ </tex-math></inline-formula>A-NPU optimization significantly improved computational resource utilization, resulting in a 25% reduction in latency. Moreover, as the first spiking self-supervised accelerator, it demonstrated highly efficient computation compared to existing accelerators, utilizing only 29k look up tables (LUTs) and eight block random access memories (BRAMs). This makes it highly suitable for resource-constrained applications, particularly in the context of spiking SSL models on edge devices. We implemented it on a silicon chip using a 130-nm process design kit (PDK), and the design was less than <inline-formula> <tex-math>$1~text {cm}^{2}$ </tex-math></inline-formula>.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1886-1898"},"PeriodicalIF":2.8,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010182","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3-D Digital Compute-in-Memory Benchmark With A5 CFET Technology: An Extension to Lookup-Table-Based Design","authors":"Junmo Lee;Minji Shon;Faaiq Waqar;Shimeng Yu","doi":"10.1109/TVLSI.2025.3566346","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3566346","url":null,"abstract":"Digital compute-in-memory (DCIM) has emerged as a promising solution to address scalability and accuracy challenges in analog compute-in-memory (ACIM) for next-generation AI hardware acceleration. In this work, we present a comprehensive device-to-system codesign process for the two proposed 3-D DCIM architectures at the projected 5 angstrom (A5) complementary FET (CFET) technology node: 1) 3-D DCIM based on 8T DCIM bit cell and 2) lookup-table (LUT)-based 3-D DCIM. A novel A5 CFET-based 8T DCIM bit cell (6T SRAM +2T AND gate) is proposed to improve total footprint and latency over the conventional 10T DCIM bit cell, and its functionality is verified through technology computer-aided design (TCAD) simulation. For macro- and system-level evaluation of the proposed 3-D DCIM architectures, an extended NeuroSim V1.4 framework is developed, the first compute-in-memory (CIM) benchmark framework enabling CIM simulation at the A5 CFET technology node. We demonstrate that the proposed 3-D DCIM with 8T DCIM bit cell at the A5 CFET technology node can achieve <inline-formula> <tex-math>$8.2times $ </tex-math></inline-formula> improvement in figure of merit (FOM) (=TOPS/W <inline-formula> <tex-math>$times $ </tex-math></inline-formula> TOPS/mm<sup>2</sup>) over the state-of-the-art 3-nm FinFET-based DCIM design. The LUT-based 3-D DCIM design is additionally proposed to achieve further power consumption reduction from the 8T DCIM bit-cell-based 3-D DCIM. LUT-based 3-D DCIM achieves a 44% reduction in energy consumption compared to the conventional 10T DCIM bit-cell-based 3-D DCIM. Our findings suggest the significant implications for technology scaling below 1 nm in high-performance DCIM design.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1910-1919"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information","authors":"","doi":"10.1109/TVLSI.2025.3549990","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3549990","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"C2-C2"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10937138","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Design Methodology for Thermal Monitoring of Reusable Passive Interposers With RTDs","authors":"Andreas Tsiougkos;Vasilis F. Pavlidis","doi":"10.1109/TVLSI.2025.3567824","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3567824","url":null,"abstract":"The heterogeneous integration underpinned by several advanced packaging options, such as passive interposers offers a promising direction for future integrated systems. However, the diversity of chiplets integrated in these systems can increase design complexity. A means to mitigate this situation is to reuse interposer fabrics. Consequently, reusable interposers should provide for signaling, power, and thermal issues. This work emphasizes thermal issues by introducing a novel and sufficiently accurate thermal monitoring strategy suitable for reusable passive interposers. The proposed strategy is based on metal resistance temperature detectors (RTDs) as sensors optimally arranged on a fixed rectangular grid supporting the reuse of passive interposers. A step-by-step methodology provides the design and allocation of the sensors across the interposer fabric under temperature precision and area constraints. Diverse benchmark scenarios are investigated with the proposed RTDs, which consume only <inline-formula> <tex-math>$33.6~mu text {W}$ </tex-math></inline-formula> with a footprint of only <inline-formula> <tex-math>$0.159~text {mm}^{2}$ </tex-math></inline-formula>. Simulation results show that the proposed methodology achieves six times (<inline-formula> <tex-math>$6times $ </tex-math></inline-formula>) improvement in mean absolute error (MAE) for reconstructed heatmaps over conventional chiplet-based sensors. This improvement is shown for different chiplet placements onto an interposer and for 2.5-D heterogeneous systems, where the integrated components do not include any or sufficient on-chip thermal sensors to provide the required temperature precision.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1803-1815"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A TSV Misalignment-Based Repair Architecture in 3-D Chips","authors":"Huaguo Liang;Jiahui Xiao;Xianrui Dou;Tianming Ni;Yingchun Lu;Zhengfeng Huang","doi":"10.1109/TVLSI.2025.3565650","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3565650","url":null,"abstract":"As a critical component of 3-D integrated circuits (3D-ICs), the quality of through-silicon vias (TSVs) significantly impacts the yield and reliability of 3D-ICs, especially the clustered faults during manufacturing. In this article, a repair architecture based on TSV misalignment is proposed. This architecture achieves a higher repair rate by physically connecting the signal not to its closest TSV but only to the TSVs far away from each other. Experimental results show that the average repair rate of the proposed architecture increases by 13.42% compared to the existing repair architectures of the same type for clustered faults. Compared to the router-based architecture, the proposed architecture has a similar average repair rate with less than 0.15% difference in fewer than eight clustered faults, reducing the delay and MUX area overhead by 70.27% and 54.17%, respectively.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1816-1825"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2025.3549993","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3549993","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"C3-C3"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10937163","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 28-nm 9T1C SRAM-Based CIM Macro With Hierarchical Capacitance Weighting and Two-Step Capacitive Comparison ADCs for CNNs","authors":"Zhiting Lin;Runru Yu;Yunhao Li;Miao Long;Yu Liu;Jianxing Zhou;Da Huo;Qingchuan Zhu;Yue Zhao;Lintao Chen;Chunyu Peng;Qiang Zhao;Xin Li;Chenghu Dai;Xiulong Wu","doi":"10.1109/TVLSI.2025.3545635","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3545635","url":null,"abstract":"In the realm of charge-domain computing-in-memory (CIM) macros, reducing the area of capacitor ladder and analog-to-digital converter (ADC) while maintaining high throughput remains a significant challenge. This brief introduces an adjustable-weight CIM macro designed to enhance both energy efficiency and area efficiency for convolutional neural networks (CNNs). The proposed architecture uses: 1) a customized 9T1C bit cell for sensing margin improvement and bidirectional decoupled read ports; 2) a hierarchical capacitance weighting (HCW) structure that achieves a weight accumulation of 1/2/4 bits with less capacitance area and weighting time; and 3) a two-step capacitive comparison ADCs (TC-ADCs) readout scheme to improve area efficiency and throughput. The proposed 8-kb static random address memory (SRAM) CIM macro is implemented using 28-nm CMOS technology. It can achieve an energy efficiency of 224.4 TOPS/W and an area efficiency of 21.894 TOPS/mm<sup>2</sup>, and the accuracies on MNIST, CIFAR-10, and CIFAR-100 datasets are 99.67%, 89.13%, and 67.58% with a 4-b input and 4-b weight.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"2009-2013"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ResiLogic: Leveraging Composability and Diversity to Design Fault and Intrusion Resilient Chips","authors":"Ahmad T. Sheikh;Ali Shoker;Suhaib A. Fahmy;Paulo Esteves-Verissimo","doi":"10.1109/TVLSI.2025.3544860","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3544860","url":null,"abstract":"A long-standing challenge is the design of chips resilient to faults and glitches. Both fine-grained gate diversity and coarse-grained modular redundancy have been used in the past. However, these approaches have not been well-studied under other threat models where some stakeholders in the supply chain are untrusted. Increasing digital sovereignty tensions raise concerns regarding the use of foreign off-the-shelf tools and intellectual property (IP), or off-sourcing fabrication, driving research into the design of resilient chips under this threat model. This article addresses a threat model considering three pertinent attacks to resilience: distribution, zonal, and compound attacks. To mitigate these attacks, we introduce the <italic>ResiLogic</i> framework that exploits <italic>Diversity by Composability</i>: constructing diverse circuits composed of smaller diverse ones by design. This approach enables designers to develop circuits in the early stages of design without the need for additional redundancy in terms of space or cost. To generate diverse circuits, we propose a technique using E-Graphs with new rewrite definitions for diversity. Using this approach at different levels of granularity is shown to improve the resilience of circuit design in <italic>ResiLogic</i> up to <inline-formula> <tex-math>$times 5$ </tex-math></inline-formula> against the three considered attacks.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1751-1764"},"PeriodicalIF":2.8,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117383","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}