{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information","authors":"","doi":"10.1109/TVLSI.2025.3568415","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3568415","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"C2-C2"},"PeriodicalIF":2.8,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010808","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2025.3568413","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3568413","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"C3-C3"},"PeriodicalIF":2.8,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010822","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117271","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"S3A-NPU: A High-Performance Hardware Accelerator for Spiking Self-Supervised Learning With Dynamic Adaptive Memory Optimization","authors":"Heuijee Yun;Daejin Park","doi":"10.1109/TVLSI.2025.3566949","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3566949","url":null,"abstract":"Spiking self-supervised learning (SSL) has become prevalent for its low power consumption and low latency, as well as its ability to learn from large quantities of unlabeled data. However, its computational intensity and resource requirements pose significant challenges for hardware accelerators. In this article, we propose the scalable, spiking self-supervised learning, streamline optimization accelerator (<inline-formula> <tex-math>$S^{3}$ </tex-math></inline-formula>A)-neural processing unit (NPU), a highly optimized accelerator for spiking SSL models. This architecture minimizes memory access by leveraging input data provided by the user and optimizes computation by maximizing data reuse. By dynamically optimizing memory based on model characteristics and implementing specialized operations for data preprocessing, which are critical in SSL, computational efficiency can be significantly improved. The parallel processing lanes account for the two encoders in the SSL architecture, combined with a pipelined structure that considers the temporal data accumulation of spiking neural networks (SNNs) to enhance computational efficiency. We evaluate the design on a field-programmable gate array (FPGA), where a 16-bit quantized spiking residual network (ResNet) model trained on the Canadian Institute for Advanced Research (CIFAR) and MNIST datasets achieves a top accuracy of 94.08%. <inline-formula> <tex-math>$S^{3}$ </tex-math></inline-formula>A-NPU optimization significantly improved computational resource utilization, resulting in a 25% reduction in latency. Moreover, as the first spiking self-supervised accelerator, it demonstrated highly efficient computation compared to existing accelerators, utilizing only 29k lookup tables (LUTs) and eight block random access memories (BRAMs). This makes it highly suitable for resource-constrained applications, particularly spiking SSL models on edge devices. We implemented it on a silicon chip using a 130-nm process design kit (PDK), and the design occupies less than <inline-formula> <tex-math>$1~\\text{cm}^{2}$ </tex-math></inline-formula>.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1886-1898"},"PeriodicalIF":2.8,"publicationDate":"2025-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010182","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519287","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"3-D Digital Compute-in-Memory Benchmark With A5 CFET Technology: An Extension to Lookup-Table-Based Design","authors":"Junmo Lee;Minji Shon;Faaiq Waqar;Shimeng Yu","doi":"10.1109/TVLSI.2025.3566346","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3566346","url":null,"abstract":"Digital compute-in-memory (DCIM) has emerged as a promising solution to address scalability and accuracy challenges in analog compute-in-memory (ACIM) for next-generation AI hardware acceleration. In this work, we present a comprehensive device-to-system codesign process for two proposed 3-D DCIM architectures at the projected 5 angstrom (A5) complementary FET (CFET) technology node: 1) 3-D DCIM based on an 8T DCIM bit cell and 2) lookup-table (LUT)-based 3-D DCIM. A novel A5 CFET-based 8T DCIM bit cell (6T SRAM +2T AND gate) is proposed to improve total footprint and latency over the conventional 10T DCIM bit cell, and its functionality is verified through technology computer-aided design (TCAD) simulation. For macro- and system-level evaluation of the proposed 3-D DCIM architectures, an extended NeuroSim V1.4 framework is developed, the first compute-in-memory (CIM) benchmark framework enabling CIM simulation at the A5 CFET technology node. We demonstrate that the proposed 3-D DCIM with the 8T DCIM bit cell at the A5 CFET technology node can achieve an <inline-formula> <tex-math>$8.2\\times $ </tex-math></inline-formula> improvement in figure of merit (FOM) (=TOPS/W <inline-formula> <tex-math>$\\times $ </tex-math></inline-formula> TOPS/mm<sup>2</sup>) over the state-of-the-art 3-nm FinFET-based DCIM design. The LUT-based 3-D DCIM design is additionally proposed to reduce power consumption further relative to the 8T DCIM bit-cell-based 3-D DCIM, achieving a 44% reduction in energy consumption compared to the conventional 10T DCIM bit-cell-based 3-D DCIM. Our findings suggest significant implications for technology scaling below 1 nm in high-performance DCIM design.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1910-1919"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519360","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information","authors":"","doi":"10.1109/TVLSI.2025.3549990","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3549990","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"C2-C2"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10937138","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143676018","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Design Methodology for Thermal Monitoring of Reusable Passive Interposers With RTDs","authors":"Andreas Tsiougkos;Vasilis F. Pavlidis","doi":"10.1109/TVLSI.2025.3567824","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3567824","url":null,"abstract":"The heterogeneous integration underpinned by several advanced packaging options, such as passive interposers, offers a promising direction for future integrated systems. However, the diversity of chiplets integrated in these systems can increase design complexity. One means to mitigate this situation is to reuse interposer fabrics. Consequently, reusable interposers should provide for signaling, power, and thermal issues. This work emphasizes thermal issues by introducing a novel and sufficiently accurate thermal monitoring strategy suitable for reusable passive interposers. The proposed strategy is based on metal resistance temperature detectors (RTDs) as sensors optimally arranged on a fixed rectangular grid supporting the reuse of passive interposers. A step-by-step methodology provides the design and allocation of the sensors across the interposer fabric under temperature precision and area constraints. Diverse benchmark scenarios are investigated with the proposed RTDs, which consume only <inline-formula> <tex-math>$33.6~\\mu \\text{W}$ </tex-math></inline-formula> with a footprint of only <inline-formula> <tex-math>$0.159~\\text{mm}^{2}$ </tex-math></inline-formula>. Simulation results show that the proposed methodology achieves a sixfold (<inline-formula> <tex-math>$6\\times $ </tex-math></inline-formula>) improvement in mean absolute error (MAE) for reconstructed heatmaps over conventional chiplet-based sensors. This improvement is shown for different chiplet placements onto an interposer and for 2.5-D heterogeneous systems in which the integrated components include no, or insufficient, on-chip thermal sensors to provide the required temperature precision.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1803-1815"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519294","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A TSV Misalignment-Based Repair Architecture in 3-D Chips","authors":"Huaguo Liang;Jiahui Xiao;Xianrui Dou;Tianming Ni;Yingchun Lu;Zhengfeng Huang","doi":"10.1109/TVLSI.2025.3565650","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3565650","url":null,"abstract":"As a critical component of 3-D integrated circuits (3D-ICs), the quality of through-silicon vias (TSVs) significantly impacts the yield and reliability of 3D-ICs, especially under the clustered faults that arise during manufacturing. In this article, a repair architecture based on TSV misalignment is proposed. This architecture achieves a higher repair rate by physically connecting each signal not to its closest TSV but only to TSVs that are far apart from each other. Experimental results show that the average repair rate of the proposed architecture increases by 13.42% compared to existing repair architectures of the same type for clustered faults. Compared to the router-based architecture, the proposed architecture achieves a similar average repair rate (less than 0.15% difference) for fewer than eight clustered faults, while reducing the delay and MUX area overhead by 70.27% and 54.17%, respectively.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1816-1825"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519386","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2025.3549993","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3549993","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 4","pages":"C3-C3"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10937163","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143667706","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 28-nm 9T1C SRAM-Based CIM Macro With Hierarchical Capacitance Weighting and Two-Step Capacitive Comparison ADCs for CNNs","authors":"Zhiting Lin;Runru Yu;Yunhao Li;Miao Long;Yu Liu;Jianxing Zhou;Da Huo;Qingchuan Zhu;Yue Zhao;Lintao Chen;Chunyu Peng;Qiang Zhao;Xin Li;Chenghu Dai;Xiulong Wu","doi":"10.1109/TVLSI.2025.3545635","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3545635","url":null,"abstract":"In the realm of charge-domain computing-in-memory (CIM) macros, reducing the area of the capacitor ladder and analog-to-digital converter (ADC) while maintaining high throughput remains a significant challenge. This brief introduces an adjustable-weight CIM macro designed to enhance both energy efficiency and area efficiency for convolutional neural networks (CNNs). The proposed architecture uses: 1) a customized 9T1C bit cell for sensing-margin improvement and bidirectional decoupled read ports; 2) a hierarchical capacitance weighting (HCW) structure that achieves weight accumulation of 1/2/4 bits with less capacitance area and weighting time; and 3) a two-step capacitive comparison ADC (TC-ADC) readout scheme to improve area efficiency and throughput. The proposed 8-kb static random access memory (SRAM) CIM macro is implemented using 28-nm CMOS technology. It achieves an energy efficiency of 224.4 TOPS/W and an area efficiency of 21.894 TOPS/mm<sup>2</sup>, and the accuracies on the MNIST, CIFAR-10, and CIFAR-100 datasets are 99.67%, 89.13%, and 67.58%, respectively, with 4-b inputs and 4-b weights.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"2009-2013"},"PeriodicalIF":2.8,"publicationDate":"2025-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519442","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Featureless Dual-Mode Latch-Based PUF","authors":"Ruikang Liu;Min Song;Changzhen Yu;Zhen Zhang;Wei Duan;Dawei Li;Ming Zhang;Meilin Wan","doi":"10.1109/TVLSI.2025.3565644","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3565644","url":null,"abstract":"Most physical unclonable functions (PUFs) can be located by attackers and are vulnerable to various physical attacks due to their distinct image and circuit features. To address this vulnerability, this article proposes a featureless dual-mode latch-based (FDL) PUF that is concealed within the digital circuit. The FDL PUF is implemented using standard cells and a digital design flow. It is then randomly distributed among other standard digital cells within the chip to eliminate possible identification of image features. Moreover, the output key of the FDL PUF is randomly extracted, and the FDL PUF is then repurposed to store other intermediate variables of the security algorithm, effectively eliminating the circuit features. The proposed FDL PUF is integrated into a secure identity authentication chip fabricated using a standard 0.18-<inline-formula> <tex-math>$\\mu $ </tex-math></inline-formula>m CMOS process. The feasibility of locating the FDL PUF units is evaluated using computer vision technologies, specifically YOLOv10 combined with OpenCV. Test results demonstrate that the number of suspected latch-based PUF units is approximately 15 times higher than the actual number of FDL PUF units for the test security chip, highlighting the significant challenge faced by attackers when attempting to locate the FDL PUF.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2312-2323"},"PeriodicalIF":2.8,"publicationDate":"2025-03-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144705276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}