IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles

An Area and Energy-Efficient Systolic Array Accelerator Architecture for Deep Neural Networks Using Stochastic Computing
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-24, DOI: 10.1109/TVLSI.2025.3550786
Jingwei Zhu;Jingguo Wu;Zongru Yang;Yu Jiang;Yun Chen
Abstract: Deep neural networks (DNNs) are widely used to handle various intelligent tasks. As model sizes increase, DNN hardware accelerators face higher area overhead and energy consumption. Stochastic computing (SC) has recently been considered for implementing DNNs and reducing hardware consumption. However, many current SC-based DNN accelerators fail to balance accuracy, performance, and resource overhead. In addition, their limited scalability and flexibility restrict their use in edge devices. In this article, we design an area- and energy-efficient DNN accelerator architecture using SC. We propose an SC-binary hybrid processing unit with piecewise shift compensation that improves SC accuracy without significant additional hardware overhead. To balance performance and resource overhead, we conduct a design space exploration (DSE) from an overall architectural perspective. We also establish an experimental platform with both software and hardware for SC-based DNNs. Software simulation results show that the best accuracy of the designed SC-DNN on CIFAR-10 is 91.9%, which is 3.2% higher than that of previous SC-DNN work. The hardware is synthesized in the TSMC 28-nm CMOS process. Compared with its binary computing counterpart, our design achieves 2.7× area efficiency and 3.4× energy efficiency. Compared with other SC-DNN accelerator designs, it provides 5.3× area efficiency and 7.3× energy efficiency.
Vol. 33, No. 6, pp. 1582-1595.
Citations: 0
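The abstract above builds on stochastic computing (SC). The following is a minimal sketch of the generic unipolar SC idea only. It is not the authors' SC-binary hybrid unit or their piecewise shift compensation. A value p in [0, 1] is encoded as a random bitstream whose fraction of ones is p, and a single AND gate multiplies two independent streams. The function names, stream length, and seed are illustrative assumptions.

```python
import random


def to_bitstream(p: float, length: int, rng: random.Random) -> list[int]:
    """Encode p in [0, 1] as a unipolar stochastic bitstream (P(bit = 1) = p)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("unipolar SC values must lie in [0, 1]")
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(a: float, b: float, length: int = 4096, seed: int = 0) -> float:
    """Approximate a * b with one AND operation per bit of two independent streams."""
    rng = random.Random(seed)  # successive draws act as independent samples
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    # P(x AND y = 1) = P(x = 1) * P(y = 1) when the two streams are independent.
    product = [x & y for x, y in zip(stream_a, stream_b)]
    return sum(product) / length  # decode: fraction of ones in the output stream


if __name__ == "__main__":
    for n in (64, 1024, 16384):
        print(n, sc_multiply(0.6, 0.5, length=n))
```

Longer streams shrink the random error, whose standard deviation scales as sqrt(p(1 - p) / length). The cost is more clock cycles per operation. This is one source of the accuracy-versus-performance tension the abstract describes.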
A Sub-0.9-ps Static Phase Offset 500 MHz Delay-Locked Loop With a Large Gain Phase Detector
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-23, DOI: 10.1109/TVLSI.2025.3566739
Jingjing Liu;Ruihuang Wu;Haoning Sun;Bingjun Xiong;Feng Yan;Kangkang Sun;Zhipeng Li;Jian Guan
Abstract: This article presents an analog delay-locked loop (DLL) with low static phase offset (SPO) and fast locking speed, designed for high-precision measurement applications such as time-to-digital converters (TDCs) and analog-to-digital converters (ADCs). A large-gain, dead-zone-free phase detector (PD) is proposed. When the DLL reaches the locked state, the phase error between the two PD input signals can be reduced to 0.53 ps (0.095°), an 18-fold improvement over the conventional DLL. As a result, the SPO of the entire DLL is reduced to less than 0.87 ps. Furthermore, an auxiliary circuit consisting of a large phase difference detector (LPDD) and fast-adjusting branches (FABs) shortens the locking process to 42 clock cycles, improving locking speed by 4.1 times. Designed in a standard 180-nm CMOS technology, the DLL occupies an area of 106.1 μm × 93.3 μm. It consumes only 1.89 mW at 500 MHz, and its root-mean-square (rms) and peak-to-peak (P-P) jitter are 1.01 ps and 6.26 ps, respectively.
Vol. 33, No. 8, pp. 2143-2152.
Citations: 0
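The timing figures in the DLL abstract can be cross-checked with simple arithmetic. At 500 MHz the clock period is 2 ns. A 0.53-ps error therefore corresponds to 0.53 / 2000 × 360° ≈ 0.095°, and 42 locking cycles take 84 ns. The helper names below are hypothetical. This is a consistency check, not part of the authors' design flow.

```python
def period_s(freq_hz: float) -> float:
    """Clock period in seconds."""
    return 1.0 / freq_hz


def time_error_to_degrees(delta_t_s: float, freq_hz: float) -> float:
    """Convert a timing error into a phase error: one clock period is 360 degrees."""
    return delta_t_s / period_s(freq_hz) * 360.0


def lock_time_s(cycles: int, freq_hz: float) -> float:
    """Wall-clock locking time for a given number of reference clock cycles."""
    return cycles * period_s(freq_hz)


if __name__ == "__main__":
    f_ref = 500e6
    print(f"{time_error_to_degrees(0.53e-12, f_ref):.3f} deg")  # residual PD phase error
    print(f"{lock_time_s(42, f_ref) * 1e9:.1f} ns")             # 42-cycle lock time
```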
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-22, DOI: 10.1109/TVLSI.2025.3568415
Vol. 33, No. 6, p. C2. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010808
Citations: 0
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-22, DOI: 10.1109/TVLSI.2025.3568413
Vol. 33, No. 6, p. C3. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010822
Citations: 0
S3A-NPU: A High-Performance Hardware Accelerator for Spiking Self-Supervised Learning With Dynamic Adaptive Memory Optimization
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-22, DOI: 10.1109/TVLSI.2025.3566949
Heuijee Yun;Daejin Park
Abstract: Spiking self-supervised learning (SSL) has become prevalent owing to its low power consumption, low latency, and ability to learn from large quantities of unlabeled data. However, its computational intensity and resource requirements make accelerator implementation challenging. In this article, we propose the scalable, spiking self-supervised learning, streamline optimization accelerator (S3A) neural processing unit (NPU), a highly optimized accelerator for spiking SSL models. The architecture minimizes memory access by leveraging input data provided by the user and optimizes computation by maximizing data reuse. Computational efficiency is significantly improved by dynamically optimizing memory based on model characteristics and by implementing specialized operations for data preprocessing, which is critical in SSL. Parallel processing lanes serve the two encoders of the SSL architecture. They are combined with a pipelined structure that accounts for the temporal data accumulation of spiking neural networks (SNNs), further improving computational efficiency. We evaluate the design on a field-programmable gate array (FPGA), where a 16-bit quantized spiking residual network (ResNet) model trained on the Canadian Institute for Advanced Research (CIFAR) and MNIST datasets achieves a top accuracy of 94.08%. The S3A-NPU optimization significantly improves computational resource utilization, reducing latency by 25%. Moreover, as the first spiking self-supervised accelerator, it achieves highly efficient computation compared with existing accelerators while using only 29k lookup tables (LUTs) and eight block random access memories (BRAMs). This makes it well suited to resource-constrained applications, particularly spiking SSL models on edge devices. We also implemented it on a silicon chip using a 130-nm process design kit (PDK), and the design occupies less than 1 cm².
Vol. 33, No. 7, pp. 1886-1898. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=11010182
Citations: 0
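The pipelining point in the S3A-NPU abstract rests on one property of SNN neurons. A neuron accumulates input over discrete timesteps before firing, so each timestep depends on the state left by the previous one. The following minimal leaky integrate-and-fire (LIF) sketch shows that temporal accumulation. The LIF model, leak factor, threshold, and hard reset are generic assumptions, not the neuron model S3A-NPU implements.

```python
def lif_neuron(inputs: list[float], threshold: float = 1.0, leak: float = 0.9) -> list[int]:
    """Simulate one leaky integrate-and-fire neuron over discrete timesteps.

    Returns one spike bit (0 or 1) per timestep.
    """
    membrane = 0.0
    spikes = []
    for current in inputs:
        # The state carried over from the previous timestep decays, then new input is added.
        # This step-to-step dependency is what a timestep pipeline has to respect.
        membrane = leak * membrane + current
        if membrane >= threshold:
            spikes.append(1)
            membrane = 0.0  # hard reset after firing
        else:
            spikes.append(0)
    return spikes


if __name__ == "__main__":
    print(lif_neuron([0.5, 0.5, 0.5, 0.5]))  # [0, 0, 1, 0]
```

With a constant input of 0.5, the potential climbs from 0.5 to 0.95 to 1.355. It fires on the third step and then restarts from zero.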
3-D Digital Compute-in-Memory Benchmark With A5 CFET Technology: An Extension to Lookup-Table-Based Design
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-21, DOI: 10.1109/TVLSI.2025.3566346
Junmo Lee;Minji Shon;Faaiq Waqar;Shimeng Yu
Abstract: Digital compute-in-memory (DCIM) has emerged as a promising solution to the scalability and accuracy challenges of analog compute-in-memory (ACIM) for next-generation AI hardware acceleration. In this work, we present a comprehensive device-to-system codesign process for two proposed 3-D DCIM architectures at the projected 5-angstrom (A5) complementary FET (CFET) technology node: 1) a 3-D DCIM based on an 8T DCIM bit cell and 2) a lookup-table (LUT)-based 3-D DCIM. A novel A5 CFET-based 8T DCIM bit cell (6T SRAM + 2T AND gate) is proposed to reduce total footprint and latency relative to the conventional 10T DCIM bit cell, and its functionality is verified through technology computer-aided design (TCAD) simulation. For macro- and system-level evaluation of the proposed 3-D DCIM architectures, we develop an extended NeuroSim V1.4 framework, the first compute-in-memory (CIM) benchmark framework to enable CIM simulation at the A5 CFET technology node. We demonstrate that the proposed 3-D DCIM with the 8T DCIM bit cell at the A5 CFET node achieves an 8.2× improvement in figure of merit (FOM = TOPS/W × TOPS/mm²) over the state-of-the-art 3-nm FinFET-based DCIM design. We additionally propose the LUT-based 3-D DCIM to reduce power consumption further than the 8T DCIM bit-cell-based 3-D DCIM. It achieves a 44% reduction in energy consumption compared with the conventional 10T DCIM bit-cell-based 3-D DCIM. Our findings have significant implications for technology scaling below 1 nm in high-performance DCIM design.
Vol. 33, No. 7, pp. 1910-1919.
Citations: 0
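The 8T bit cell in the CIM abstract pairs a 6T SRAM cell with a 2T AND gate. This reflects the usual digital CIM principle. The 1-bit product of a stored weight bit and an input bit is an AND, and multi-bit dot products are recovered with adder trees and shift-and-add over bit positions. The sketch below assumes unsigned integer operands and a simple bit-serial schedule. It is a behavioral model of that generic principle, not the authors' macro or their LUT-based variant. The function name and default bit widths are illustrative.

```python
def dcim_dot(inputs: list[int], weights: list[int],
             input_bits: int = 4, weight_bits: int = 4) -> int:
    """Behavioral model of a bit-serial digital CIM dot product (unsigned operands)."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    if any(not 0 <= x < (1 << input_bits) for x in inputs):
        raise ValueError("input out of range for input_bits")
    if any(not 0 <= w < (1 << weight_bits) for w in weights):
        raise ValueError("weight out of range for weight_bits")

    total = 0
    for i in range(input_bits):        # input bit plane i is broadcast in cycle i
        for j in range(weight_bits):   # weight bit j is stored in its own column of cells
            # Each AND models one bit cell's 1-bit multiply; the sum models the adder tree.
            column_sum = sum(((x >> i) & 1) & ((w >> j) & 1)
                             for x, w in zip(inputs, weights))
            total += column_sum << (i + j)  # shift-and-add by combined bit significance
    return total
```

For example, `dcim_dot([3, 5, 7], [2, 4, 6])` returns 68, the same as the ordinary dot product 3*2 + 5*4 + 7*6.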
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-21, DOI: 10.1109/TVLSI.2025.3549990
Vol. 33, No. 4, p. C2. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10937138
Citations: 0
A Design Methodology for Thermal Monitoring of Reusable Passive Interposers With RTDs
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-21, DOI: 10.1109/TVLSI.2025.3567824
Andreas Tsiougkos;Vasilis F. Pavlidis
Abstract: Heterogeneous integration, underpinned by advanced packaging options such as passive interposers, offers a promising direction for future integrated systems. However, the diversity of chiplets integrated in these systems can increase design complexity. One way to mitigate this is to reuse interposer fabrics. Consequently, reusable interposers must address signaling, power, and thermal issues. This work focuses on thermal issues, introducing a novel and sufficiently accurate thermal monitoring strategy suitable for reusable passive interposers. The proposed strategy uses metal resistance temperature detectors (RTDs) as sensors, optimally arranged on a fixed rectangular grid that supports the reuse of passive interposers. A step-by-step methodology guides the design and allocation of the sensors across the interposer fabric under temperature-precision and area constraints. Diverse benchmark scenarios are investigated with the proposed RTDs, which consume only 33.6 μW and occupy only 0.159 mm². Simulation results show that the proposed methodology achieves a 6× improvement in the mean absolute error (MAE) of reconstructed heatmaps over conventional chiplet-based sensors. This improvement holds for different chiplet placements on an interposer. It also holds for 2.5-D heterogeneous systems whose integrated components have no on-chip thermal sensors, or too few to provide the required temperature precision.
Vol. 33, No. 7, pp. 1803-1815.
Citations: 0
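Two quantities in the RTD abstract have compact textbook forms. First, a metal RTD is commonly approximated by the linear model R(T) = R0 * (1 + alpha * (T - T0)), so a resistance reading can be inverted to a temperature. Second, reconstruction quality is scored as the mean absolute error between a reference heatmap and the reconstructed one. The sketch assumes that linear model, a copper-like temperature coefficient of about 3.9e-3 per kelvin, and small 2-D grids stored as lists of rows. None of these values come from the article, and the function names are hypothetical.

```python
def rtd_temperature(r_measured: float, r0: float,
                    t0: float = 25.0, alpha: float = 3.9e-3) -> float:
    """Invert the linear RTD model R(T) = r0 * (1 + alpha * (T - t0)).

    alpha is an assumed copper-like coefficient in 1/K; t0 is the reference
    temperature in degrees Celsius at which the resistance equals r0.
    """
    return t0 + (r_measured / r0 - 1.0) / alpha


def heatmap_mae(reference: list[list[float]], reconstructed: list[list[float]]) -> float:
    """Mean absolute error between two 2-D temperature maps of identical shape."""
    if len(reference) != len(reconstructed) or any(
            len(a) != len(b) for a, b in zip(reference, reconstructed)):
        raise ValueError("heatmaps must have identical shapes")
    diffs = [abs(a - b)
             for row_ref, row_rec in zip(reference, reconstructed)
             for a, b in zip(row_ref, row_rec)]
    if not diffs:
        raise ValueError("heatmaps must be non-empty")
    return sum(diffs) / len(diffs)
```

In this framing, each grid sensor yields one temperature via `rtd_temperature`. A full-map estimate is then interpolated from those points, and `heatmap_mae` compares that estimate against a reference simulation. The interpolation step is left out here because the abstract does not specify it.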
A TSV Misalignment-Based Repair Architecture in 3-D Chips
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-21, DOI: 10.1109/TVLSI.2025.3565650
Huaguo Liang;Jiahui Xiao;Xianrui Dou;Tianming Ni;Yingchun Lu;Zhengfeng Huang
Abstract: Through-silicon vias (TSVs) are a critical component of 3-D integrated circuits (3D-ICs), and their quality significantly affects 3D-IC yield and reliability, especially when clustered faults arise during manufacturing. In this article, a repair architecture based on TSV misalignment is proposed. It achieves a higher repair rate by connecting each signal not to its nearest TSV but only to TSVs that are physically far apart from one another. Experimental results show that, for clustered faults, the average repair rate of the proposed architecture is 13.42% higher than that of existing repair architectures of the same type. Compared with the router-based architecture, the proposed architecture achieves a similar average repair rate (within 0.15%) for fewer than eight clustered faults, while reducing delay and MUX area overhead by 70.27% and 54.17%, respectively.
Vol. 33, No. 7, pp. 1816-1825.
Citations: 0
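The intuition behind misaligned assignment can be shown with a toy model. If the candidate TSVs a signal may use are physically adjacent, one clustered defect can disable all of them. Spreading a signal's candidates apart lets the same cluster leave every signal at least one working TSV. The sketch assumes a 1-D row of eight TSVs and four signals with two disjoint candidate TSVs each. It also assumes that each fault hits a contiguous run of TSVs. The layouts and function names are hypothetical and do not reproduce the paper's architecture or its repair-rate figures.

```python
from itertools import chain

Assignment = list[tuple[int, ...]]  # candidate TSV indices for each signal


def repairable(candidates: Assignment, faulty: set[int]) -> bool:
    """True if every signal keeps a healthy candidate (valid for disjoint candidate sets)."""
    return all(any(t not in faulty for t in cands) for cands in candidates)


def cluster_repair_rate(candidates: Assignment, num_tsvs: int, cluster_size: int) -> float:
    """Fraction of contiguous fault-cluster positions that the assignment survives."""
    used = list(chain.from_iterable(candidates))
    if len(used) != len(set(used)):
        # Shared candidates would require a bipartite matching instead of this simple check.
        raise ValueError("this toy model requires disjoint candidate sets")
    positions = range(num_tsvs - cluster_size + 1)
    survived = sum(repairable(candidates, set(range(p, p + cluster_size)))
                   for p in positions)
    return survived / len(positions)


# Each signal's two candidates sit next to each other ...
ADJACENT = [(2 * s, 2 * s + 1) for s in range(4)]
# ... versus four positions apart ("misaligned").
SPREAD = [(s, s + 4) for s in range(4)]

if __name__ == "__main__":
    print(cluster_repair_rate(ADJACENT, 8, 2), cluster_repair_rate(SPREAD, 8, 2))
```

Under these assumptions there are seven possible two-TSV clusters. The adjacent layout survives 3 of them and the spread layout survives all 7.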
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information
IF 2.8, Zone 2, Engineering and Technology
IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date: 2025-03-21, DOI: 10.1109/TVLSI.2025.3549993
Vol. 33, No. 4, p. C3. PDF: https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10937163
Citations: 0