{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Society Information","authors":"","doi":"10.1109/TVLSI.2025.3557605","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3557605","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"C3-C3"},"PeriodicalIF":2.8,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10977653","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875189","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems Publication Information","authors":"","doi":"10.1109/TVLSI.2025.3557603","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3557603","url":null,"abstract":"","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 5","pages":"C2-C2"},"PeriodicalIF":2.8,"publicationDate":"2025-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10977654","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143875162","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Upscale Layer Acceleration on Existing AI Hardware","authors":"Vuk Vranjkovic;Predrag Teodorovic;Rastislav Struharik","doi":"10.1109/TVLSI.2025.3558946","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3558946","url":null,"abstract":"Upscaling layers are important components of modern deep learning networks but often pose computational challenges for hardware (HW) accelerators. This article addresses this issue by introducing a novel layer-replacement technique to efficiently process upscaling layers using existing hardware-supported operations like depthwise convolutions and maximum pooling. To minimize the number of replacement layers, we propose an efficient layer number reduction algorithm. Experimental results on four deep neural networks demonstrate a significant speedup ranging from <inline-formula> <tex-math>$1.58\\times $ </tex-math></inline-formula> to <inline-formula> <tex-math>$32.88\\times $ </tex-math></inline-formula> compared to the original HW/software (SW) execution approach, and from <inline-formula> <tex-math>$3.65\\times $ </tex-math></inline-formula> to <inline-formula> <tex-math>$19.21\\times $ </tex-math></inline-formula> compared to the software-only solution, with minimal hardware overhead (0.068% more field-programmable gate array (FPGA) look-up tables (LUTs)). 
Notably, our technique introduces no numerical errors and maintains comparable input data processing quality to the original network.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1624-1637"},"PeriodicalIF":2.8,"publicationDate":"2025-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117387","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Switched-Based Slew Rate and Gain Boosting Parallel-Path Amplifier for Switched-Capacitor Applications","authors":"Javad Bagheri Asli;Alireza Saberkari;Atila Alvandpour","doi":"10.1109/TVLSI.2025.3557467","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3557467","url":null,"abstract":"A parallel-path amplifier (PPA) incorporating a switched-based slew rate and gain boosting stage as a feed-forward path, in parallel with a linear amplifier is introduced in this brief as an alternative to conventional analog amplifiers to achieve a high accuracy through the linear path and high slewing through the assisted feed-forward path. The feed-forward path employs a pre-amplifier, hysteresis-detector, and differential charge pumps, while the linear path includes a recycling folded-cascode amplifier. An analysis is performed to study the amplifier’s settling error with and without the feed-forward path, and also the trade-off between the dead-zone width of the hysteresis detector and the amplifier’s settling speed. The assisted feed-forward path has improved the slew rate <inline-formula> <tex-math>$\\times 2.5$ </tex-math></inline-formula>–800 V/<inline-formula> <tex-math>$\\mu $ </tex-math></inline-formula>s, effective GBW by 15%, and dc gain by 16 dB at the expense of adding <inline-formula> <tex-math>$187.5~\\mu $ </tex-math></inline-formula>A extra current consumption and <inline-formula> <tex-math>$1.25~\\mu $ </tex-math></inline-formula>m<sup>2</sup> extra silicon area. To prove the concept, the proposed amplifier is used as a multiplying digital-to-analog converter (MDAC) amplifier of an 8-bit pipeline analog-to-digital converter (ADC), and the ADC is fabricated in a 65-nm CMOS process. 
The results reveal that the spurious free dynamic range (SFDR) and signal-to-noise and distortion ratio (SNDR) performances are improved by 6–7 dB in the presence of the feed-forward path.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1799-1802"},"PeriodicalIF":2.8,"publicationDate":"2025-04-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117249","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing Wireless PHY With Adaptive OFDM and Multiarmed Bandit Learning on Zynq System-on-Chip","authors":"Neelam Singh;Sumit J. Darak","doi":"10.1109/TVLSI.2025.3528865","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3528865","url":null,"abstract":"In this work, we present an intelligent and reconfigurable wireless physical layer (PHY) that dynamically adjusts the transmission parameters for a given radio frequency (RF) environment. The proposed PHY is based on orthogonal frequency division multiplexing (OFDM) and can dynamically augment OFDM with a finite impulse response (FIR) low-pass filter to improve the out-of-band emissions (OOBE). To make these adaptations intelligently, we employ multiarmed bandit (MAB)-based online learning algorithms, specifically upper confidence bound with control variate (UCB-CV). UCB-CV enhances traditional UCB by incorporating additional information such as interference level and transmit power, allowing it to manage interference more effectively. These algorithms are integrated into the PHY of an FPGA-based OFDM transceiver on the Zynq system-on-chip (SoC), facilitating real-time decision-making based on side-channel interference and other parameters. Our comparative analysis highlights the enhanced performance of the UCB-CV algorithm over the traditional UCB in terms of reducing the bit-error rate (BER) and managing interference more effectively. Unlike the traditional UCB, UCB-CV leverages side information through a control variate approach, incorporating the coefficient of variation (CV) into reward estimation to better handle interference. Additionally, we underline the advantages of filtered-OFDM (FOFDM) compared to standard OFDM. Notably, FOFDM significantly reduces OOBE by 20–75 dBW/Hz and improves BER. 
In environments with high interference, UCB-CV achieves a throughput improvement of 29.54% compared to UCB.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1651-1664"},"PeriodicalIF":2.8,"publicationDate":"2025-04-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117345","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Xpikeformer: Hybrid Analog-Digital Hardware Acceleration for Spiking Transformers","authors":"Zihang Song;Prabodh Katti;Osvaldo Simeone;Bipin Rajendran","doi":"10.1109/TVLSI.2025.3552534","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3552534","url":null,"abstract":"The integration of neuromorphic computing and transformers through spiking neural networks (SNNs) offers a promising path to energy-efficient sequence modeling, with the potential to overcome the energy-intensive nature of the artificial neural network (ANN)-based transformers. However, the algorithmic efficiency of SNN-based transformers cannot be fully exploited on GPUs due to architectural incompatibility. This article introduces Xpikeformer, a hybrid analog-digital hardware architecture designed to accelerate SNN-based transformer models. The architecture integrates analog in-memory computing (AIMC) for feedforward and fully connected layers, and a stochastic spiking attention (SSA) engine for efficient attention mechanisms. We detail the design, implementation, and evaluation of Xpikeformer, demonstrating significant improvements in energy consumption and computational efficiency. Through image classification tasks and wireless communication symbol detection tasks, we show that Xpikeformer can achieve inference accuracy comparable to the GPU implementation of ANN-based transformers. Evaluations reveal that Xpikeformer achieves a <inline-formula> <tex-math>$13\\times $ </tex-math></inline-formula> reduction in energy consumption at approximately the same throughput as the state-of-the-art (SOTA) digital accelerator for ANN-based transformers. 
In addition, Xpikeformer achieves up to <inline-formula> <tex-math>$1.9\\times $ </tex-math></inline-formula> energy reduction compared to the optimal digital ASIC projection of SOTA SNN-based transformers.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1596-1609"},"PeriodicalIF":2.8,"publicationDate":"2025-04-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117347","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Flex-PE: Flexible and SIMD Multiprecision Processing Element for AI Workloads","authors":"Mukul Lokhande;Gopal Raut;Santosh Kumar Vishvakarma","doi":"10.1109/TVLSI.2025.3553069","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3553069","url":null,"abstract":"The rapid evolution of artificial intelligence (AI) models, from deep neural networks (DNNs) to transformers/large-language models (LLMs), demands flexible hardware solutions to meet diverse execution needs across edge and cloud platforms. Existing accelerators lack unified support for multiprecision arithmetic and runtime-configurable activation functions (AFs). This work proposes Flex-PE, a single instruction, multiple data (SIMD)-enabled multiprecision processing element that efficiently integrates multiply-and-accumulate operations with configurable AFs using unified hardware, including Sigmoid, Tanh, ReLU, and SoftMax. The proposed design achieves throughput improvements of up to <inline-formula> <tex-math>$16\\times $ </tex-math></inline-formula> FxP4, <inline-formula> <tex-math>$8\\times $ </tex-math></inline-formula> FxP8, <inline-formula> <tex-math>$4\\times $ </tex-math></inline-formula> FxP16, and <inline-formula> <tex-math>$1\\times $ </tex-math></inline-formula> FxP32, with maximum hardware efficiency for both iterative and pipelined architectures. An area-efficient iterative Flex-PE-based SIMD systolic array reduces DMA reads by up to <inline-formula> <tex-math>$62\\times $ </tex-math></inline-formula> and <inline-formula> <tex-math>$371\\times $ </tex-math></inline-formula> for input feature maps and weight filters in VGG-16, achieving 8.42 GOPS/W energy efficiency with minimal accuracy loss (<2%). 
Flex-PE scales from 4-bit edge inference to FxP8/16/32, supporting edge and cloud high-performance computing (HPC) while providing high-performance adaptable AI hardware with optimal precision, throughput, and energy efficiency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1610-1623"},"PeriodicalIF":2.8,"publicationDate":"2025-04-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117346","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-Optimized Double-Node-Upset-Recovery Latch Designs With Aging Mitigation and Algorithm-Based Verification for Long-Term Robustness Enhancement","authors":"Aibin Yan;Changli Hu;Jing Li;Na Bai;Zhengfeng Huang;Tianming Ni;Girard Patrick;Xiaoqing Wen","doi":"10.1109/TVLSI.2025.3554117","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3554117","url":null,"abstract":"With the continuous advancement of CMOS technologies, soft errors, such as single-node upset (SNU) and double-node upset (DNU), caused by radiation in nanoscale integrated circuits, are becoming increasingly prominent. Meanwhile, transistor aging mitigation is indispensable for long-term robustness enhancement. First, to reduce the impact of radiation on circuits, we propose a novel DNU-recovery latch with low cost, namely, DURLC, only consisting of four dual-input C-elements (CEs) and four clock-gated input-split inverters for the storage of values. Second, we propose a DNU-recovery latch with moderate cost, namely, DURMC, based on seven CEs and four inverters, for convenience to optimize the latch to alleviate aging. The proposed DNU-recovery latch with mitigated aging is called DURMA. The latch employs a high-speed path to reduce delay without sacrificing performance when mitigating aging issues. Finally, we propose an algorithm-based verification method to validate the DNU recovery of the proposed latches. 
The simulation results show that, compared with the state-of-the-art robust latches, the proposed latches have the advantages of DNU recovery with moderate and even low cost, and meanwhile, aging is effectively mitigated for the DURMA latch.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1765-1773"},"PeriodicalIF":2.8,"publicationDate":"2025-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117272","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A 2-Lane DAC-/ADC-Based 2 × 2 MIMO PAM-4 MMSE-DFE Wireline Transceiver With FEXT Cancellation on RFSoC Platform","authors":"Jaewon Lee;Seoyoung Jang;Yujin Choi;Donggeon Kim;Matthias Braendli;Thomas Morf;Marcel Kossel;Pier-Andrea Francese;Gain Kim","doi":"10.1109/TVLSI.2025.3553400","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3553400","url":null,"abstract":"This article presents a 2-lane <inline-formula> <tex-math>$2 \\times 2$ </tex-math></inline-formula> multiple-input, multiple-output (MIMO) 4-level pulse amplitude modulation (PAM-4) minimum mean-squared-error (MMSE)-decision-feedback equalizer (DFE) with far-end crosstalk (FEXT) cancellation for digital-to-analog converter (DAC)-/analog-to-digital converter (ADC)-based high-speed serial links. The receiver (RX) datapath is designed with a 15-tap MIMO feedforward equalizer (FFE) and a one-tap MIMO DFE with the least mean square (LMS), enabling adaptation to channel variation while maintaining the MMSE setting. The RX digital signal processor (DSP) place and route (PnR) in a 28-nm CMOS is estimated to consume 201 mW/lane at a 56-Gb/s/lane data rate while occupying a 0.5-mm<sup>2</sup>/lane silicon area. We further implement a real-time evaluation platform to verify the functionality of the MIMO PAM-4 MMSE-DFE with rapid bit-error-rate (BER) testing on RFSoC. 
The measurement result demonstrates that the MIMO MMSE-DFE significantly improves BER performance from 2.75e<sup>−3</sup> to 1.31e<sup>−7</sup> compared with equalization without FEXT cancellation when communicating over a channel exhibiting 12.4-dB insertion loss (IL) and 13.2-dB IL-to-crosstalk ratio (ICR) at Nyquist.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1570-1581"},"PeriodicalIF":2.8,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Revisiting Multiple ECC on High-Density NAND Flash Memory","authors":"Yunpeng Song;Yina Lv;Wentong Li;Jialin Liu;Liang Shi","doi":"10.1109/TVLSI.2025.3551400","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3551400","url":null,"abstract":"Three-dimensional NAND flash memory using the advanced multibit-per-cell technique is widely adopted due to its high density. However, it faces the problem of deteriorating read performance and energy consumption due to decreased reliability. Low-density parity-check code (LDPC) is typically adopted as an error correction code (ECC) to encode data and provide fault tolerance. To reduce the cost, LDPC with a high code rate is always adopted. However, LDPC will lead to read retry operations when the accessed data are not successfully decoded, and such retry-induced performance degradation is serious, especially for modern high-density flash memory. In this work, a reliability-aware differential ECC (READECC) approach is proposed to reduce redundancy protection and storage cost of LDPC with a low code rate and optimize the read performance. The basic idea is to adopt LDPC with a suitable code rate considering both data access characteristics and flash reliability characteristics. First, hot reads are identified based on the frequency of being accessed. Second, based on the reliability variation characteristics, the life of flash memory is divided into three reliability periods. As the reliability period shifts, the code rate of the LDPC adjusts adaptively to minimize redundancy protection. Third, an adaptive-sized logical page approach is further proposed to support LDPC with strong error correction capability (a low code rate) with a low storage cost. 
Through careful design and evaluation on 3-D triple-level-cell NAND flash memory, READECC achieves encouraging optimizations with a negligible cost.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1679-1692"},"PeriodicalIF":2.8,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}