IEEE Transactions on Very Large Scale Integration (VLSI) Systems: Latest Articles, Page 8

A 2-Lane DAC-/ADC-Based 2 × 2 MIMO PAM-4 MMSE-DFE Wireline Transceiver With FEXT Cancellation on RFSoC Platform

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-02 DOI: 10.1109/TVLSI.2025.3553400

Jaewon Lee;Seoyoung Jang;Yujin Choi;Donggeon Kim;Matthias Braendli;Thomas Morf;Marcel Kossel;Pier-Andrea Francese;Gain Kim

{"title":"A 2-Lane DAC-/ADC-Based 2 × 2 MIMO PAM-4 MMSE-DFE Wireline Transceiver With FEXT Cancellation on RFSoC Platform","authors":"Jaewon Lee;Seoyoung Jang;Yujin Choi;Donggeon Kim;Matthias Braendli;Thomas Morf;Marcel Kossel;Pier-Andrea Francese;Gain Kim","doi":"10.1109/TVLSI.2025.3553400","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3553400","url":null,"abstract":"This article presents a 2-lane 2 × 2 multiple-input, multiple-output (MIMO) 4-level pulse amplitude modulation (PAM-4) minimum mean-squared-error (MMSE)-decision-feedback equalizer (DFE) with far-end crosstalk (FEXT) cancellation for digital-to-analog converter (DAC)-/analog-to-digital converter (ADC)-based high-speed serial links. The receiver (RX) datapath is designed with a 15-tap MIMO feedforward equalizer (FFE) and a one-tap MIMO DFE with the least mean square (LMS), enabling adaptation to channel variation while maintaining the MMSE setting. The RX digital signal processor (DSP) place and route (PnR) in a 28-nm CMOS is estimated to consume 201 mW/lane at a 56-Gb/s/lane data rate while occupying a 0.5-mm²/lane silicon area. We further implement a real-time evaluation platform to verify the functionality of the MIMO PAM-4 MMSE-DFE with rapid bit-error-rate (BER) testing on RFSoC. 
The measurement result demonstrates that the MIMO MMSE-DFE significantly improves BER performance from 2.75e−3 to 1.31e−7 compared with equalization without FEXT cancellation when communicating over a channel exhibiting 12.4-dB insertion loss (IL) and 13.2-dB IL-to-crosstalk ratio (ICR) at Nyquist.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1570-1581"},"PeriodicalIF":2.8,"publicationDate":"2025-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117385","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Revisiting Multiple ECC on High-Density NAND Flash memory

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-04-01 DOI: 10.1109/TVLSI.2025.3551400

Yunpeng Song;Yina Lv;Wentong Li;Jialin Liu;Liang Shi

{"title":"Revisiting Multiple ECC on High-Density NAND Flash memory","authors":"Yunpeng Song;Yina Lv;Wentong Li;Jialin Liu;Liang Shi","doi":"10.1109/TVLSI.2025.3551400","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3551400","url":null,"abstract":"Three-dimensional NAND flash memory using the advanced multibit-per-cell technique is widely adopted due to its high density. However, it faces the problem of deteriorating read performance and energy consumption due to decreased reliability. Low-density parity-check code (LDPC) is typically adopted as an error correction code (ECC) to encode data and provide fault tolerance. To reduce the cost, LDPC with a high code rate is always adopted. However, LDPC will lead to read retry operations when the accessed data are not successfully decoded, and such retry-induced performance degradation is serious, especially for modern high-density flash memory. In this work, a reliability-aware differential ECC (READECC) approach is proposed to reduce redundancy protection and storage cost of LDPC with a low code rate and optimize the read performance. The basic idea is to adopt LDPC with a suitable code rate considering both data access characteristics and flash reliability characteristics. First, hot reads are identified based on the frequency of being accessed. Second, based on the reliability variation characteristics, the life of flash memory is divided into three reliability periods. As the reliability period shifts, the code rate of the LDPC adjusts adaptively to minimize redundancy protection. Third, an adaptive-sized logical page approach is further proposed to support LDPC with strong error correction capability (a low code rate) with a low storage cost. 
Through careful design and evaluation on 3-D triple-level-cell NAND flash memory, READECC achieves encouraging optimizations with a negligible cost.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1679-1692"},"PeriodicalIF":2.8,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117384","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

BF PUF: A Modeling Attack-Resistant Strong PUF Based on Bent Functions

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-03-30 DOI: 10.1109/TVLSI.2025.3569587

Zhengfeng Huang;Fansheng Zeng;Yanqiao Chi;Yankun Lin;Yingchun Lu;Huaguo Liang;Jingchang Bian;Yiming Ouyang;Tianming Ni;Xiaoqing Wen

{"title":"BF PUF: A Modeling Attack-Resistant Strong PUF Based on Bent Functions","authors":"Zhengfeng Huang;Fansheng Zeng;Yanqiao Chi;Yankun Lin;Yingchun Lu;Huaguo Liang;Jingchang Bian;Yiming Ouyang;Tianming Ni;Xiaoqing Wen","doi":"10.1109/TVLSI.2025.3569587","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3569587","url":null,"abstract":"Strong physical unclonable functions (PUFs) are promising circuits for lightweight Internet of Things (IoT) authentication and security. However, existing strong PUFs exhibit very low cryptographic nonlinearity (NL), making them vulnerable to machine learning (ML) modeling and cryptanalytic attack. To address this issue, we propose the Bent function PUF (BF PUF) based on Maiorana-McFarland (M-M) constructed Bent functions, which obfuscates the responses of the strong PUF to enhance resistance against modeling attacks. The core idea is to employ the M-M construction method for Bent functions to ensure maximum cryptographic NL to resist modeling attacks. A Feistel network is configured using weak PUF responses as keys to achieve device-specific and unpredictable mappings of input challenges while meeting the requirements of the M-M Bent function construction. A Python-based model of the BF PUF was developed, and simulation results indicate that the cryptographic NL of the proposed BF PUF outperforms k-XOR arbiter PUFs (APUFs) (k = 2, 4, 6). The proposed BF PUF was also implemented and evaluated on the FPGA hardware platform. The experimental results show that under modeling attacks using four ML algorithms, namely logistic regression (LR), artificial neural networks (ANNs), deep neural networks (DNNs), and covariance matrix adaptation evolution strategies (CMA-ES), the best prediction accuracy achieved by any of them is 52.60%. 
The reliability under temperature fluctuations ranging from −10 °C to 80 °C is between 84.20% and 99.78%.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2299-2311"},"PeriodicalIF":2.8,"publicationDate":"2025-03-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144705275","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
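
Editor's illustration (not from the paper): the BF PUF abstract relies on the fact that a Maiorana-McFarland bent function f(x, y) = x·π(y) ⊕ g(y) reaches the maximum nonlinearity 2^(n−1) − 2^(n/2−1) for n = 2m input bits. The sketch below checks this numerically with a Walsh-Hadamard transform. The permutation `perm` and the function `g` are arbitrary values chosen here for demonstration; the paper's Feistel network keyed by weak-PUF responses is not modelled.

```python
def mm_bent(m, perm, g):
    """Truth table of f(x, y) = <x, perm[y]> XOR g[y] on n = 2m bits.

    perm: a permutation of range(2**m) (hypothetical example values).
    g:    any list of 2**m bits (hypothetical example values).
    The table is indexed by z = x * 2**m + y.
    """
    table = []
    for x in range(2 ** m):
        for y in range(2 ** m):
            dot = bin(x & perm[y]).count("1") & 1  # inner product over GF(2)
            table.append(dot ^ g[y])
    return table


def walsh_spectrum(table):
    """Fast Walsh-Hadamard transform of the sign function (-1)^f."""
    w = [1 - 2 * bit for bit in table]
    h = 1
    while h < len(w):
        for i in range(0, len(w), 2 * h):
            for j in range(i, i + h):
                a, b = w[j], w[j + h]
                w[j], w[j + h] = a + b, a - b
        h *= 2
    return w


def nonlinearity(table):
    """Hamming distance from f to the nearest affine function."""
    return len(table) // 2 - max(abs(v) for v in walsh_spectrum(table)) // 2


if __name__ == "__main__":
    f = mm_bent(2, perm=[2, 0, 3, 1], g=[0, 1, 1, 0])
    print(nonlinearity(f))  # bound for n = 4 is 2**3 - 2**1 = 6
```

For comparison, a purely linear function such as f(z) = z mod 2 has nonlinearity 0 under the same `nonlinearity` helper, which is the weakness the abstract attributes to existing strong PUFs in much milder form.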

Citations: 0

Full-Array Boolean Logic CIM Macro With Self-Recycling 10T-SRAM Cell for AES Systems

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-03-29 DOI: 10.1109/TVLSI.2025.3572140

Xin Li;Ying Pan;Qian Jin;Lintao Chen;Yang Lou;Baofa Wu;Jiajun Long;Yongliang Zhou;Chunyu Peng;Xiulong Wu;Zhiting Lin

{"title":"Full-Array Boolean Logic CIM Macro With Self-Recycling 10T-SRAM Cell for AES Systems","authors":"Xin Li;Ying Pan;Qian Jin;Lintao Chen;Yang Lou;Baofa Wu;Jiajun Long;Yongliang Zhou;Chunyu Peng;Xiulong Wu;Zhiting Lin","doi":"10.1109/TVLSI.2025.3572140","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3572140","url":null,"abstract":"Computing in memory (CIM), which alleviates the need to transfer a large amount of data between processor and memory, significantly reducing latency and energy consumption, is a promising new computing architecture for addressing the von Neumann bottleneck problem. This article proposes a CIM array structure composed of self-recycling 10T static random access memory (SRAM) cells, which can realize orthogonal data writing, and multiple Boolean logical operations for the entire array. The self-recycling and full-array activation characteristics are extremely suitable for accelerating diverse data processing algorithms such as the Advanced Encryption Standard (AES). A 4-kb SRAM is implemented in 55-nm CMOS technology to verify the effectiveness of the design. Compared with other state-of-the-art architectures, the throughput and the operating frequency of the proposed CIM macro are increased to 843 GOPS/kb (2.64×) and 823.7 MHz (2.6×), respectively. The energy efficiency reaches 246.9 TOPS/W. 
When applied to the AES, the energy consumption is 35.77% less than the digital CIM architecture that is not self-recycling.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2214-2224"},"PeriodicalIF":2.8,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Synthesis of Analog and Mixed-Signal Circuits on a Programmable Array

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-03-29 DOI: 10.1109/TVLSI.2025.3553538

Ziyi Chen;Ioannis Savidis

{"title":"Synthesis of Analog and Mixed-Signal Circuits on a Programmable Array","authors":"Ziyi Chen;Ioannis Savidis","doi":"10.1109/TVLSI.2025.3553538","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3553538","url":null,"abstract":"In this article, a novel field-programmable analog array (FPAA) has been developed for the configurable implementation of various analog circuits. The proposed architecture not only supports system-level reconfiguration but also enables transistor-level programmability. The FPAA is comprised of a 3 × 4 configurable analog block (CAB) array, with a single configurable logic block (CLB) added to each column to allow for the programming of digital circuits. Passive devices, including programmable capacitors and resistors, and active transistor pairs (TPs), are utilized to implement both continuous-time and discrete-time circuits. A placement algorithm is developed that efficiently maps analog circuits onto the FPAA fabric by finding the optimal vertical and horizontal locations for the assignment of transistors. In addition, to reduce the complexity of placing devices on the fabric, a technique is developed that matches TPs in the same vertical level to predefined topologies in a library. Routers are included to connect devices implemented on the FPAA fabric. The proposed FPAA occupies an area of 4 mm² in a TSMC 65-nm fabrication process. The smaller circuits implemented on the FPAA fabric include a folded-cascode amplifier, a strongArm comparator, a continuous-time integrator, and a switch-capacitor integrator. The larger analog and mixed-signal circuits implemented on the FPAA fabric include a four-stage pipeline analog-to-digital converter (ADC) and a first-order delta-sigma modulator. The programmed folded-cascode amplifier exhibits a tunable gain of 28.3 dB to 34.8 dB and a programmable 3-dB bandwidth of 3.3 MHz to 5.3 MHz. 
The configured comparator provides a resolution of less than 3 mV when comparing two signals. The implemented first-order delta-sigma modulator operates at a frequency of 15 MHz and provides an effective number of bits (ENOBs) of 6.8 when utilizing an oversampling ratio of 128×. The configured pipeline ADC provides an ENOB of 3.7 for a sampling frequency of 15 MHz.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"1920-1933"},"PeriodicalIF":2.8,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519389","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
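
Editor's illustration (not from the paper): the ENOB figures quoted in the FPAA abstract can be related to a signal-to-noise-and-distortion ratio through the standard ideal-quantizer relation SINAD = 6.02·ENOB + 1.76 dB. The abstract reports only ENOB, so the SINAD values this sketch produces are derived under that assumption and are not measured results.

```python
def enob_to_sinad_db(enob):
    """SINAD in dB implied by an ENOB, using the ideal-quantizer relation."""
    return 6.02 * enob + 1.76


def sinad_db_to_enob(sinad_db):
    """Inverse of enob_to_sinad_db."""
    return (sinad_db - 1.76) / 6.02


if __name__ == "__main__":
    # ENOB values taken from the abstract: delta-sigma modulator and pipeline ADC.
    for name, enob in [("delta-sigma modulator", 6.8), ("pipeline ADC", 3.7)]:
        print(f"{name}: ENOB {enob} implies about {enob_to_sinad_db(enob):.1f} dB SINAD")
```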

Citations: 0

A Supply Noise-Insensitive Ring DCO With a Self-Biased Shunt Regulator Array in Wide-Range Digital PLL

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-03-29 DOI: 10.1109/TVLSI.2025.3572883

Kyungmin Baek;Jiho Kim;Kahyun Kim;Deog-Kyoon Jeong;Min-Seong Choo

{"title":"A Supply Noise-Insensitive Ring DCO With a Self-Biased Shunt Regulator Array in Wide-Range Digital PLL","authors":"Kyungmin Baek;Jiho Kim;Kahyun Kim;Deog-Kyoon Jeong;Min-Seong Choo","doi":"10.1109/TVLSI.2025.3572883","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3572883","url":null,"abstract":"This brief proposes a digital phase-locked loop (DPLL) with a power supply noise (PSN) regulated ring-type digitally controlled oscillator (DCO) using an nMOS shunt regulator array. The proposed nMOS array dynamically detects the PSN and creates a pathway, channeling the PSN forwarded through the digitally controlled resistor (DCR) directly to the ground. To support the proposed power supply noise compensation (PNC) technique in wide-range operation, the output bits from the digital loop filter (DLF) control not only the DCR but also the total transconductance of the nMOS array. The supply-sensing amplifier (SSA) between the supply and the gates of the nMOS array amplifies supply noise to lower the voltage headroom, allowing the DCO to run faster. Fabricated in 40-nm CMOS technology, the prototype DPLL demonstrates an rms jitter of 1.27 ps under 1 MHz, 20-mVpp sinusoidal noise, while the rms jitter without the regulator is measured as 3.26 ps. The total power consumption and area occupation of the DPLL are 13.5 mW and 0.066 mm², respectively. 
The proposed scheme for PNC contributes only 1.90 mW and 0.0017 mm², representing 14.1% and 2.8% of the total, respectively.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2349-2353"},"PeriodicalIF":2.8,"publicationDate":"2025-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144705277","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

Backside Active Power Delivery With Hybrid DC–DC Converter Enabled by Amorphous Oxide Semiconductor Transistors

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-03-28 DOI: 10.1109/TVLSI.2025.3570078

Jungyoun Kwak;Sunbin Deng;Junmo Lee;Suman Datta;Shimeng Yu

{"title":"Backside Active Power Delivery With Hybrid DC–DC Converter Enabled by Amorphous Oxide Semiconductor Transistors","authors":"Jungyoun Kwak;Sunbin Deng;Junmo Lee;Suman Datta;Shimeng Yu","doi":"10.1109/TVLSI.2025.3570078","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3570078","url":null,"abstract":"The increasing demand for energy-efficient computing has created the need for advanced power management solutions. Backside power delivery network (BSPDN) has been introduced in the industry for 2-nm node with passive wires. In this work, we propose adding active components (power transistors) to the backside of silicon in a back-end-of-line (BEOL)-compatible fabrication process. The goal is to enable 12–0.7-V voltage downconversion at the backside of silicon (near the point of load, i.e., the frontside logic compute die) to minimize the IR drop and improve overall system-level conversion efficiency. This work leverages a hybrid monolithic 3-D (M3D) dc-dc converter architecture combining switched-capacitor (SC) and synchronous buck converter topologies with BEOL-compatible active and passive devices. The design employs amorphous tungsten-doped indium oxide (IWO) transistors, which offer high breakdown voltage and tunable threshold voltages, supporting both enhancement and depletion modes for efficient switching. 
With the experimentally calibrated compact models, the simulated hybrid converter design achieves 12–0.7-V conversion with a peak efficiency of 95.6% at a power density of 330 mW/mm², demonstrating the feasibility of M3D SC dc-dc converters for next-generation power management in high-performance edge devices.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2153-2162"},"PeriodicalIF":2.8,"publicationDate":"2025-03-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144704904","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

An Area-Energy-Efficient 64–2048 Point FFT With Approximate Plane-Fitting Complex Multipliers

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-03-27 DOI: 10.1109/TVLSI.2025.3550470

Weiwei Shi;Jiasheng Wu;Yida Yuan;Zhihong Mo;Chaoyuan Wu;Jiangwei He

{"title":"An Area-Energy-Efficient 64–2048 Point FFT With Approximate Plane-Fitting Complex Multipliers","authors":"Weiwei Shi;Jiasheng Wu;Yida Yuan;Zhihong Mo;Chaoyuan Wu;Jiangwei He","doi":"10.1109/TVLSI.2025.3550470","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3550470","url":null,"abstract":"As a key component of fast Fourier transform (FFT), the complex multiplier (CM) includes twiddle factor generation and corresponding multiplication. This brief proposes a tailored approach for approximating CM functionality by employing an adapted piecewise-plane-fitting technique, effectively replacing the conventional look-up-table-based twiddle generation and exact multipliers by shift-and-add calculation. Numerical binary calculation analysis and simulations are conducted to achieve an optimal tradeoff among accuracy, circuit complexity, power, and delay. Based on 45-nm CMOS, logic synthesis results demonstrate significant improvements, with area, power, and delay reductions of 64.18%, 64.98%, and 19.77%, respectively. With optimizations on logic structures, the complete design of the 64–2048 point FFT has efficiently adopted the proposed CM with evident improvement. The proposed FFT outperforms other reconfigurable FFT designs in terms of normalized area reduction over 55.53% and normalized energy improvement over 21.51%. In field-programmable gate array (FPGA) implementation, the proposed FFT has significantly more savings compared with the exact FFT. 
In practice, the approximate FFT output results’ PSNR ranges from 56 to 83 dB with competent accuracy in typical signal processing.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 7","pages":"2034-2038"},"PeriodicalIF":2.8,"publicationDate":"2025-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144519295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}

Citations: 0

An Area and Energy-Efficient Systolic Array Accelerator Architecture for Deep Neural Networks Using Stochastic Computing

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-03-24 DOI: 10.1109/TVLSI.2025.3550786

Jingwei Zhu;Jingguo Wu;Zongru Yang;Yu Jiang;Yun Chen

{"title":"An Area and Energy-Efficient Systolic Array Accelerator Architecture for Deep Neural Networks Using Stochastic Computing","authors":"Jingwei Zhu;Jingguo Wu;Zongru Yang;Yu Jiang;Yun Chen","doi":"10.1109/TVLSI.2025.3550786","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3550786","url":null,"abstract":"Deep neural networks (DNNs) are widely used to handle various intelligent tasks. With the increased model size, the DNNs’ hardware accelerators are challenging the higher area overhead and energy consumption. Stochastic computing (SC) has recently been considered for implementing DNNs and reducing hardware consumption. However, many current SC-based DNN accelerators fail to balance accuracy, performance, and resource overhead. In addition, their limited scalability and flexibility restrict their use in edge devices. In this article, we design an area and energy-efficient DNN accelerator architecture using SC. We propose an SC-binary hybrid processing unit with piecewise shift compensation without significant additional hardware overhead increment to improve the SC accuracy. To balance performance and resource overhead, we conduct a design space exploration (DSE) from an overall architectural perspective. An experimental platform with both software and hardware for SC-based DNNs is established. The software simulation results demonstrate that the best accuracy of the designed SC-DNN on the CIFAR-10 is 91.9%, which is 3.2% higher than that of the previous SC-DNN work. The VLSI implementation of the hardware is synthesized using the TSMC 28-nm CMOS process. Results show that compared to the binary computing counterpart, our design achieves 2.7× area efficiency and 3.4× energy efficiency. 
Compared to other SC-DNN accelerator designs, our design can provide 5.3× area efficiency and 7.3× energy efficiency.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 6","pages":"1582-1595"},"PeriodicalIF":2.8,"publicationDate":"2025-03-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144117276","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
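
Editor's illustration (not from the paper): stochastic computing, as referenced in the abstract, encodes a value in [0, 1] as the fraction of ones in a random bitstream, so that a single AND gate multiplies two independent streams. The sketch below shows this generic unipolar scheme and why accuracy depends on stream length. It does not model the paper's SC-binary hybrid unit or its piecewise shift compensation; the function names, stream length, and seed are assumptions made here.

```python
import random


def to_stream(p, length, rng):
    """Unipolar SC encoding: each bit is 1 with probability p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def sc_multiply(p, q, length=4096, seed=0):
    """Estimate p * q by AND-ing two independently generated bitstreams."""
    rng = random.Random(seed)
    a = to_stream(p, length, rng)
    b = to_stream(q, length, rng)
    return sum(x & y for x, y in zip(a, b)) / length


if __name__ == "__main__":
    exact = 0.5 * 0.25
    for n in (64, 1024, 16384):
        est = sc_multiply(0.5, 0.25, length=n)
        print(f"length {n}: estimate {est:.4f}, exact {exact:.4f}")
```

Longer streams reduce the random error at the cost of latency, which is one side of the accuracy, performance, and resource balance the abstract describes.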

Citations: 0

A Sub-0.9-ps Static Phase Offset 500 MHz Delay-Locked Loop With a Large Gain Phase Detector

IF 2.8, Zone 2 (Engineering and Technology)

IEEE Transactions on Very Large Scale Integration (VLSI) Systems Pub Date : 2025-03-23 DOI: 10.1109/TVLSI.2025.3566739

Jingjing Liu;Ruihuang Wu;Haoning Sun;Bingjun Xiong;Feng Yan;Kangkang Sun;Zhipeng Li;Jian Guan

{"title":"A Sub-0.9-ps Static Phase Offset 500 MHz Delay-Locked Loop With a Large Gain Phase Detector","authors":"Jingjing Liu;Ruihuang Wu;Haoning Sun;Bingjun Xiong;Feng Yan;Kangkang Sun;Zhipeng Li;Jian Guan","doi":"10.1109/TVLSI.2025.3566739","DOIUrl":"https://doi.org/10.1109/TVLSI.2025.3566739","url":null,"abstract":"This article presents an analog delay-locked loop (DLL) designed for high-precision measurement applications, featuring low static phase offset (SPO) and fast locking speed, such as time-to-digital converters (TDCs) and analog-to-digital converters (ADCs). A large gain and dead-zone free phase detector (PD) is proposed. When the DLL reaches the locked state, the phase error between the two input signals of the PD can be reduced to 0.53 ps (0.095°), which has an 18-time improvement compared to the conventional DLL. Therefore, the SPO of the entire DLL can be effectively reduced to be less than 0.87 ps. Furthermore, the auxiliary circuit, consisting of a large phase difference detector (LPDD) and fast-adjusting branches (FABs), accelerates the DLL’s locking process to 42 clock cycles and improves the locking speed by 4.1 times. Designed by a standard 180 nm CMOS technology, the DLL occupies an area of 106.1 × 93.3 µm. 
It achieves low power consumption of 1.89 mW at 500 MHz, and the root mean square (rms) jitter and P-P jitter are 1.01 and 6.26 ps, respectively.","PeriodicalId":13425,"journal":{"name":"IEEE Transactions on Very Large Scale Integration (VLSI) Systems","volume":"33 8","pages":"2143-2152"},"PeriodicalIF":2.8,"publicationDate":"2025-03-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144705290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
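
Editor's illustration (not from the paper): the DLL abstract quotes a 0.53 ps phase error as 0.095° at 500 MHz. The conversion is phase = 360° × Δt × f, and a lock time in seconds follows from dividing the cycle count by the clock frequency. The sketch below reproduces that arithmetic from the numbers in the abstract; the function names are chosen here for illustration.

```python
def time_to_phase_deg(delta_t_s, freq_hz):
    """Convert a time offset to phase in degrees at a given clock frequency."""
    return 360.0 * delta_t_s * freq_hz


def lock_time_s(cycles, freq_hz):
    """Lock time in seconds for a given number of clock cycles."""
    return cycles / freq_hz


if __name__ == "__main__":
    f_clk = 500e6  # 500 MHz operating frequency from the abstract
    print(f"{time_to_phase_deg(0.53e-12, f_clk):.4f} deg")    # PD phase error
    print(f"{time_to_phase_deg(0.87e-12, f_clk):.4f} deg")    # SPO upper bound
    print(f"{lock_time_s(42, f_clk) * 1e9:.1f} ns lock time")  # 42 clock cycles
```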

Citations: 0