{"title":"EQ-ViT: Algorithm-Hardware Co-Design for End-to-End Acceleration of Real-Time Vision Transformer Inference on Versal ACAP Architecture","authors":"Peiyan Dong;Jinming Zhuang;Zhuoping Yang;Shixin Ji;Yanyu Li;Dongkuan Xu;Heng Huang;Jingtong Hu;Alex K. Jones;Yiyu Shi;Yanzhi Wang;Peipei Zhou","doi":"10.1109/TCAD.2024.3443692","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443692","url":null,"abstract":"While vision transformers (ViTs) have shown consistent progress in computer vision, deploying them for real-time decision-making scenarios (<1 […] 13.1× over computing solutions of Intel Xeon 8375C vCPU, Nvidia A10G, A100, Jetson AGX Orin GPUs, AMD ZCU102, and U250 FPGAs. The energy efficiency gains are 62.2×, 15.33×, 12.82×, 13.31×, 13.5×, and 21.9×, respectively.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3949-3960"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595015","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NOBtree: A NUMA-Optimized Tree Index for Nonvolatile Memory","authors":"Zhaole Chu;Peiquan Jin;Yongping Luo;Xiaoliang Wang;Shouhong Wan","doi":"10.1109/TCAD.2024.3438111","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3438111","url":null,"abstract":"Nonvolatile memory (NVM) suffers from more serious nonuniform memory access (NUMA) effects than DRAM because of its lower bandwidth and higher latency. While numerous works have aimed at optimizing NVM indexes, only a few have tried to address the NUMA impact. Existing approaches mainly rely on local NVM write buffers or DRAM-based read buffers to mitigate the cost of remote NVM access, which introduces memory overhead and degrades lookup and scan performance. In this article, we present NOBtree, a new NUMA-optimized persistent tree index. The novelty of NOBtree is twofold. First, NOBtree presents per-NUMA replication and an efficient node-migration mechanism to reduce remote NVM access. Second, NOBtree proposes a NUMA-aware NVM allocator to improve insert performance and scalability. We conducted experiments on six workloads to evaluate the performance of NOBtree. The results show that NOBtree can effectively reduce the number of remote NVM accesses. Moreover, NOBtree outperforms existing persistent indexes, including TLBtree, Fast&Fair, ROART, and PACtree, with up to 3.23× higher throughput and 4.07× lower latency.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3840-3851"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595894","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Arch2End: Two-Stage Unified System-Level Modeling for Heterogeneous Intelligent Devices","authors":"Weihong Liu;Zongwei Zhu;Boyu Li;Yi Xiong;Zirui Lian;Jiawei Geng;Xuehai Zhou","doi":"10.1109/TCAD.2024.3443706","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443706","url":null,"abstract":"The surge in intelligent edge computing has propelled the adoption and expansion of distributed embedded systems (DESs). Numerous scheduling strategies have been introduced to improve DES throughput, such as latency-aware and group-based hierarchical scheduling. Effective device modeling can help in modular and plug-in scheduler design. For uniformity in scheduling interfaces, a unified device performance model is adopted, typically involving system-level modeling that incorporates both the hardware and software stacks, broadly divided into two categories. Fine-grained modeling methods based on hardware architecture analysis become very difficult when dealing with a large number of heterogeneous devices, mainly because much architecture information is closed-source and costly to analyze. Coarse-grained methods are based on limited architecture information or benchmark models, resulting in insufficient generalization across the complex inference performance of diverse deep neural networks (DNNs). Therefore, we introduce a two-stage system-level modeling method (Arch2End), combining limited architecture information with scalable benchmark models to achieve a unified performance representation. Stage one leverages public information to analyze architectures in a uniform abstraction and to design benchmark models for exploring the device performance boundaries, ensuring uniformity. Stage two extracts critical device features from the end-to-end inference metrics of extensive simulation models, ensuring universality and enhancing characterization capacity. Compared to state-of-the-art methods, Arch2End achieves the lowest DNN latency prediction relative errors on NAS-Bench-201 (1.7%) and real-world DNNs (8.2%). It also showcases superior performance in intergroup-balanced device grouping strategies.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4154-4165"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595049","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ARTEMIS: A Mixed Analog-Stochastic In-DRAM Accelerator for Transformer Neural Networks","authors":"Salma Afifi;Ishan Thakkar;Sudeep Pasricha","doi":"10.1109/TCAD.2024.3446719","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3446719","url":null,"abstract":"Transformers have emerged as a powerful tool for natural language processing (NLP) and computer vision. Through the attention mechanism, these models have exhibited remarkable performance gains compared to conventional approaches like recurrent neural networks (RNNs) and convolutional neural networks (CNNs). Nevertheless, transformers typically demand substantial execution time due to their extensive computations and large memory footprint. Processing in-memory (PIM) and near-memory computing (NMC) are promising solutions for accelerating transformers, as they offer high compute parallelism and memory bandwidth. However, designing PIM/NMC architectures to support the complex operations and the massive amounts of data that must be moved between layers in transformer neural networks remains a challenge. We propose ARTEMIS, a mixed analog-stochastic in-DRAM accelerator for transformer models. By employing minimal changes to conventional DRAM arrays, ARTEMIS efficiently alleviates the costs associated with transformer model execution by supporting stochastic computing for multiplications and temporal analog accumulation using a novel in-DRAM metal-on-metal capacitor. Our analysis indicates that ARTEMIS exhibits at least 3.0× speedup and 1.8× lower energy compared to GPU, TPU, CPU, and state-of-the-art PIM transformer hardware accelerators.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3336-3347"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595854","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
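The stochastic-computing multiplication that ARTEMIS maps onto DRAM can be illustrated in plain software: a value in [0, 1] is encoded as a random bitstream whose density of 1s equals the value, and multiplication of two values reduces to a bitwise AND of two independent streams. The sketch below is a minimal software analogy of that general principle, not the paper's in-DRAM implementation; all function names are illustrative.

```python
import random

def to_bitstream(value, length, rng):
    """Encode a value in [0, 1] as a unary stochastic bitstream:
    each bit is independently 1 with probability `value`."""
    return [1 if rng.random() < value else 0 for _ in range(length)]

def sc_multiply(a, b, length=10_000, seed=0):
    """Approximate a * b by ANDing two independent stochastic
    bitstreams and counting the surviving ones."""
    rng = random.Random(seed)
    stream_a = to_bitstream(a, length, rng)
    stream_b = to_bitstream(b, length, rng)
    ones = sum(x & y for x, y in zip(stream_a, stream_b))
    return ones / length

product = sc_multiply(0.8, 0.5)
# The estimate converges to 0.4 as the stream length grows.
```

The accuracy/latency tradeoff is explicit here: longer streams shrink the variance of the estimate, which is why stochastic-computing hardware trades cheap logic for long bitstreams.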
{"title":"Latent RAGE: Randomness Assessment Using Generative Entropy Models","authors":"Kuheli Pratihar;Rajat Subhra Chakraborty;Debdeep Mukhopadhyay","doi":"10.1109/TCAD.2024.3449562","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3449562","url":null,"abstract":"NIST’s recent review of the widely employed Special Publication (SP) 800-22 randomness testing suite has underscored several shortcomings, particularly the absence of entropy-source modeling and the necessity for large sequence lengths. Motivated by this revelation, we explore low-dimensional modeling of the entropy source in random number generators (RNGs) using a variational autoencoder (VAE). This low-dimensional modeling enables the separation of strong and weak entropy sources by magnifying the deterministic effects in the latter, which are otherwise difficult to detect with conventional testing. Bits from weak-entropy RNGs with bias, correlation, or deterministic patterns are more likely to lie on a low-dimensional manifold within a high-dimensional space, in contrast to strong-entropy RNGs, such as true RNGs (TRNGs) and pseudo-RNGs (PRNGs) with uniformly distributed bits. We exploit this insight to employ a generative AI-based noninterference test (GeNI) for the first time, achieving implementation-agnostic low-dimensional modeling of all types of entropy sources. GeNI’s generative aspect uses VAEs to produce synthetic bitstreams from the latent representation of RNGs, which are subjected to a deep learning (DL)-based noninterference (NI) test evaluating the masking ability of the synthetic bitstreams. The core principle of the NI test is that if the bitstream exhibits high-quality randomness, the masked data from the two sources should be indistinguishable. GeNI facilitates a comparative analysis of low-dimensional entropy-source representations across various RNGs, adeptly identifying the artificial randomness in specious RNGs with deterministic patterns that otherwise pass all NIST SP 800-22 tests. Notably, GeNI achieves this with 10× shorter sequence lengths and 16.5× faster execution time compared to the NIST test suite.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3503-3514"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595860","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"ROI-HIT: Region of Interest-Driven High-Dimensional Microarchitecture Design Space Exploration","authors":"Xuyang Zhao;Tianning Gao;Aidong Zhao;Zhaori Bi;Changhao Yan;Fan Yang;Sheng-Guo Wang;Dian Zhou;Xuan Zeng","doi":"10.1109/TCAD.2024.3443006","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3443006","url":null,"abstract":"Exploring the design space of RISC-V processors faces significant challenges due to the vastness of the high-dimensional design space and the associated expensive simulation costs. This work proposes a region of interest (ROI)-driven method, which focuses on promising ROIs to reduce over-exploration of the huge design space and improve optimization efficiency. A tree structure based on self-organizing map (SOM) networks is proposed to partition the design space into ROIs. To reduce the high dimensionality of the design space, a variable selection technique based on a sensitivity matrix is developed to prune unimportant design parameters and efficiently hit the optimum inside the ROIs. Moreover, an asynchronous parallel strategy is employed to further reduce simulation time. Experimental results demonstrate the superiority of our proposed method, achieving improvements of up to 43.82% in performance, 33.20% in power consumption, and 11.41% in area compared to state-of-the-art methods.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4178-4189"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636265","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
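The sensitivity-based variable selection described above can be sketched with a one-at-a-time finite-difference estimate: perturb each design parameter, measure the change in the objective, and keep only the most influential parameters. This is a hedged illustration of the general technique, not the paper's actual sensitivity-matrix computation; the toy surrogate model and parameter names below are invented for the example.

```python
def sensitivities(objective, base, deltas):
    """One-at-a-time finite-difference sensitivity of `objective`
    with respect to each design parameter in `base`."""
    f0 = objective(base)
    sens = {}
    for name, delta in deltas.items():
        perturbed = dict(base)
        perturbed[name] += delta
        sens[name] = abs(objective(perturbed) - f0) / abs(delta)
    return sens

def prune(sens, keep):
    """Keep only the `keep` most influential parameters."""
    ranked = sorted(sens, key=sens.get, reverse=True)
    return ranked[:keep]

# Toy surrogate: latency dominated by cache size, barely affected by ROB depth.
def latency(params):
    return 100 - 5 * params["cache_kb"] + 0.01 * params["rob_depth"]

s = sensitivities(latency, {"cache_kb": 32, "rob_depth": 128},
                  {"cache_kb": 1, "rob_depth": 1})
important = prune(s, keep=1)  # -> ['cache_kb']
```

In a real flow the objective would be a cycle-accurate simulation, so each sensitivity probe is expensive; this is exactly why pruning unimportant dimensions early pays off.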
{"title":"Runtime Monitoring of ML-Based Scheduling Algorithms Toward Robust Domain-Specific SoCs","authors":"A. Alper Goksoy;Alish Kanani;Satrajit Chatterjee;Umit Ogras","doi":"10.1109/TCAD.2024.3445815","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3445815","url":null,"abstract":"Machine learning (ML) algorithms are being rapidly adopted to perform dynamic resource management tasks in heterogeneous systems-on-chip. For example, ML-based task schedulers can make quick, high-quality decisions at runtime. Like any ML model, these offline-trained policies depend critically on the representative power of the training data. Hence, their performance may diminish or even catastrophically fail under unknown workloads, especially new applications. This article proposes a novel framework that continuously monitors the system to detect unforeseen scenarios using a gradient-based generalization metric called coherence. The proposed framework accurately determines whether the current policy generalizes to new inputs. If not, it incrementally trains the ML scheduler to ensure the robustness of the task-scheduling decisions. The proposed framework is evaluated thoroughly with a domain-specific SoC and six real-world applications. It can detect whether the trained scheduler generalizes to the current workload with 88.75%–98.39% accuracy. Furthermore, it enables 1.1×–14× faster execution time when the scheduler is incrementally trained. Finally, overhead analysis performed on an Nvidia Jetson Xavier NX board shows that the proposed framework can run as a real-time background task.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4202-4213"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142636467","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"FDPUF: Frequency-Domain PUF for Robust Authentication of Edge Devices","authors":"Shubhra Deb Paul;Aritra Dasgupta;Swarup Bhunia","doi":"10.1109/TCAD.2024.3447211","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447211","url":null,"abstract":"Counterfeiting, overproduction, and cloning of integrated circuits (ICs) and associated hardware have emerged as major security concerns in the modern globalized microelectronics supply chain. One way to combat these issues effectively is to deploy hardware authentication techniques that utilize physical unclonable functions (PUFs). PUFs exploit intrinsic variations in hardware that occur during the manufacturing and fabrication process to generate device-specific fingerprints, i.e., immutable signatures that cannot be replicated by counterfeits and clones. However, unavoidable factors like environmental noise and harmonics can significantly deteriorate the quality of the PUF signature. Moreover, conventional PUF solutions are generally not amenable to in-field authentication of hardware, which has emerged as a critical need for Internet of Things (IoT) edge devices to detect physical attacks on them. In this article, we introduce the frequency-domain PUF (FDPUF), a novel PUF that analyzes time-domain current waveforms in the frequency domain to create high-quality authentication signatures suitable for in-field authentication. FDPUF decomposes electrical signals into their spectral coefficients, filters out unnecessary low-energy components, reconstructs the waveforms, and generates high-quality digital fingerprints for device authentication. Compared to existing authentication mechanisms, the higher quality of the signatures obtained through frequency-domain analysis makes the proposed FDPUF more suitable for protecting the integrity of edge computing hardware. We perform experimental measurements on FPGAs and analyze FDPUF properties using the National Institute of Standards and Technology (NIST) test suite to demonstrate that the FDPUF provides better uniqueness and robustness than its time-domain counterpart while being attractive for in-field authentication.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3479-3490"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595900","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
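The FDPUF pipeline described above (decompose a current waveform into spectral coefficients, discard low-energy components, derive a digital fingerprint) can be approximated in a few lines with a naive DFT. This is a toy sketch of the general idea, not the authors' signature-generation algorithm; a real implementation would also reconstruct and post-process the filtered waveform, and the bit-derivation rule here is invented for illustration.

```python
import cmath

def dft(signal):
    """Naive discrete Fourier transform (O(n^2), stdlib only)."""
    n = len(signal)
    return [sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(signal))
            for k in range(n)]

def fingerprint(current_trace, keep=8):
    """Toy frequency-domain signature: rank the non-DC spectral
    coefficients by magnitude, drop the low-energy ones, and emit
    one bit per surviving coefficient from the sign of its real part."""
    spectrum = dft(current_trace)
    strongest = sorted(range(1, len(spectrum)),
                       key=lambda k: abs(spectrum[k]), reverse=True)[:keep]
    return [1 if spectrum[k].real >= 0 else 0 for k in sorted(strongest)]
```

Filtering in the frequency domain is what gives the scheme its noise robustness: small time-domain perturbations spread across many low-energy coefficients, which are exactly the ones discarded before the fingerprint is formed.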
{"title":"GOURD: Tensorizing Streaming Applications to Generate Multi-Instance Compute Platforms","authors":"Patrick Schmid;Paul Palomero Bernardo;Christoph Gerum;Oliver Bringmann","doi":"10.1109/TCAD.2024.3445810","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3445810","url":null,"abstract":"In this article, we raise the dataflow processing paradigm to a higher level of abstraction to automate the generation of multi-instance compute and memory platforms with interfaces to I/O devices (sensors and actuators). Since the different compute instances (NPUs, CPUs, DSPs, etc.) and I/O devices do not necessarily have compatible interfaces at the dataflow level, an automated translation is required. However, in multidimensional dataflow scenarios, it becomes inherently difficult to reason about buffer sizes and iteration order without knowing the shape of the data access pattern (DAP) that the dataflow follows. To capture this shape and the platform composition, we define a domain-specific representation (DSR) and devise a toolchain to generate a synthesizable platform, including appropriate streaming buffers for platform-specific tensorization of the data between incompatible interfaces. This allows platforms such as sensor edge AI devices to be specified by simply focusing on the shape of the data provided by the sensors and transmitted among compute units, giving the ability to evaluate and generate different dataflow design alternatives with significantly reduced design time.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"4166-4177"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10745814","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595025","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"DREAMx: A Data-Driven Error Estimation Methodology for Adders Composed of Cascaded Approximate Units","authors":"Muhammad Abdullah Hanif;Ayoub Arous;Muhammad Shafique","doi":"10.1109/TCAD.2024.3447209","DOIUrl":"https://doi.org/10.1109/TCAD.2024.3447209","url":null,"abstract":"Due to the significance and broad utilization of adders in computing systems, the design of low-power approximate adders (LPAAs) has received significant attention from the system design community. However, the selection and deployment of appropriate approximate modules require a thorough design space exploration, which is (in general) an extremely time-consuming process. To reduce the exploration time, different error estimation techniques have been proposed in the literature for evaluating the quality metrics of approximate adders. However, most of them are based on certain assumptions that limit their usability in real-world settings. In this work, we highlight the impact of those assumptions on the quality of error estimates provided by state-of-the-art techniques and show how they limit the use of such techniques in real-world settings. Moreover, we highlight the significance of considering input data characteristics for improving the quality of error estimation. Based on our analysis, we propose a systematic data-driven error estimation methodology, DREAMx, for adders composed of cascaded approximate units, which covers a predominant set of LPAAs. DREAMx in principle factors in the dependence between input bits, based on the given input distribution, to compute the probability mass function (PMF) of the error value at the output of an approximate adder. It achieves improved results compared to state-of-the-art techniques while offering a substantial decrease in overall execution (exploration) time compared to exhaustive simulations. Our results further show that there exists a delicate tradeoff between the achievable quality of error estimates and the overall execution time.","PeriodicalId":13251,"journal":{"name":"IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems","volume":"43 11","pages":"3348-3357"},"PeriodicalIF":2.7,"publicationDate":"2024-11-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142595916","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"Computer Science","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
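For contrast with DREAMx's data-driven approach, the simpler baseline it improves upon (combining per-unit error PMFs of cascaded approximate units under an independence assumption) can be sketched as a discrete convolution. The unit PMFs below are made up for illustration, and the independence assumption is exactly the simplification DREAMx's input-dependence modeling is designed to remove.

```python
from collections import defaultdict

def convolve_pmfs(pmf_a, pmf_b):
    """Combine two error PMFs assuming the error sources are
    independent: P[E = e] = sum over e1 + e2 = e of P[e1] * P[e2]."""
    out = defaultdict(float)
    for e1, p1 in pmf_a.items():
        for e2, p2 in pmf_b.items():
            out[e1 + e2] += p1 * p2
    return dict(out)

def adder_error_pmf(unit_pmfs):
    """Error PMF of a cascade of approximate units. Each PMF maps an
    error value (already weighted by bit position) to its probability."""
    total = {0: 1.0}  # error-free before any unit is applied
    for pmf in unit_pmfs:
        total = convolve_pmfs(total, pmf)
    return total

# Two toy approximate units at bit positions 0 and 1.
unit0 = {0: 0.75, 1: 0.25}   # LSB unit errs by +1 a quarter of the time
unit1 = {0: 0.9, -2: 0.1}    # next unit errs by -2 occasionally
pmf = adder_error_pmf([unit0, unit1])
# pmf is approximately {0: 0.675, 1: 0.225, -2: 0.075, -1: 0.025}
```

A data-driven method like DREAMx would replace the product `p1 * p2` with probabilities conditioned on the actual input distribution, since carry chains make neighboring units' errors correlated in practice.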