DBB-ECC: Random Double Bit and Burst Error Correction Code for HBM3
Authors: Chaehyeon Shin, Jongsun Park
Journal: IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 44, no. 8, pp. 3236-3240
Published: 2025-02-21 (Journal Article)
DOI: 10.1109/TCAD.2025.3544964
URL: https://ieeexplore.ieee.org/document/10899823/
Citation count: 0
Abstract
As dynamic random access memory (DRAM) technology continues to scale down, DRAM vendors have adopted on-die error correction codes (on-die ECC) to address reliability problems caused by cell failures. For burst error correction, high bandwidth memory 3 (HBM3) uses a single symbol correction (SSC) Reed-Solomon (RS) code. However, randomly scattered errors occur frequently under aggressive technology scaling, which calls for a more robust error correction code (ECC) scheme that handles both burst errors and scattered errors. This brief presents double bit and burst ECC (DBB-ECC), an efficient scheme that corrects both single symbol errors and random double bit errors with reduced implementation overhead. The proposed decoder uses syndromes based on the SSC RS code to address both error types without increasing the number of parity bits. Decoder complexity is further reduced by exploiting the syndrome patterns of double bit errors. Experimental results show that the proposed solution requires lower implementation overhead than conventional schemes while maintaining the same correction capability. Compared with the conventional SSC code, it also significantly improves HBM3 reliability without increasing storage overhead.
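To make the SSC baseline concrete, here is a minimal Python sketch of syndrome decoding for a single symbol correcting RS code with two 8-bit check symbols. It is not the DBB-ECC decoder from the brief. The field polynomial (0x11D), the parity-check rows (all ones, and consecutive powers of the primitive element a), the 32-symbol data length, and the helper names (`encode`, `ssc_decode`, `flip_bit`) are illustrative assumptions, not the HBM3 on-die ECC configuration. The sketch shows how the two syndromes S0 and S1 locate and fix an error confined to one symbol, and why two flipped bits in different symbols defeat an SSC-only decoder.

```python
# Minimal sketch: single symbol correcting (SSC) Reed-Solomon syndrome decoding.
# Assumptions (illustrative, not from the brief or the HBM3 spec): 8-bit symbols,
# GF(2^8) with primitive polynomial 0x11D, parity-check rows [1, ..., 1] and
# [a^0, a^1, ..., a^(n-1)], two check symbols stored at positions 0 and 1.

PRIM_POLY = 0x11D  # x^8 + x^4 + x^3 + x^2 + 1
EXP = [0] * 512    # EXP[i] = a^i, doubled so products need no modulo
LOG = [0] * 256    # LOG[a^i] = i (LOG[0] is unused)
val = 1
for i in range(255):
    EXP[i] = val
    LOG[val] = i
    val <<= 1
    if val & 0x100:
        val ^= PRIM_POLY
for i in range(255, 512):
    EXP[i] = EXP[i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements with log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def gf_div(a: int, b: int) -> int:
    """Divide a by a nonzero b in GF(2^8)."""
    if b == 0:
        raise ZeroDivisionError("division by zero in GF(2^8)")
    if a == 0:
        return 0
    return EXP[(LOG[a] - LOG[b]) % 255]


def syndromes(word: list[int]) -> tuple[int, int]:
    """S0 = XOR of all symbols, S1 = XOR of symbol_i * a^i."""
    s0 = s1 = 0
    for i, sym in enumerate(word):
        s0 ^= sym
        s1 ^= gf_mul(sym, EXP[i])
    return s0, s1


def encode(data: list[int]) -> list[int]:
    """Systematic encoding: check symbols p0, p1 first, then the data."""
    if len(data) + 2 > 255:
        raise ValueError("codeword longer than 255 symbols")
    a, b = syndromes([0, 0] + data)
    # Need p0 ^ p1 = a and p0 ^ p1*alpha = b, hence p1 = (a ^ b) / (1 ^ alpha).
    p1 = gf_div(a ^ b, 1 ^ EXP[1])
    p0 = a ^ p1
    return [p0, p1] + data


def ssc_decode(word: list[int]) -> tuple[list[int], str]:
    """Correct any error confined to one symbol; flag what it cannot locate."""
    s0, s1 = syndromes(word)
    if s0 == 0 and s1 == 0:
        return list(word), "no error"
    if s0 != 0 and s1 != 0:
        # One symbol error e at position j gives S0 = e and S1 = e * a^j.
        j = (LOG[s1] - LOG[s0]) % 255
        if j < len(word):
            fixed = list(word)
            fixed[j] ^= s0
            return fixed, "corrected"
    return list(word), "uncorrectable"


def flip_bit(word: list[int], bit: int) -> list[int]:
    """Flip one bit of the codeword; bit b lives in symbol b // 8."""
    out = list(word)
    out[bit // 8] ^= 1 << (bit % 8)
    return out


data = list(range(1, 33))  # 32 hypothetical data symbols (256 bits)
cw = encode(data)

burst = list(cw)
burst[5] ^= 0xFF           # 8-bit burst inside symbol 5
print(ssc_decode(burst)[1])  # corrected

scattered = flip_bit(flip_bit(cw, 3), 100)  # bits in symbols 0 and 12
fixed, status = ssc_decode(scattered)
# Only one symbol can be changed, so the result never equals cw here.
print(status, fixed == cw)
```

Any burst confined to one symbol is corrected because S1/S0 = a^j points directly at symbol j. The two scattered bit flips touch two symbols, so this decoder either flags the word or miscorrects a symbol; in both cases the original codeword is not restored. According to the abstract, DBB-ECC reuses syndromes of this SSC form to also correct such random double bit errors without adding parity bits.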
About the journal:
The purpose of this Transactions is to publish papers of interest to individuals in the area of computer-aided design of integrated circuits and systems composed of analog, digital, mixed-signal, optical, or microwave components. The aids include methods, models, algorithms, and man-machine interfaces for system-level, physical and logical design including: planning, synthesis, partitioning, modeling, simulation, layout, verification, testing, hardware-software co-design and documentation of integrated circuit and system designs of all complexities. Design tools and techniques for evaluating and designing integrated circuits and systems for metrics such as performance, power, reliability, testability, and security are a focus.