FLOWER and FaME: A Low Overhead Bit-Level Fault-map and Fault-Tolerance Approach for Deeply Scaled Memories

Donald Kline, Jiangwei Zhang, R. Melhem, A. Jones
Venue: 2020 IEEE International Symposium on High Performance Computer Architecture (HPCA)
Published: 2020-02-01
DOI: 10.1109/HPCA47549.2020.00037 (https://doi.org/10.1109/HPCA47549.2020.00037)
Citations: 11

Abstract

Maintaining appropriate yields in deeply scaled technologies requires tolerating increasingly high fault rates. These fault rates far exceed what traditional general approaches such as ECC can handle, particularly when faults accrue over time. Effective fault tolerance at such high fault rates requires detailed bit-level knowledge of the location of faulty cells. We provide a solution to this problem in the form of a space-efficient, bit-level fault map called FLOWER. FLOWER utilizes Bloom filters to provide detailed fault characterization for a relatively small overhead. We demonstrate how FLOWER can enable improved fault tolerance at high fault rates by enhancing existing fault tolerance proposals, yielding 10–100x improvements. Using in-memory processing, FLOWER reports bit-level faults with high accuracy while maintaining less than 2% performance overhead at 10E-4 fault rates and less than 2% loss of memory density. Using a tuned novel hashing technique called MinCI, FLOWER for memory achieves a considerably lower false-positive rate than disk-level hashing techniques at a fraction of the performance overhead. With PETAL bits, a new technique to protect against errors during in-memory operations, FLOWER can remain resilient against random errors while efficiently targeting predictable errors. Furthermore, we propose a new fault tolerance scheme called FaME, which provides ultra-efficient bit-level sparing by using the FLOWER fault map to identify the location of faults. FLOWER+FaME can achieve 14x longer PCM memory lifetime with half the area overhead of SECDED ECC.
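
As a rough illustration of the idea of a Bloom-filter-based bit-level fault map, the sketch below stores faulty bit addresses in a Bloom filter and answers "might this bit be faulty?" queries. It is not the FLOWER design: the class name `BloomFaultMap`, its parameters, and the SHA-256 double-hashing scheme are all assumptions made for illustration, and the paper's MinCI hashing, in-memory processing, and PETAL bits are not modeled here.

```python
import hashlib


class BloomFaultMap:
    """Toy Bloom-filter fault map (hypothetical; not the FLOWER implementation)."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits          # size of the filter in bits
        self.num_hashes = num_hashes      # number of hash positions per address
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, bit_address: int) -> list[int]:
        # Generic double hashing over a SHA-256 digest.
        # This is an assumption for illustration, NOT the paper's MinCI hashing.
        digest = hashlib.sha256(bit_address.to_bytes(8, "little")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd stride
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def mark_faulty(self, bit_address: int) -> None:
        """Record a faulty cell by setting all of its hash positions."""
        for pos in self._positions(bit_address):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def maybe_faulty(self, bit_address: int) -> bool:
        """True if the cell may be faulty (false positives are possible);
        False means the cell was never marked (no false negatives)."""
        return all(
            (self.bits[pos // 8] >> (pos % 8)) & 1
            for pos in self._positions(bit_address)
        )
```

A caller would invoke `mark_faulty(addr)` whenever a faulty cell is discovered and `maybe_faulty(addr)` before relying on a cell. A Bloom filter never misses a recorded fault but can flag a healthy cell, which is why the false-positive rate of the hashing scheme matters in the abstract above.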