ArchShield: architectural framework for assisting DRAM scaling by tolerating high error rates

Prashant J. Nair, Dae-Hyun Kim, Moinuddin K. Qureshi
Proceedings of the 40th Annual International Symposium on Computer Architecture. Published 2013-06-23. DOI: 10.1145/2485922.2485929 (https://doi.org/10.1145/2485922.2485929)
Citations: 170

Abstract

DRAM scaling has been the prime driver for increasing the capacity of main memory systems over the past three decades. Unfortunately, scaling DRAM to smaller technology nodes has become challenging due to the inherent difficulty of designing smaller geometries, coupled with the problems of device variation and leakage. Future DRAM devices are likely to experience significantly higher error rates. Techniques that tolerate errors efficiently can enable DRAM to scale to smaller technology nodes. However, existing techniques such as row/column sparing and ECC become prohibitively expensive at high error rates. To develop cost-effective solutions for tolerating high error rates, this paper advocates a cross-layer approach. Rather than hiding the faulty-cell information within the DRAM chips, we expose it to the architectural level. We propose ArchShield, an architectural framework that employs runtime testing to identify faulty DRAM cells. ArchShield tolerates these faults using two components: a Fault Map that keeps information about faulty words in a cache line, and Selective Word-Level Replication (SWLR), which replicates faulty words for error resilience. Both the Fault Map and SWLR are integrated in a reserved area of DRAM. Our evaluations with an 8GB DRAM DIMM show that ArchShield can efficiently tolerate error rates as high as 10^-4 (100x higher than ECC alone), causes less than 2% performance degradation, and still maintains 1-bit error tolerance against soft errors.
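To give a feel for the scale implied by an error rate of 10^-4 on an 8GB DIMM, the following back-of-the-envelope sketch may help. It is not taken from the paper: it assumes the error rate is a per-bit probability of a faulty cell, that faults are independent and uniformly distributed, and that words are 64 bits and cache lines are 64 bytes. The function name `p_at_least_one_fault` and all constants are illustrative choices.

```python
import math


def p_at_least_one_fault(bits: int, ber: float) -> float:
    """Probability that a group of `bits` cells holds at least one faulty cell.

    Assumes independent faults with per-bit probability `ber` (an assumption,
    not something the abstract states). Computes 1 - (1 - ber)**bits in a
    numerically stable way.
    """
    return -math.expm1(bits * math.log1p(-ber))


# Assumed parameters (hypothetical, chosen to match the abstract's headline numbers).
BER = 1e-4                    # per-bit fault probability
DIMM_BITS = 8 * 2**30 * 8     # 8GB DIMM expressed in bits
WORD_BITS = 64                # assumed word size
LINE_BITS = 64 * 8            # assumed 64-byte cache line

expected_faulty_cells = DIMM_BITS * BER
p_word = p_at_least_one_fault(WORD_BITS, BER)
p_line = p_at_least_one_fault(LINE_BITS, BER)

if __name__ == "__main__":
    print(f"expected faulty cells: {expected_faulty_cells:,.0f}")
    print(f"fraction of words with a faulty cell: {p_word:.4%}")
    print(f"fraction of lines with a faulty cell: {p_line:.4%}")
```

Under these assumptions there are several million faulty cells on the DIMM, yet well under 1% of words and roughly 5% of cache lines contain one. That is consistent with the abstract's choice to track faults per word and replicate only the faulty words rather than whole lines or rows.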