On the Use of DRAM with Unrepaired Weak Cells in Computing Systems

Hao Wang, Yin Li, Xuebin Zhang, Xiaoqing Zhao, Hongbin Sun, Tong Zhang
Proceedings of the Second International Symposium on Memory Systems, published 2016-10-03. DOI: 10.1145/2989081.2989108 (https://doi.org/10.1145/2989081.2989108)

Abstract

In current practice, DRAM manufacturers apply redundancy repair to decommission all weak cells that cannot satisfy the target data retention time under worst-case operating conditions (e.g., the highest operating temperature). However, as DRAM scaling enters the sub-20nm regime, it becomes increasingly challenging to repair all weak cells at reasonable cost. This work studies how DRAM chips with unrepaired weak cells can be used in computing systems. It builds on the simple idea that the OS reserves all error-prone pages, i.e., pages that contain at least one unrepaired weak cell, so that they are never used. Under a relatively high error-prone page rate (e.g., 8%), this basic idea raises two issues: (1) Simply reserving all error-prone pages could make it almost impossible for the OS to allocate a contiguous, fragmentation-free physical memory space for critical operations such as OS booting and DMA buffering. (2) Since most error-prone pages may contain only a few unrepaired weak cells, withholding all of them from use could waste a noticeable amount of memory. To address these issues, this paper presents a controller-based selective page re-mapping strategy that ensures a contiguous critical memory region for the OS, and develops a software-based memory error tolerance scheme that recycles all error-prone pages for the zRAM function in Linux. Since the first scheme eliminates fragmentation only in the critical memory region (e.g., 128MB in Linux), the remaining non-critical memory region is still subject to severe fragmentation. Hence, we carried out experiments using SPEC CPU2006 to demonstrate quantitatively that a highly fragmented non-critical memory region may not cause significant system performance degradation. We further study the latency and hardware cost of implementing the controller-based page re-mapping, and the effectiveness of recycling error-prone pages for zRAM in Linux. The experimental results show that the proposed software-based error tolerance scheme degrades zRAM speed by at most 7%.
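
To make the first issue and the re-mapping idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes 4 KiB pages and statistically independent error-prone pages (neither is stated in the abstract), and the names `log10_prob_region_clean`, `build_remap_table`, and `translate` are hypothetical. Under those assumptions, a 128MB region spans 32768 pages, and the chance that all of them are clean at an 8% error-prone rate is roughly 10^-1187, which illustrates why a naturally clean critical region is practically impossible. The remap table then shows one simple way a controller could redirect error-prone frames in the critical region to clean frames outside it.

```python
import math

PAGE_SIZE = 4096                      # assumption: 4 KiB pages
CRITICAL_BYTES = 128 * 1024 * 1024    # 128MB critical region (example from the abstract)
ERROR_PRONE_RATE = 0.08               # 8% error-prone page rate (example from the abstract)


def log10_prob_region_clean(num_pages, rate):
    """log10 of the probability that num_pages pages are all clean.

    Assumes each page is error-prone independently with probability `rate`.
    """
    return num_pages * math.log10(1.0 - rate)


def build_remap_table(error_prone, critical_pages, total_pages):
    """Hypothetical controller-side table: {critical_frame: clean_spare_frame}.

    Each error-prone frame inside the critical region [0, critical_pages) is
    redirected to the next clean frame outside that region.
    """
    spares = (f for f in range(critical_pages, total_pages)
              if f not in error_prone)
    table = {}
    for frame in range(critical_pages):
        if frame in error_prone:
            try:
                table[frame] = next(spares)
            except StopIteration:
                raise RuntimeError("not enough clean frames outside the critical region")
    return table


def translate(frame, table):
    """Return the physical frame that actually backs `frame` after re-mapping."""
    return table.get(frame, frame)
```

In this sketch the spare frames consumed by the table would also have to be withheld from normal allocation, since they now back addresses in the critical region; how the paper handles that bookkeeping is not described in the abstract.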