DRAM Characterization under Relaxed Refresh Period Considering System Level Effects within a Commodity Server

L. Mukhanov, Konstantinos Tovletoglou, Dimitrios S. Nikolopoulos, G. Karakonstantis
2018 IEEE 24th International Symposium on On-Line Testing And Robust System Design (IOLTS), July 2018
DOI: 10.1109/IOLTS.2018.8474184
Citations: 4

Abstract

Today’s rapid generation of data and the growing need for higher memory capacity have triggered many studies on aggressively scaling the DRAM refresh period, which is currently set according to rare worst-case conditions. These studies analysed data-dependent circuit-level factors in detail and indicated the need for online DRAM characterization due to the variable retention time of DRAM cells. They did so by executing a few test data patterns on FPGAs at controlled temperatures using thermal testbeds, which, however, are not available in the field. Moreover, existing studies could not reveal system-level effects, which may be excited when workloads execute on real systems and may directly or indirectly affect DRAM reliability. In this paper, we develop an experimental framework based on a state-of-the-art 64-bit ARM-based server running Linux, which enables DRAM characterization under a relaxed refresh period by executing conventional test data patterns as well as popular HPC and Cloud workloads. Our results indicate that common test patterns are ineffective at identifying error-prone locations at low DRAM temperatures. Furthermore, we reveal a strong correlation between SoC utilization and DRAM reliability. Exploiting these findings, we developed a benchmark that indirectly stresses DRAM by raising its temperature and can thus be used for characterization in the field without complex thermal equipment. Our study shows that the refresh period can be relaxed by a factor of 35 on such a commodity system, with all errors corrected by the available error-correcting codes, resulting in 11.5% power savings on average.
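Conventional test-pattern characterization follows a write, hold, read-back-and-compare loop. The Python sketch below illustrates that general technique under stated assumptions. It is not the authors' framework. The pattern set, the names `run_pattern_test`, `error_prone_locations` and `leaky_hold`, and the `hold` callback are all hypothetical. The `hold` callback stands in for waiting one relaxed refresh window, which on real hardware requires memory-controller support that is not modelled here. A user-space buffer also does not map to fixed physical DRAM cells, so a real tool would additionally need physical address translation.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical pattern set: each byte value is repeated across the whole buffer.
PATTERNS: Dict[str, int] = {
    "all_zeros": 0x00,
    "all_ones": 0xFF,
    "checkerboard": 0x55,
    "inv_checkerboard": 0xAA,
}

Location = Tuple[int, int]  # (byte offset, bit index)


def run_pattern_test(
    buf: bytearray, hold: Callable[[bytearray], None]
) -> Dict[str, List[Location]]:
    """For each pattern: write it, call `hold` (stand-in for waiting one
    relaxed refresh window), then read back and record every flipped bit."""
    failures: Dict[str, List[Location]] = {}
    for name, byte in PATTERNS.items():
        buf[:] = bytes([byte]) * len(buf)  # write phase
        hold(buf)                          # retention phase (hypothetical hook)
        flips: List[Location] = []
        for offset, value in enumerate(buf):  # read-and-compare phase
            diff = value ^ byte
            for bit in range(8):
                if (diff >> bit) & 1:
                    flips.append((offset, bit))
        failures[name] = flips
    return failures


def error_prone_locations(failures: Dict[str, List[Location]]) -> Set[Location]:
    """Union of failing bit locations across all patterns."""
    return {loc for flips in failures.values() for loc in flips}


def leaky_hold(buf: bytearray) -> None:
    """Simulated weak cell (hypothetical): bit 3 of byte 10 decays from 1 to 0."""
    buf[10] &= ~(1 << 3) & 0xFF


if __name__ == "__main__":
    results = run_pattern_test(bytearray(64), leaky_hold)
    for name, flips in results.items():
        print(name, flips)
    print("error-prone:", error_prone_locations(results))
```

The simulated weak cell fails only under patterns that store a 1 in bit 3 (all ones and the inverted checkerboard). This mirrors the data dependence noted in the abstract: a single pattern can miss an error-prone location that another pattern exposes.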
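The sketch below puts the factor of 35 in perspective. It computes the relaxed refresh window and the fraction of time a rank spends executing refresh commands. It assumes JEDEC DDR3/DDR4-style nominal values: a 64 ms refresh window with 8192 REF commands, i.e. tREFI = 7.8125 µs. It also assumes a tRFC of 350 ns, typical of an 8 Gb DDR4 device. The abstract does not state the server's DRAM parameters, and the names `RefreshParams` and `refresh_stats` are hypothetical. This is only a command-level estimate. It does not reproduce the 11.5% average power savings, which is the figure the paper reports for its system.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class RefreshParams:
    trefw_ms: float = 64.0       # nominal refresh window (JEDEC DDR3/DDR4, normal temperature)
    refs_per_window: int = 8192  # REF commands issued per refresh window
    trfc_ns: float = 350.0       # assumption: tRFC of an 8 Gb DDR4 device


def refresh_stats(p: RefreshParams, relax_factor: float) -> Dict[str, float]:
    """Stretch the refresh window by `relax_factor` and report the resulting
    refresh interval and the fraction of time the rank is busy refreshing."""
    trefw_ms = p.trefw_ms * relax_factor
    trefi_us = trefw_ms * 1000.0 / p.refs_per_window  # interval between REF commands
    busy_fraction = (p.trfc_ns / 1000.0) / trefi_us     # tRFC / tREFI
    return {"trefw_ms": trefw_ms, "trefi_us": trefi_us, "busy_fraction": busy_fraction}


if __name__ == "__main__":
    for factor in (1, 35):
        s = refresh_stats(RefreshParams(), factor)
        print(f"x{factor}: tREFW = {s['trefw_ms']:.0f} ms, "
              f"tREFI = {s['trefi_us']:.2f} us, refresh busy = {s['busy_fraction']:.2%}")
```

Under these assumptions, the window grows from 64 ms to 2240 ms. The refresh busy time drops by the same factor of 35, from about 4.5% to about 0.13%. How much of that reduction translates into DRAM or system power depends on the platform.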