Understanding and mitigating refresh overheads in high-density DDR4 DRAM systems

Janani Mukundan, H. Hunter, Kyu-hyoun Kim, Jeffrey Stuecheli, José F. Martínez
Proceedings of the 40th Annual International Symposium on Computer Architecture, published 2013-06-23
DOI: 10.1145/2485922.2485927 (https://doi.org/10.1145/2485922.2485927)
Citations: 103

Abstract

Recent DRAM specifications exhibit increasing refresh latencies. A refresh command blocks a full rank, significantly decreasing the available parallelism in the memory subsystem and thus decreasing performance. Fine Granularity Refresh (FGR) is a feature recently announced as part of JEDEC's DDR4 DRAM specification that attempts to tackle this problem by creating a range of refresh options that provide a trade-off between refresh latency and frequency. In this paper, we first conduct an analysis of DDR4 DRAM's FGR feature, and show that there is no one-size-fits-all option across a variety of applications. We then present Adaptive Refresh (AR), a simple yet effective mechanism that dynamically chooses the best FGR mode for each application and for each phase within the application. When looking at the refresh problem more closely, we identify in high-density DRAM systems a phenomenon that we call command queue seizure, whereby the memory controller's command queue seizes up temporarily because it is full of commands to a rank that is being refreshed. To attack this problem, we propose two complementary mechanisms called Delayed Command Expansion (DCE) and Preemptive Command Drain (PCD). Our results show that AR does exploit DDR4's FGR effectively. However, once our proposed DCE and PCD mechanisms are added, DDR4's FGR becomes redundant in most cases, except in a few highly memory-sensitive applications, where the use of AR does provide some additional benefit. In all, our simulations show that the proposed mechanisms yield 8% (14%) mean speedup with respect to traditional refresh, at normal (extended) DRAM operating temperatures, for a set of diverse parallel applications.
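
The trade-off between refresh latency and frequency that FGR exposes can be illustrated with a small calculation. The sketch below is not taken from the paper: the mode names and the timing values (refresh interval and per-command blocking time, in nanoseconds) are assumptions chosen to resemble figures commonly quoted for an 8 Gb DDR4 device, and `RefreshMode` and `summarize` are hypothetical helpers. It only shows why no single mode is obviously best: finer-grained modes block the rank for less time per refresh command but, under these assumed numbers, spend a larger share of total time refreshing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RefreshMode:
    """One FGR option, described by two assumed timing parameters."""
    name: str
    t_refi_ns: float  # average interval between refresh commands
    t_rfc_ns: float   # time the rank is blocked by one refresh command

    def blocked_fraction(self) -> float:
        # Share of wall-clock time during which the rank is unavailable.
        return self.t_rfc_ns / self.t_refi_ns


# Illustrative timings only (assumed; not from the paper or quoted from the spec).
MODES = [
    RefreshMode("1x", t_refi_ns=7800.0, t_rfc_ns=350.0),
    RefreshMode("2x", t_refi_ns=3900.0, t_rfc_ns=260.0),
    RefreshMode("4x", t_refi_ns=1950.0, t_rfc_ns=160.0),
]


def summarize(modes):
    """Map each mode name to the percentage of time its rank is blocked."""
    return {m.name: round(100.0 * m.blocked_fraction(), 2) for m in modes}


if __name__ == "__main__":
    for mode in MODES:
        pct = summarize([mode])[mode.name]
        print(f"{mode.name}: blocked {mode.t_rfc_ns:.0f} ns per refresh, "
              f"{pct}% of time overall")
```

Under these assumed values, each step toward finer granularity shortens the worst-case stall behind a single refresh while raising the aggregate refresh overhead, which is consistent with the abstract's observation that the best mode depends on the application.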