ROP: Alleviating Refresh Overheads via Reviving the Memory System in Frozen Cycles

2016 45th International Conference on Parallel Processing (ICPP). Pub Date: 2016-08-01. DOI: 10.1109/ICPP.2016.26

Ping-Hsiu Huang, Wenjie Liu, Kun Tang, Xubin He, Ke Zhou


Citations: 4

Abstract


ROP: Alleviating Refresh Overheads via Reviving the Memory System in Frozen Cycles

DRAM must be refreshed periodically to prevent data loss from charge leakage, but refreshes degrade performance and consume energy; these costs are referred to as refresh overheads. In this paper, we propose Refresh-Oriented Prefetching (ROP) to alleviate memory refresh overheads. Before a refresh starts, ROP prefetches cache lines from the to-be-refreshed rank into an added SRAM buffer, so that memory requests to a rank undergoing refresh can still be serviced rather than being blocked. At the core of ROP is a probabilistic prefetch model that determines which cache lines to prefetch for a refresh, based on the access patterns appearing in an observational window ahead of the refresh. A Pattern Profiler collects statistics about the memory traffic occurring before and after the start of each refresh operation during a training period, and outputs two conditional probabilities that control subsequent prefetch decisions. A Prefetcher maintains a prediction table that helps to ascertain the access patterns appearing around refresh operations. The prediction table is updated every time an access occurs, during the observational window, to the rank that will be refreshed next, and it is consulted to decide which cache lines are prefetched. Extensive evaluation with SPEC CPU2006 benchmarks on a DDR4 memory demonstrates that, relative to an auto-refresh baseline memory, ROP improves memory performance by up to 9.2% (3.3% on average) in single-core simulations while reducing overall memory energy by up to 6.7% (3.6% on average). Moreover, it increases the Weighted Speedup by up to 2.22X (1.32X on average) in 4-core multiprogram simulations, while reducing energy by up to 48.8% (24.4% on average).
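The abstract does not define the two conditional probabilities or the layout of the prediction table, so the following Python sketch is only one plausible reading, not the authors' design. It assumes the probabilities are P(access during refresh | traffic in the window) and P(access during refresh | no traffic in the window), and that the table tracks recently touched cache-line addresses. All names (`PatternProfiler`, `Prefetcher`, `threshold`, `buffer_lines`) and the next-line candidate heuristic are hypothetical.

```python
from collections import OrderedDict
from typing import List, Tuple


class PatternProfiler:
    """Training phase: counts, per refresh, whether the rank saw traffic
    in the observational window and during the refresh itself.
    (Assumed interpretation; the abstract does not give the exact events.)"""

    def __init__(self) -> None:
        # counts[(accessed_before, accessed_during)] -> number of refreshes
        self.counts = {(b, d): 0 for b in (True, False) for d in (True, False)}

    def record(self, accessed_before: bool, accessed_during: bool) -> None:
        self.counts[(accessed_before, accessed_during)] += 1

    def probabilities(self) -> Tuple[float, float]:
        """Return (P(during | busy window), P(during | quiet window))."""
        busy = self.counts[(True, True)] + self.counts[(True, False)]
        quiet = self.counts[(False, True)] + self.counts[(False, False)]
        p_busy = self.counts[(True, True)] / busy if busy else 0.0
        p_quiet = self.counts[(False, True)] / quiet if quiet else 0.0
        return p_busy, p_quiet


class Prefetcher:
    """Keeps a small prediction table of lines touched in the window and
    picks what to copy into the SRAM buffer before the refresh starts."""

    def __init__(self, p_busy: float, p_quiet: float,
                 threshold: float = 0.5, buffer_lines: int = 4) -> None:
        self.p_busy = p_busy
        self.p_quiet = p_quiet
        self.threshold = threshold        # hypothetical decision threshold
        self.buffer_lines = buffer_lines  # hypothetical SRAM buffer capacity
        self.table: "OrderedDict[int, int]" = OrderedDict()
        self.previous: List[int] = []     # last non-empty selection

    def observe(self, line_addr: int) -> None:
        """Update the table on an access to the rank to be refreshed next."""
        self.table[line_addr] = self.table.get(line_addr, 0) + 1
        self.table.move_to_end(line_addr)  # most recent line goes last

    def select_prefetches(self) -> List[int]:
        """Called just before the refresh; returns line addresses to prefetch."""
        if self.table:
            p = self.p_busy
            picks: List[int] = []
            # Most recently touched lines first; try the next line, then the line.
            for addr in reversed(self.table):
                for cand in (addr + 1, addr):
                    if cand not in picks:
                        picks.append(cand)
            picks = picks[: self.buffer_lines]
        else:
            p = self.p_quiet
            picks = list(self.previous)   # quiet window: fall back to history
        self.table.clear()
        if p < self.threshold:
            return []                     # refresh unlikely to block anything
        self.previous = picks
        return picks
```

In this sketch the profiler runs first over a training period, and its two outputs are then passed to the `Prefetcher`, which gates every later prefetch decision on whichever probability matches the current window. A real controller would also need to bound the prefetch traffic so that it finishes before the refresh begins, which is omitted here.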
