DRAM-Level Prefetching for Fully-Buffered DIMM: Design, Performance and Power Saving

2007 IEEE International Symposium on Performance Analysis of Systems & Software. Pub Date: 2007-04-25. DOI: 10.1109/ISPASS.2007.363740

Jiang Lin, Hongzhong Zheng, Zhichun Zhu, Zhao Zhang, Howard David


Citations: 13

Abstract

We have studied DRAM-level prefetching for the fully buffered DIMM (FB-DIMM) designed for multi-core processors. FB-DIMM has a unique two-level interconnect structure: FB-DIMM channels at the first level connect the memory controller to the advanced memory buffers (AMBs), and DDR2 buses at the second level connect the AMBs to the DRAM chips. We propose an AMB prefetching method that prefetches memory blocks from the DRAM chips into the AMBs. It uses the redundant bandwidth between the DRAM chips and the AMBs and does not consume the crucial channel bandwidth. The proposed method fetches K memory blocks, each the size of an L2 cache block, around the demanded block, where K is a small value ranging from two to eight. The method may also reduce DRAM power consumption by merging some DRAM precharges and activations. Our cycle-accurate simulation shows an average performance improvement of 16% for single-core and multi-core workloads constructed from memory-intensive SPEC2000 programs with software cache prefetching enabled, and no workload slows down. We have found that the performance gain comes from reduced idle memory latency and improved channel bandwidth utilization. We have also found that there is only a small overlap between the performance gains from AMB prefetching and software cache prefetching. The average estimated power saving is 15%.
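As a rough illustration of the idea of fetching K blocks around a demanded block, the following Python sketch models a toy AMB-side buffer. It is not the authors' design. The 64-byte block size, the choice of a K-block aligned region as the meaning of "around", the LRU replacement policy, and the names `region_blocks` and `AmbBuffer` are all assumptions made for this example; the abstract specifies none of them.

```python
from collections import OrderedDict

BLOCK_SIZE = 64  # bytes; assumed L2 cache block size (not stated in the abstract)


def region_blocks(addr, k):
    """Return the block addresses of the K-block aligned region containing addr.

    Using an aligned region is an assumption; the abstract only says the
    K blocks are "around" the demanded block.
    """
    block = addr // BLOCK_SIZE
    base = block - (block % k)
    return [(base + i) * BLOCK_SIZE for i in range(k)]


class AmbBuffer:
    """Toy model of a prefetch buffer inside an AMB (hypothetical, LRU-managed)."""

    def __init__(self, k=4, capacity=64):
        assert 2 <= k <= 8, "the abstract gives K in the range two to eight"
        assert capacity >= k, "the buffer must hold at least one region"
        self.k = k
        self.capacity = capacity      # number of blocks the buffer can hold
        self.blocks = OrderedDict()   # block address -> True, oldest first
        self.hits = 0                 # reads served from the AMB buffer
        self.dram_accesses = 0        # region fetches that reached the DRAM chips

    def read(self, addr):
        block_addr = (addr // BLOCK_SIZE) * BLOCK_SIZE
        if block_addr in self.blocks:
            # Served from the AMB: no DRAM access is needed for this read.
            self.blocks.move_to_end(block_addr)
            self.hits += 1
            return "amb_hit"
        # Miss: a single DRAM access brings in the demanded block and its neighbours.
        self.dram_accesses += 1
        for b in region_blocks(addr, self.k):
            self.blocks[b] = True
            self.blocks.move_to_end(b)
            if len(self.blocks) > self.capacity:
                self.blocks.popitem(last=False)  # evict the least recently used block
        return "dram_fetch"
```

In this toy model, a sequential sweep over memory triggers one DRAM fetch per K blocks, and the remaining reads are served from the buffer. That loosely mirrors the abstract's two claims: later reads skip the DRAM access latency, and several accesses to the same neighbourhood share one fetch, which is where merged precharges and activations could come from. The sketch does not model timing, channel bandwidth, or power.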


