Application Aware DRAM Bank Partitioning in CMP

Takakazu Ikeda, Kenji Kise
DOI: 10.1109/ICPADS.2013.56
Published in: 2013 International Conference on Parallel and Distributed Systems, 2013-12-15
Citations: 7

Abstract

Main memory is a resource shared among the cores of a chip, and the speed gap between cores and main memory limits total system performance. Thus, main memory should be accessed efficiently by each core. Exploiting both the parallelism and the locality of main memory is key to efficient memory access. Parallelism across memory banks can hide latency by pipelining memory accesses, while locality of memory accesses improves the hit ratio of the row buffer in DRAM chips. A state-of-the-art method called bpart has been proposed to improve memory access efficiency. In bpart, each bank is monopolized by one thread, and this monopolization improves row-buffer locality by alleviating inter-thread interference. However, bpart is not effective for threads with poor locality, and bank-level parallelism is not exploited. We propose a new bank partitioning method that exploits parallelism in addition to locality. Our method applies two types of bank usage: low-locality threads share banks to improve parallelism, while each high-locality thread monopolizes its own bank to improve row-buffer locality. We evaluate the proposed method with our in-house software simulator on the SPEC CPU 2006 benchmark. On average, system throughput increases by 1.0% and minimum speedup (a fairness metric) increases by 7.9% relative to bpart. These results show that our proposed method achieves better performance and fairness than bpart.
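The two bank usages described above can be sketched as a simple assignment policy. This is a minimal illustration, not the paper's implementation: the thread names, the locality threshold, and the hit-ratio inputs are assumptions for the example.

```python
# Hedged sketch of an application-aware bank assignment policy:
# high-locality threads each monopolize one bank; low-locality
# threads pool the remaining banks to exploit bank-level parallelism.
# The 0.5 threshold and benchmark names are illustrative assumptions.

def partition_banks(row_hit_ratio, num_banks, threshold=0.5):
    """Assign DRAM banks to threads based on row-buffer locality.

    row_hit_ratio: mapping of thread name -> observed row-buffer hit ratio.
    Returns a mapping of thread name -> list of bank indices it may use.
    """
    high = [t for t, r in row_hit_ratio.items() if r >= threshold]
    low = [t for t in row_hit_ratio if t not in high]

    assignment = {}
    bank = 0
    for t in high:
        assignment[t] = [bank]  # dedicated bank preserves row-buffer locality
        bank += 1
    shared = list(range(bank, num_banks))  # remaining banks are pooled
    for t in low:
        assignment[t] = shared  # sharing raises bank-level parallelism
    return assignment

ratios = {"mcf": 0.2, "libquantum": 0.9, "lbm": 0.3, "hmmer": 0.8}
print(partition_banks(ratios, num_banks=8))
```

With the inputs above, libquantum and hmmer each receive a private bank, while mcf and lbm share the six remaining banks.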