Improving system throughput and fairness simultaneously in shared memory CMP systems via Dynamic Bank Partitioning

2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA), Pub Date: 2014-06-19, DOI: 10.1109/HPCA.2014.6835945

Mingli Xie, Dong Tong, Kan Huang, Xu Cheng


Citations: 62

Abstract

Applications running concurrently in CMP systems interfere with each other in DRAM, leading to poor system performance and fairness. Memory access scheduling reorders memory requests to improve system throughput and fairness, but it cannot resolve the interference effectively. To reduce interference, memory partitioning divides memory resources among threads. Memory channel partitioning (MCP) maps the data of threads that are likely to interfere severely with each other to different channels. However, it allocates memory resources unfairly and physically exacerbates memory contention among memory-intensive threads, which ultimately increases the slowdown of these threads and makes the system highly unfair. Bank partitioning divides memory banks among cores and eliminates interference. However, previous equal bank partitioning restricts the number of banks available to each thread and reduces bank-level parallelism. In this paper, we first propose Dynamic Bank Partitioning (DBP), which partitions memory banks according to the number of banks each thread needs. DBP compensates for the reduced bank-level parallelism caused by equal bank partitioning. The key principle is to profile threads' memory characteristics at run time, estimate their demand for banks, and use that estimate to direct the bank partitioning. Second, we observe that bank partitioning and memory scheduling are orthogonal, in the sense that the two methods complement each other when they are applied together. Therefore, we present a comprehensive approach that integrates Dynamic Bank Partitioning and Thread Cluster Memory scheduling (DBP-TCM; TCM is one of the best memory scheduling algorithms) to further improve system performance. Experimental results show that the proposed DBP improves system performance by 4.3% and system fairness by 16% over equal bank partitioning. Compared to TCM, DBP-TCM improves system throughput by 6.2% and fairness by 16.7%. Compared with MCP, DBP-TCM provides 5.3% better system throughput and 37% better system fairness. We conclude that our methods are effective in improving both system throughput and fairness.
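The abstract does not say which memory characteristics DBP profiles or how it turns them into a bank count, so the following is only an illustrative sketch of the general idea (estimate per-thread demand, then size each thread's bank partition accordingly). It is not the authors' algorithm. The function name `partition_banks`, the `demand` scores, and the proportional largest-remainder rule are all assumptions made for this example.

```python
def partition_banks(demand, total_banks):
    """Split `total_banks` among threads in proportion to estimated demand.

    `demand` maps a thread id to a non-negative demand score (hypothetical;
    the abstract does not specify the metric). Every thread is guaranteed
    one bank; the remaining banks are shared out proportionally, with
    leftovers assigned by largest fractional remainder.
    """
    n = len(demand)
    if n == 0:
        return {}
    if total_banks < n:
        raise ValueError("need at least one bank per thread")

    alloc = {t: 1 for t in demand}          # minimum of one bank each
    spare = total_banks - n                 # banks left to distribute
    total_demand = sum(demand.values())

    if total_demand == 0:
        # No information: fall back to an equal split of the spare banks.
        shares = {t: spare / n for t in demand}
    else:
        shares = {t: spare * d / total_demand for t, d in demand.items()}

    for t, s in shares.items():
        alloc[t] += int(s)                  # whole-bank part of each share

    # Hand out banks lost to truncation, largest remainder first.
    leftover = total_banks - sum(alloc.values())
    by_remainder = sorted(demand, key=lambda t: shares[t] - int(shares[t]),
                          reverse=True)
    for t in by_remainder[:leftover]:
        alloc[t] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical demand scores for three threads sharing 8 banks.
    print(partition_banks({"A": 30.0, "B": 10.0, "C": 0.0}, 8))
```

With equal demand scores this reduces to equal bank partitioning, which is the baseline the abstract compares against. A run-time scheme would recompute the scores periodically and repartition.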
