Increasing multicore system efficiency through intelligent bandwidth shifting

2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA) | Pub Date: 2015-03-09 | DOI: 10.1109/HPCA.2015.7056020

Víctor Jiménez, A. Buyuktosunoglu, P. Bose, F. O'Connell, F. Cazorla, M. Valero

Pages: 39-50

Citations: 21

Abstract

Memory bandwidth is a crucial resource in computing systems. Current CMP/SMT processors have many cores and can run many threads concurrently. This large thread count puts heavy pressure on the memory bus, which must supply high bandwidth to service memory requests from the cores. Hardware data prefetching is a well-known technique for hiding memory latency. Because it is speculative, however, prefetching does not always work effectively; when it fails, it wastes memory bandwidth and pollutes the caches. Prefetching efficiency depends both on the prefetching algorithm and on the characteristics of the applications running on the system. In this paper we propose an online bandwidth shifting mechanism that dynamically assigns bandwidth to applications according to their prefetch efficiency. This mechanism maximizes the utilization of memory bandwidth, thereby improving system performance and/or reducing memory power consumption. To the best of our knowledge, this solution is the first that does not require hardware support. We evaluate the benefits of our bandwidth shifting mechanism on a real system, the IBM POWER7. We obtain speedups on the order of 10-20% (in one instance, the speedup exceeds 1.6X). Our mechanism does not generate a significant degree of unfairness among the applications: in many cases individual thread performance increases by 10-35%, while virtually no thread experiences a slowdown larger than 5%.
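The abstract does not spell out how efficiency is measured or how bandwidth is shifted, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that prefetch efficiency can be approximated as useful prefetches divided by issued prefetches, and that software can adjust a per-application prefetch aggressiveness level. The names `AppSample`, `prefetch_efficiency`, `shift_bandwidth`, the counters, the threshold, and the level range are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class AppSample:
    """One sampling interval for one application (hypothetical counters)."""
    name: str
    prefetches_issued: int  # assumed counter: prefetch requests sent to memory
    prefetches_useful: int  # assumed counter: prefetched lines later used by demand accesses


def prefetch_efficiency(sample: AppSample) -> float:
    """Fraction of issued prefetches that were useful (an assumed metric)."""
    if sample.prefetches_issued == 0:
        return 0.0
    return sample.prefetches_useful / sample.prefetches_issued


def shift_bandwidth(samples, levels, threshold=0.5, min_level=0, max_level=4):
    """One adjustment step of a toy policy.

    Applications whose prefetch efficiency falls below `threshold` get a less
    aggressive prefetch level, freeing bandwidth; the others get a more
    aggressive level. Levels are clamped to [min_level, max_level].
    """
    new_levels = {}
    for sample in samples:
        current = levels.get(sample.name, max_level)
        if prefetch_efficiency(sample) < threshold:
            new_levels[sample.name] = max(min_level, current - 1)
        else:
            new_levels[sample.name] = min(max_level, current + 1)
    return new_levels


if __name__ == "__main__":
    samples = [
        AppSample("stream_like", prefetches_issued=1000, prefetches_useful=900),
        AppSample("pointer_chasing", prefetches_issued=1000, prefetches_useful=100),
    ]
    levels = {"stream_like": 2, "pointer_chasing": 2}
    print(shift_bandwidth(samples, levels))
    # prints {'stream_like': 3, 'pointer_chasing': 1}
```

In an online setting, a step like this would be repeated every sampling interval, so that bandwidth follows the applications that currently benefit from prefetching. The mechanism evaluated in the paper may use different metrics and controls.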

