A case for NUMA-aware contention management on multicore systems

2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT), Pub Date: 2010-09-11, DOI: 10.1145/1854273.1854350

S. Blagodurov, Sergey Zhuravlev, Mohammad Dashti, Alexandra Fedorova


Citations: 291

Abstract

On multicore systems, contention for shared resources occurs when memory-intensive threads are co-scheduled on cores that share parts of the memory hierarchy, such as last-level caches and memory controllers. Previous work investigated how contention could be addressed via scheduling. A contention-aware scheduler places competing threads in separate memory hierarchy domains to eliminate resource sharing and, as a consequence, mitigate contention. However, all previous work on contention-aware scheduling assumed that the underlying system is UMA (uniform memory access latencies, single memory controller). Modern multicore systems are instead NUMA: they feature non-uniform memory access latencies and multiple memory controllers. We discovered that contention management is considerably more difficult on NUMA systems, because the scheduler must consider not only the placement of threads but also the placement of their memory. This is required mostly to eliminate contention for memory controllers, contrary to the popular belief that remote access latency is the dominant concern. In this work we quantify the effects on performance imposed by resource contention and remote access latency. This analysis inspires the design of a contention-aware scheduling algorithm for NUMA systems. This algorithm significantly outperforms a previously proposed NUMA-unaware algorithm as well as the default Linux scheduler. We also investigate memory migration strategies, which are a necessary part of the NUMA contention-aware scheduling algorithm. Finally, we propose and evaluate a new contention management algorithm that is priority-aware.
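
The central point of the abstract, that a NUMA scheduler has to decide both where threads run and where their memory lives, can be illustrated with a minimal sketch. This is not the algorithm proposed in the paper, whose details the abstract does not give. The `Thread` fields, the `miss_rate` intensity metric, and the greedy balancing rule are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    name: str
    miss_rate: float  # hypothetical memory-intensity metric (e.g. LLC misses per 1000 instructions)
    memory_node: int  # NUMA node that currently holds the thread's memory


def place_threads(threads, num_nodes):
    """Spread memory-intensive threads across NUMA nodes (illustrative only).

    Returns (placement, migrations):
      placement  maps thread name -> node it should run on
      migrations lists (thread name, old memory node, new memory node) for
                 threads whose memory would be left behind on a remote node.
    """
    load = [0.0] * num_nodes  # accumulated miss rate per node
    placement = {}
    migrations = []

    # Place the most memory-intensive threads first so they end up apart.
    for t in sorted(threads, key=lambda t: t.miss_rate, reverse=True):
        # Pick the least-loaded node; on a tie, prefer the node that
        # already holds the thread's memory (False sorts before True).
        node = min(range(num_nodes), key=lambda n: (load[n], n != t.memory_node))
        load[node] += t.miss_rate
        placement[t.name] = node

        # Moving only the thread would leave its memory on the old node, so
        # it would keep loading that node's memory controller. Record that
        # the memory should move as well.
        if node != t.memory_node:
            migrations.append((t.name, t.memory_node, node))

    return placement, migrations


if __name__ == "__main__":
    threads = [
        Thread("A", 30.0, 0),
        Thread("B", 25.0, 0),
        Thread("C", 2.0, 1),
        Thread("D", 1.0, 1),
    ]
    placement, migrations = place_threads(threads, num_nodes=2)
    print(placement)   # {'A': 0, 'B': 1, 'C': 1, 'D': 1}
    print(migrations)  # [('B', 0, 1)]
```

In this example the two intensive threads A and B start on node 0 and are separated. Because B's memory would otherwise stay on node 0 and keep competing for that node's memory controller, the sketch also schedules a memory migration for B, which is the extra step a UMA-oriented scheduler would not take.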

