Locality-Aware Dynamic Task Graph Scheduling
Jordyn C. Maglalang, S. Krishnamoorthy, Kunal Agrawal
2017 46th International Conference on Parallel Processing (ICPP), August 2017
DOI: 10.1109/ICPP.2017.16 (https://doi.org/10.1109/ICPP.2017.16)
Citations: 6
Abstract
Dynamic task graph schedulers automatically balance work across processor cores by scheduling tasks among available threads while preserving dependences. In this paper, we design NABBITC, a provably efficient dynamic task graph scheduler that accounts for data locality on NUMA systems. NABBITC allows users to assign a color to each task representing the location (e.g., a processor core) that has the most efficient access to the data needed during that node's execution. NABBITC then automatically adjusts the scheduling so as to preferentially execute each node at the location that matches its color, leading to better locality because the node is likely to make local rather than remote accesses. At the same time, NABBITC tries to maintain load balance and to add little overhead compared to the vanilla NABBIT scheduler, which does not consider locality. We provide a theoretical analysis showing that NABBITC does not asymptotically impact the scalability of NABBIT. We evaluated the performance of NABBITC on a suite of benchmarks, including both memory-intensive and compute-intensive applications. Our experiments indicate that adding locality awareness yields a considerable performance advantage over the vanilla NABBIT scheduler. Furthermore, we compared NABBITC to both OpenMP tasks and OpenMP loops. For regular applications, OpenMP loops can achieve perfect locality and perfect load balance statically; on these benchmarks, NABBITC incurs a small performance penalty compared to OpenMP due to its dynamic scheduling strategy. Similarly, for compute-intensive applications with coarse-grained tasks, the centralized scheduler used for OpenMP tasks provides the best performance. However, we find that NABBITC provides a good trade-off between data locality and load balance: on memory-intensive jobs it consistently outperforms OpenMP tasks, while on irregular jobs where load balancing is important it outperforms OpenMP loops. Therefore, NABBITC combines the benefits of locality-aware scheduling for regular, memory-intensive applications (the forte of static schedulers such as those in OpenMP) with dynamic adaptation to load imbalance in irregular applications (the forte of dynamic schedulers such as Cilk Plus, TBB, and Nabbit).
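To make the coloring idea concrete, below is a minimal, self-contained C++ sketch of a locality-aware ready queue. It is not the NABBITC or Nabbit API; all names here (ColoredTask, Scheduler, push, pop, run) are hypothetical illustrations, and the sketch assumes only the behavior described in the abstract: each task carries a color naming its preferred location, workers prefer tasks matching their own color, and they fall back to taking tasks of other colors to preserve load balance.

```cpp
// Illustrative sketch only: NOT the NABBITC/Nabbit API. It shows the idea of
// tagging each task node with a "color" (a preferred NUMA domain or core) and
// letting a scheduler prefer, but not require, executing the node there.
#include <atomic>
#include <deque>
#include <functional>
#include <iostream>
#include <mutex>
#include <vector>

struct ColoredTask {
    int color;                        // preferred location (e.g., NUMA domain)
    std::function<void()> body;       // work to run once dependences are met
    std::atomic<int> join;            // number of unsatisfied predecessors
    std::vector<ColoredTask*> succs;  // dependents to notify on completion
    ColoredTask(int c, std::function<void()> f, int preds)
        : color(c), body(std::move(f)), join(preds) {}
};

class Scheduler {
    std::vector<std::deque<ColoredTask*>> queues_;  // one ready queue per color
    std::mutex m_;
public:
    explicit Scheduler(int domains) : queues_(domains) {}

    // Enqueue a ready task onto the queue matching its color.
    void push(ColoredTask* t) {
        std::lock_guard<std::mutex> lk(m_);
        queues_[t->color].push_back(t);
    }

    // A worker bound to `my_color` prefers its own queue (locality) and only
    // takes from other colors when its own queue is empty (load balance).
    ColoredTask* pop(int my_color) {
        std::lock_guard<std::mutex> lk(m_);
        if (!queues_[my_color].empty()) {
            ColoredTask* t = queues_[my_color].back();
            queues_[my_color].pop_back();
            return t;
        }
        for (auto& q : queues_) {  // fall back to stealing from other colors
            if (!q.empty()) {
                ColoredTask* t = q.front();
                q.pop_front();
                return t;
            }
        }
        return nullptr;
    }

    // Run a task and release successors whose dependences are now satisfied.
    void run(ColoredTask* t) {
        t->body();
        for (ColoredTask* s : t->succs)
            if (s->join.fetch_sub(1) == 1) push(s);
    }
};

int main() {
    Scheduler sched(2);  // pretend we have two NUMA domains
    ColoredTask a(0, [] { std::cout << "task A on domain 0\n"; }, 0);
    ColoredTask b(1, [] { std::cout << "task B on domain 1\n"; }, 1);
    a.succs.push_back(&b);  // B depends on A
    sched.push(&a);
    // Single worker with color 0: runs A locally, then steals B (color 1).
    while (ColoredTask* t = sched.pop(/*my_color=*/0)) sched.run(t);
}
```

The key design point mirrored from the abstract is that color is a preference, not a constraint: when a worker's own queue runs dry it still takes work of other colors, trading some locality for load balance rather than idling.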