Task-parallel Runtime System Optimization Using Static Compiler Analysis

Proceedings of the Computing Frontiers Conference. Pub Date: 2017-05-15. DOI: 10.1145/3075564.3075574

Peter Thoman, P. Zangerl, T. Fahringer


Citations: 6

Abstract

Achieving high performance in task-parallel runtime systems, especially with high degrees of parallelism and fine-grained tasks, requires tuning a large variety of behavioral parameters according to program characteristics. In the current state of the art, this tuning is generally performed in one of two ways: either by a group of experts who derive a single setup which achieves good (but not optimal) performance across a wide variety of use cases, or by monitoring a system's behavior at runtime and responding to it. The former approach invariably fails to achieve optimal performance for programs with highly distinct execution patterns, while the latter incurs some overhead and cannot affect parameters which need to be fixed at compile time. In order to mitigate these drawbacks, we propose a set of novel static compiler analyses specifically designed to determine program features which affect the optimal settings for a task-parallel execution environment. These features include the parallel structure of task spawning, the granularity of individual tasks, and an estimate of the stack size required per task. Based on the results of these analyses, various runtime system parameters are then tuned at compile time. We have implemented this approach in the Insieme compiler and runtime system, and evaluated its effectiveness on a set of 12 task-parallel benchmarks running with 1 to 64 hardware threads. Across this entire space of use cases, our implementation achieves a geometric mean performance improvement of 39%.
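To make the idea of compile-time tuning concrete, the following is a minimal sketch of how the three analysis results named in the abstract (spawning structure, task granularity, per-task stack estimate) could be mapped to runtime settings. It is not the Insieme implementation: every name, threshold, and policy string below (`TaskFeatures`, `RuntimeParams`, `choose_params`, `FINE_GRAINED_CYCLES`, the queue policy labels) is a hypothetical placeholder chosen for illustration, and the heuristics themselves are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# All names and constants here are hypothetical; the abstract does not
# specify which runtime parameters are tuned or how.
DEFAULT_STACK_BYTES = 8 * 1024 * 1024   # fallback when no estimate exists
MIN_STACK_BYTES = 16 * 1024             # assumed lower bound for safety
FINE_GRAINED_CYCLES = 10_000            # assumed "fine-grained" threshold


@dataclass(frozen=True)
class TaskFeatures:
    """Results a static analysis might report for a program."""
    spawn_structure: str             # "recursive", "loop", or "unknown"
    est_task_cycles: Optional[int]   # None if granularity is not bounded
    est_stack_bytes: Optional[int]   # None if stack use is not bounded


@dataclass(frozen=True)
class RuntimeParams:
    """Settings fixed at compile time for the runtime system."""
    queue_policy: str
    inline_small_tasks: bool
    stack_bytes: int


def round_up_pow2(n: int) -> int:
    """Smallest power of two that is >= n (for n >= 1)."""
    return 1 << (n - 1).bit_length()


def choose_params(f: TaskFeatures) -> RuntimeParams:
    # Spawning structure selects a scheduling policy (assumed mapping).
    if f.spawn_structure == "recursive":
        policy = "work_stealing_lifo"
    elif f.spawn_structure == "loop":
        policy = "shared_fifo"
    else:
        policy = "default"

    # Very small tasks are run inline to avoid task-creation overhead.
    inline = (f.est_task_cycles is not None
              and f.est_task_cycles < FINE_GRAINED_CYCLES)

    # Stack size: twice the estimate as headroom, rounded to a power of
    # two; fall back to the conservative default when nothing is known.
    if f.est_stack_bytes is None:
        stack = DEFAULT_STACK_BYTES
    else:
        stack = max(MIN_STACK_BYTES, round_up_pow2(2 * f.est_stack_bytes))

    return RuntimeParams(policy, inline, stack)
```

In this sketch, a program whose analysis yields no information falls back to the generic defaults, which corresponds to the expert-derived single setup the abstract describes as the baseline.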

