Autotuning multigrid with PetaBricks

Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis

Pub Date: 2009-11-14

DOI: 10.1145/1654059.1654065

Cy P. Chan, Jason Ansel, Y. Wong, Saman P. Amarasinghe, A. Edelman


Citations: 35

Abstract

Algorithmic choice is essential to realizing optimal computational performance in any problem domain. Multigrid is a prime example: not only is it possible to make choices at the highest grid resolution, but a program can switch techniques as the problem is recursively attacked on coarser grid levels to take advantage of algorithms with different scaling behaviors. Additionally, users with different convergence criteria must experiment with parameters to yield a tuned algorithm that meets their accuracy requirements. Even after a tuned algorithm has been found, users often have to start all over when migrating from one machine to another.

We present an algorithm and autotuning methodology that address these issues in a near-optimal and efficient manner. The freedom to tune both the algorithm and the number of iterations independently at each recursion level results in an exponential search space of tuned algorithms that have different accuracies and performances. To search this space efficiently, our autotuner uses a novel dynamic programming method to build efficient tuned algorithms from the bottom up. The results are customized multigrid algorithms that invest targeted computational power to yield the accuracy required by the user.

The techniques we describe allow the user to automatically generate tuned multigrid cycles of different shapes targeted to the user's specific combination of problem, hardware, and accuracy requirements. These cycle shapes dictate the order in which grid coarsening and grid refinement are interleaved with both iterative methods, such as Jacobi or Successive Over-Relaxation, and direct methods, which tend to have superior performance for small problem sizes. The need to choose among all of these methods brings the issue of variable accuracy to the forefront. Not only must the autotuning framework compare different possible multigrid cycle shapes against each other, but it also needs the ability to compare tuned cycles against both direct and (non-multigrid) iterative methods. We address this problem by using an accuracy metric for measuring the effectiveness of tuned cycle shapes and making comparisons over all algorithmic types based on this common yardstick.

In our results, we find that the flexibility to trade performance against accuracy at all levels of recursive computation enables us to achieve excellent performance on a variety of platforms compared to algorithmically static implementations of multigrid. Our implementation uses PetaBricks, an implicitly parallel programming language where algorithmic choices are exposed in the language. The PetaBricks compiler uses these choices to analyze, autotune, and verify the PetaBricks program. These language features, most notably the autotuner, were key in enabling our implementation to be clear, correct, and fast.
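The bottom-up search described in the abstract can be illustrated with a small sketch. This is not the PetaBricks implementation and does not reproduce the paper's measurements: it is a minimal Python illustration in which every cost and convergence figure comes from a made-up analytic model (`direct_cost`, `sor_plan`, `SMOOTH_GAIN`, and `CYCLE_OVERHEAD` are all hypothetical), whereas a real autotuner would time candidates on the target machine. Accuracy is assumed here to mean the factor by which an algorithm reduces the error; the abstract itself does not define the metric. For each grid level and accuracy target, the sketch keeps the cheapest of a direct solve, repeated SOR sweeps, or repeated multigrid cycles that reuse an already tuned algorithm from the next coarser level.

```python
import math

# All constants below are illustrative assumptions, not measured values.
TARGETS = [10.0, 1e3, 1e5]   # accuracy = input error / output error (assumed metric)
SMOOTH_GAIN = 50.0           # assumed cap on error reduction per multigrid cycle
CYCLE_OVERHEAD = 4           # assumed smoothing + transfer cost per unknown


def direct_cost(n):
    """Assumed cost of a direct solve on n unknowns (meets any accuracy target)."""
    return n ** 3 / 4


def sor_plan(n, target):
    """Assumed model: one sweep costs n and shrinks the error by 1 - 1/n."""
    rho = 1.0 - 1.0 / n
    sweeps = math.ceil(math.log(target) / -math.log(rho))
    return sweeps * n, sweeps


def autotune(max_level):
    """Return {(level, target): (cost, plan)}, built from the coarsest level up."""
    best = {}
    for level in range(1, max_level + 1):
        n = 2 ** level
        for target in TARGETS:
            sor_cost, sweeps = sor_plan(n, target)
            candidates = [
                (direct_cost(n), "direct"),
                (sor_cost, f"sor x{sweeps}"),
            ]
            if level > 1:
                # Recursive candidates reuse the tuned coarser-level results.
                for sub in TARGETS:
                    gain = 1.0 / (1.0 / sub + 1.0 / SMOOTH_GAIN)
                    cycles = math.ceil(math.log(target) / math.log(gain))
                    coarse_cost = best[(level - 1, sub)][0]
                    cost = cycles * (CYCLE_OVERHEAD * n + coarse_cost)
                    candidates.append(
                        (cost, f"multigrid x{cycles}, coarse target {sub:g}")
                    )
            # Keep only the cheapest plan that meets this accuracy target.
            best[(level, target)] = min(candidates)
    return best


if __name__ == "__main__":
    for (level, target), (cost, plan) in sorted(autotune(6).items()):
        print(f"level {level}  target {target:g}  cost {cost:g}  {plan}")
```

Because each level consults only the table of best plans from the level below, the work grows with the number of levels times the square of the number of accuracy targets, rather than with the exponential number of possible cycle shapes. Under this toy model the coarsest grids fall back to the direct solve and the finer grids select multigrid cycles, which matches the qualitative behavior the abstract describes.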

