Two dynamic performance tuning methods for portable parallel programs

K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi
{"title":"Two dynamic performance tuning methods for portable parallel programs","authors":"K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi","doi":"10.1109/ICAPP.1995.472244","DOIUrl":null,"url":null,"abstract":"We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs the affinity between parallel algorithms and the architecture of the target parallel computer is very important. In this paper we focus on the parallelism in view of the number of micro-tasks which are processing units in parallel programs. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked. Furthermore, they shorten the execution time of the parallel program so that it is close to the optimal execution time. The estimation is based on the result of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation and the other uses spline interpolation for the estimation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers; a Paragon, an iPSC/2, and an nCUBE/2. In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. The nearest-neighbor interpolation method yielded average execution times, which are given in terms of the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"33 3-4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAPP.1995.472244","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs the affinity between parallel algorithms and the architecture of the target parallel computer is very important. In this paper we focus on the parallelism in view of the number of micro-tasks which are processing units in parallel programs. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked. Furthermore, they shorten the execution time of the parallel program so that it is close to the optimal execution time. The estimation is based on the result of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation and the other uses spline interpolation for the estimation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers; a Paragon, an iPSC/2, and an nCUBE/2. In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. The nearest-neighbor interpolation method yielded average execution times, which are given in terms of the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.<>
便携式并行程序的两种动态性能调优方法
提出了两种便携式并行程序在各种并行计算机上的动态性能调优方法。在并行程序中,并行算法与目标并行计算机的结构之间的亲和性是非常重要的。本文从并行程序中作为处理单元的微任务的数量出发,对并行性进行了研究。提出的方法在并行处理被调用之前估计微任务的最优数量。此外,它们缩短了并行程序的执行时间,使其接近最优执行时间。该估计是基于在目标并行计算机上处理的不同大小的数据的程序预执行的结果。一种调谐方法使用最近邻插值,另一种使用样条插值进行估计。我们在三台不同的并行计算机上使用datparallelc编写的并行方阵乘法程序测试了这些调优方法;一个Paragon,一个iPSC/2和一个nCUBE/2。在这些实验中,使用最近邻插值的方法比使用样条插值的方法使执行时间更接近最优。最近邻插值方法产生的平均执行时间为Paragon的1.01,iPSC/2的1.005和nCUBE/2的1.052,以最优执行时间给出。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信