K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi
{"title":"便携式并行程序的两种动态性能调优方法","authors":"K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi","doi":"10.1109/ICAPP.1995.472244","DOIUrl":null,"url":null,"abstract":"We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs the affinity between parallel algorithms and the architecture of the target parallel computer is very important. In this paper we focus on the parallelism in view of the number of micro-tasks which are processing units in parallel programs. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked. Furthermore, they shorten the execution time of the parallel program so that it is close to the optimal execution time. The estimation is based on the result of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation and the other uses spline interpolation for the estimation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers; a Paragon, an iPSC/2, and an nCUBE/2. In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. 
The nearest-neighbor interpolation method yielded average execution times, which are given in terms of the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"33 3-4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Two dynamic performance tuning methods for portable parallel programs\",\"authors\":\"K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi\",\"doi\":\"10.1109/ICAPP.1995.472244\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs the affinity between parallel algorithms and the architecture of the target parallel computer is very important. In this paper we focus on the parallelism in view of the number of micro-tasks which are processing units in parallel programs. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked. Furthermore, they shorten the execution time of the parallel program so that it is close to the optimal execution time. The estimation is based on the result of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation and the other uses spline interpolation for the estimation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers; a Paragon, an iPSC/2, and an nCUBE/2. 
In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. The nearest-neighbor interpolation method yielded average execution times, which are given in terms of the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.<<ETX>>\",\"PeriodicalId\":448130,\"journal\":{\"name\":\"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing\",\"volume\":\"33 3-4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAPP.1995.472244\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAPP.1995.472244","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Two dynamic performance tuning methods for portable parallel programs
We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs, the affinity between the parallel algorithm and the architecture of the target parallel computer is very important. In this paper we focus on parallelism in terms of the number of micro-tasks, the processing units of a parallel program. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked, and thereby bring the execution time of the parallel program close to the optimal execution time. The estimation is based on the results of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation for the estimation and the other uses spline interpolation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers: a Paragon, an iPSC/2, and an nCUBE/2. In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. The nearest-neighbor interpolation method yielded average execution times, normalized to the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.
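The nearest-neighbor variant of the tuning idea can be sketched roughly as follows: pre-execute the program for a few data sizes, record the micro-task count that ran fastest at each size, and then, for a new data size, reuse the optimum recorded at the nearest pre-executed size. This is a minimal illustrative sketch, not the authors' implementation; all function names and the toy timing numbers are assumptions made here for the example.

```python
# Hypothetical sketch of nearest-neighbor tuning from pre-execution data.
# timings maps data_size -> {num_micro_tasks: execution_time_seconds};
# the numbers below are invented for illustration, not measurements
# from the paper.

def best_tasks_from_preexecution(timings):
    # For each pre-executed data size, keep the micro-task count
    # that gave the shortest execution time.
    return {size: min(by_tasks, key=by_tasks.get)
            for size, by_tasks in timings.items()}

def nearest_neighbor_estimate(optima, data_size):
    # Pick the optimal micro-task count recorded at the pre-executed
    # data size closest to the requested one.
    nearest = min(optima, key=lambda s: abs(s - data_size))
    return optima[nearest]

timings = {
    64:   {1: 0.9,  2: 0.5,  4: 0.6},
    256:  {1: 6.0,  2: 3.2,  4: 2.1},
    1024: {1: 90.0, 4: 25.0, 8: 19.0},
}
optima = best_tasks_from_preexecution(timings)
print(nearest_neighbor_estimate(optima, 300))  # nearest size is 256 -> 4
```

The spline variant would instead fit a smooth curve through the pre-executed optima and evaluate it at the new data size; per the abstract, that smoothing performed slightly worse than simply reusing the nearest measured optimum.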