K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi
{"title":"便携式并行程序的两种动态性能调优方法","authors":"K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi","doi":"10.1109/ICAPP.1995.472244","DOIUrl":null,"url":null,"abstract":"We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs the affinity between parallel algorithms and the architecture of the target parallel computer is very important. In this paper we focus on the parallelism in view of the number of micro-tasks which are processing units in parallel programs. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked. Furthermore, they shorten the execution time of the parallel program so that it is close to the optimal execution time. The estimation is based on the result of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation and the other uses spline interpolation for the estimation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers; a Paragon, an iPSC/2, and an nCUBE/2. In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. 
The nearest-neighbor interpolation method yielded average execution times, which are given in terms of the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.<<ETX>>","PeriodicalId":448130,"journal":{"name":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","volume":"33 3-4","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1995-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Two dynamic performance tuning methods for portable parallel programs\",\"authors\":\"K. Suzaki, T. Kurita, H. Tanuma, S. Hirano, Y. Ichisugi\",\"doi\":\"10.1109/ICAPP.1995.472244\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs the affinity between parallel algorithms and the architecture of the target parallel computer is very important. In this paper we focus on the parallelism in view of the number of micro-tasks which are processing units in parallel programs. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked. Furthermore, they shorten the execution time of the parallel program so that it is close to the optimal execution time. The estimation is based on the result of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation and the other uses spline interpolation for the estimation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers; a Paragon, an iPSC/2, and an nCUBE/2. 
In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. The nearest-neighbor interpolation method yielded average execution times, which are given in terms of the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.<<ETX>>\",\"PeriodicalId\":448130,\"journal\":{\"name\":\"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing\",\"volume\":\"33 3-4\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1995-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICAPP.1995.472244\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings 1st International Conference on Algorithms and Architectures for Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICAPP.1995.472244","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Two dynamic performance tuning methods for portable parallel programs
We present two dynamic performance tuning methods for portable parallel programs on various parallel computers. In parallel programs, the affinity between the parallel algorithm and the architecture of the target parallel computer is very important. In this paper we focus on parallelism in terms of the number of micro-tasks, the processing units of a parallel program. The presented methods estimate the optimal number of micro-tasks before the parallel processing is invoked, and thereby bring the execution time of the parallel program close to the optimal execution time. The estimation is based on the results of pre-executions of the program for different sizes of the data to be processed on a target parallel computer. One tuning method uses nearest-neighbor interpolation for the estimation and the other uses spline interpolation. We tested these tuning methods using a parallel square-matrix multiplication program written in Dataparallel C on three different parallel computers: a Paragon, an iPSC/2, and an nCUBE/2. In these experiments, the method using nearest-neighbor interpolation brought the execution time closer to the optimum than did the method using spline interpolation. The nearest-neighbor interpolation method yielded average execution times, normalized to the optimal execution time, of 1.01 for the Paragon, 1.005 for the iPSC/2, and 1.052 for the nCUBE/2.
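The nearest-neighbor variant of the tuning idea can be sketched roughly as follows: pre-execute the program for a few data sizes, record the micro-task count that ran fastest at each size, and then, for a new data size, reuse the optimum recorded at the nearest pre-executed size. This is a minimal illustrative sketch, not the authors' implementation; all function names and the toy timing numbers are assumptions made here for the example.

```python
# Hypothetical sketch of nearest-neighbor tuning from pre-execution data.
# timings maps data_size -> {num_micro_tasks: execution_time_seconds};
# the numbers below are invented for illustration, not measurements
# from the paper.

def best_tasks_from_preexecution(timings):
    # For each pre-executed data size, keep the micro-task count
    # that gave the shortest execution time.
    return {size: min(by_tasks, key=by_tasks.get)
            for size, by_tasks in timings.items()}

def nearest_neighbor_estimate(optima, data_size):
    # Pick the optimal micro-task count recorded at the pre-executed
    # data size closest to the requested one.
    nearest = min(optima, key=lambda s: abs(s - data_size))
    return optima[nearest]

timings = {
    64:   {1: 0.9,  2: 0.5,  4: 0.6},
    256:  {1: 6.0,  2: 3.2,  4: 2.1},
    1024: {1: 90.0, 4: 25.0, 8: 19.0},
}
optima = best_tasks_from_preexecution(timings)
print(nearest_neighbor_estimate(optima, 300))  # nearest size is 256 -> 4
```

The spline variant would instead fit a smooth curve through the pre-executed optima and evaluate it at the new data size; per the abstract, that smoothing performed slightly worse than simply reusing the nearest measured optimum.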