Modeling and predicting application performance on parallel computers using HPC challenge benchmarks

2008 IEEE International Symposium on Parallel and Distributed Processing. Pub Date: 2008-04-14. DOI: 10.1109/IPDPS.2008.4536278

W. Pfeiffer, N. Wright

Citations: 34

Abstract

A method is presented for modeling application performance on parallel computers in terms of the performance of microkernels from the HPC Challenge benchmarks. Specifically, the application run time is expressed as a linear combination of inverse speeds and latencies from microkernels or system characteristics. The model parameters are obtained by an automated series of least squares fits using backward elimination to ensure statistical significance. If necessary, outliers are deleted to ensure that the final fit is robust. Typically three or four terms appear in each model: at most one each for floating-point speed, memory bandwidth, interconnect bandwidth, and interconnect latency. Such models allow prediction of application performance on future computers from easier-to-make predictions of microkernel performance. The method was used to build models for four benchmark problems involving the PARATEC and MILC scientific applications. These models not only describe performance well on the ten computers used to build the models, but also do a good job of predicting performance on three additional computers with newer design features. For the four application benchmark problems with six predictions each, the relative root mean squared error in the predicted run times varies between 13 and 16%. The method was also used to build models for the HPL and G-FFTE benchmarks in HPCC, including functional dependences on problem size and core count from complexity analysis. The model for HPL predicts performance even better than the application models do, while the model for G-FFTE systematically underpredicts run times.
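The fitting procedure described in the abstract can be illustrated with a minimal sketch. It assumes run time is modeled as T = sum_i c_i * x_i, where each x_i is an inverse speed or a latency, and it repeatedly drops the least significant term until every remaining term passes a threshold. The data are synthetic, the predictor names (`inv_flops`, `inv_mem_bw`, `inv_net_bw`, `latency`) are hypothetical, and the t-statistic cutoff of 2 is an assumption; the abstract does not state the authors' significance criterion, and outlier deletion is omitted here.

```python
import numpy as np

def ols_with_t_stats(X, y):
    """Least squares fit without intercept; returns coefficients and t-statistics."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residual = y - X @ coef
    dof = X.shape[0] - X.shape[1]           # degrees of freedom
    sigma2 = residual @ residual / dof      # residual variance estimate
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))
    return coef, coef / std_err

def backward_eliminate(X, y, names, t_min=2.0):
    """Drop the least significant term until all |t| >= t_min (assumed cutoff)."""
    keep = list(range(X.shape[1]))
    while True:
        coef, t = ols_with_t_stats(X[:, keep], y)
        worst = int(np.argmin(np.abs(t)))
        if abs(t[worst]) >= t_min or len(keep) == 1:
            return {names[k]: float(c) for k, c in zip(keep, coef)}
        del keep[worst]

def predict(model, X, names):
    """Run time as a linear combination of the retained predictors."""
    return sum(c * X[:, names.index(name)] for name, c in model.items())

# Synthetic "machines": each row holds inverse speeds and a latency
# in arbitrary units. These values are made up for illustration only.
rng = np.random.default_rng(0)
names = ["inv_flops", "inv_mem_bw", "inv_net_bw", "latency"]
X = rng.uniform(0.2, 3.0, size=(10, 4))

# Hypothetical ground truth: interconnect bandwidth does not matter.
true_coef = np.array([3.0, 2.0, 0.0, 1.5])
y = (X @ true_coef) * (1.0 + 0.01 * rng.standard_normal(10))  # 1% noise

model = backward_eliminate(X, y, names)

# Relative root mean squared error of the fitted run times.
rel_rmse = float(np.sqrt(np.mean(((predict(model, X, names) - y) / y) ** 2)))

print(model)
print(f"relative RMSE: {rel_rmse:.3%}")
```

In this sketch the error is measured on the same synthetic machines used for fitting. The abstract's 13 to 16% figures instead refer to predictions on computers that were not used to build the models, so a faithful evaluation would hold out separate rows for testing.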
