{"title":"Inferring Large-Scale Computation Behavior via Trace Extrapolation","authors":"L. Carrington, M. Laurenzano, Ananta Tiwari","doi":"10.1109/IPDPSW.2013.137","DOIUrl":null,"url":null,"abstract":"Understanding large-scale application behavior is critical for effectively utilizing existing HPC resources and making design decisions for upcoming systems. In this work we present a methodology for characterizing an MPI application's large-scale computation behavior and system requirements using information about the behavior of that application at a series of smaller core counts. The methodology finds the best statistical fit from among a set of canonical functions in terms of how a set of features that are important for both performance and energy (cache hit rates, floating point intensity, ILP, etc.) change across a series of small core counts. The statistical models for each of these application features can then be utilized to generate an extrapolated trace of the application at scale. The fidelity of the fully extrapolated traces is evaluated by comparing the results of building performance models using both the extrapolated trace along with an actual trace in order to predict application performance at using each. For two full-scale HPC applications, SPECFEM3D and UH3D, the extrapolated traces had absolute relative errors of less than 5%.","PeriodicalId":234552,"journal":{"name":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","volume":"60 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-05-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IPDPSW.2013.137","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19
Abstract
Understanding large-scale application behavior is critical for effectively utilizing existing HPC resources and making design decisions for upcoming systems. In this work we present a methodology for characterizing an MPI application's large-scale computation behavior and system requirements using information about the behavior of that application at a series of smaller core counts. The methodology finds the best statistical fit from among a set of canonical functions in terms of how a set of features that are important for both performance and energy (cache hit rates, floating point intensity, ILP, etc.) change across a series of small core counts. The statistical models for each of these application features can then be utilized to generate an extrapolated trace of the application at scale. The fidelity of the fully extrapolated traces is evaluated by comparing the results of building performance models using both the extrapolated trace along with an actual trace in order to predict application performance at using each. For two full-scale HPC applications, SPECFEM3D and UH3D, the extrapolated traces had absolute relative errors of less than 5%.