Run-to-run Variability on Xeon Phi based Cray XC Systems
Sudheer Chunduri, K. Harms, Scott Parker, V. Morozov, Samuel Oshin, N. Cherukuri, Kalyan Kumaran
SC17: International Conference for High Performance Computing, Networking, Storage and Analysis, 12 November 2017
DOI: 10.1145/3126908.3126926
Citations: 60
Abstract
The increasing complexity of HPC systems has introduced new sources of variability, which can contribute to significant differences in the run-to-run performance of applications. With components at various levels of the system contributing variability, application developers and system users are now faced with the difficult task of running and tuning their applications in an environment where run-to-run performance measurements can vary by as much as a factor of two to three. In this study, we classify, quantify, and present ways to mitigate the sources of run-to-run variability on Cray XC systems with Intel Xeon Phi processors and a dragonfly interconnect. We further demonstrate that the code-tuning performance observed in a variability-mitigating environment correlates with the performance observed in production running conditions.
CCS CONCEPTS: • General and reference → Performance; • Networks → Network performance analysis; • Hardware → Process, voltage and temperature variations.
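To make the "factor of two to three" figure concrete, the minimal Python sketch below summarizes repeated timings of the same job with two common spread measures: the ratio of the slowest run to the fastest run, and the coefficient of variation (sample standard deviation divided by mean). The function name `variability_summary` and the choice of these two metrics are illustrative assumptions, not the measurement methodology used in the paper.

```python
import statistics
from typing import Dict, Sequence


def variability_summary(runtimes: Sequence[float]) -> Dict[str, float]:
    """Summarize run-to-run variability for repeated runs of one job.

    Hypothetical helper for illustration only.
    - max_min_ratio: slowest run / fastest run; a "factor of two"
      spread corresponds to a ratio of 2.0.
    - cv: coefficient of variation, sample standard deviation / mean.
    """
    if len(runtimes) < 2:
        raise ValueError("need at least two runs to measure variability")
    if min(runtimes) <= 0:
        raise ValueError("runtimes must be positive")

    fastest = min(runtimes)
    slowest = max(runtimes)
    mean = statistics.mean(runtimes)
    stdev = statistics.stdev(runtimes)  # sample (n - 1) standard deviation

    return {
        "max_min_ratio": slowest / fastest,
        "cv": stdev / mean,
    }


if __name__ == "__main__":
    # Hypothetical wall-clock times (seconds) for three runs of one job.
    summary = variability_summary([100.0, 120.0, 250.0])
    print(summary["max_min_ratio"])  # 2.5
```

The max/min ratio matches how the abstract phrases the spread, while the coefficient of variation is less sensitive to a single outlier run and is convenient for comparing jobs with different absolute runtimes.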