Performance Debugging and Tuning of Flash-X with Data Analysis Tools
K. Huck, Xingfu Wu, Anshu Dubey, Antigoni Georgiadou, J. A. Harris, T. Klosterman, Matthew Trappett, K. Weide
2022 IEEE/ACM Workshop on Programming and Performance Visualization Tools (ProTools), published 2022-11-01
DOI: 10.1109/ProTools56701.2022.00009 (https://doi.org/10.1109/ProTools56701.2022.00009)
Abstract
State-of-the-art multiphysics simulations running on large-scale leadership computing platforms have many variables contributing to their performance and scaling behavior. We recently encountered an interesting performance anomaly in Flash-X, a multiphysics, multicomponent simulation software package, when characterizing its performance on several large-scale HPC platforms. The anomaly was traced to the interaction between dynamic allocation of scratch data and data locality in the cache hierarchy. In this paper we present the details of the unexpected performance variability of Flash-X, our extensive analysis of it, using the performance measurement tool TAU to collect the data and Python data analysis libraries to explore it, and our insights from this experience. In the process, we discovered and removed or mitigated two additional performance-limiting bottlenecks.