A. Calotoiu, M. Geisenhofer, F. Kummer, M. Ritter, Jens Weber, T. Hoefler, M. Oberlack, F. Wolf
{"title":"空间发散性能的实证建模","authors":"A. Calotoiu, M. Geisenhofer, F. Kummer, M. Ritter, Jens Weber, T. Hoefler, M. Oberlack, F. Wolf","doi":"10.1109/HUSTProtools51951.2020.00015","DOIUrl":null,"url":null,"abstract":"A common simplification made when modeling the performance of a parallel program is the assumption that the performance behavior of all processes or threads is largely uniform. Empirical performance-modeling tools such as Extra-P exploit this common pattern to make their modeling process more noise resilient, mitigating the effect of outliers by summarizing performance measurements of individual functions across all processes. While the underlying assumption does not equally hold for all applications, knowing the qualitative differences in how the performance of individual processes changes as execution parameters are varied can reveal important performance bottlenecks such as malicious patterns of load imbalance. A challenge for empirical modeling tools, however, arises from the fact that the behavioral class of a process may depend on the process configuration, letting process ranks migrate between classes as the number of processes grows. In this paper, we introduce a novel approach to the problem of modeling of spatially diverging performance based on a certain type of process clustering. We apply our technique to identify a previously unknown performance bottleneck in the BoSSS fluid-dynamics code. Removing it made the code regions in question running up to 20 times and the application as a whole run up to 4.5 times faster.","PeriodicalId":38836,"journal":{"name":"Meta: Avaliacao","volume":"13 1","pages":"71-80"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Empirical Modeling of Spatially Diverging Performance\",\"authors\":\"A. Calotoiu, M. Geisenhofer, F. Kummer, M. Ritter, Jens Weber, T. Hoefler, M. Oberlack, F. Wolf\",\"doi\":\"10.1109/HUSTProtools51951.2020.00015\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"A common simplification made when modeling the performance of a parallel program is the assumption that the performance behavior of all processes or threads is largely uniform. Empirical performance-modeling tools such as Extra-P exploit this common pattern to make their modeling process more noise resilient, mitigating the effect of outliers by summarizing performance measurements of individual functions across all processes. While the underlying assumption does not equally hold for all applications, knowing the qualitative differences in how the performance of individual processes changes as execution parameters are varied can reveal important performance bottlenecks such as malicious patterns of load imbalance. A challenge for empirical modeling tools, however, arises from the fact that the behavioral class of a process may depend on the process configuration, letting process ranks migrate between classes as the number of processes grows. In this paper, we introduce a novel approach to the problem of modeling of spatially diverging performance based on a certain type of process clustering. We apply our technique to identify a previously unknown performance bottleneck in the BoSSS fluid-dynamics code. Removing it made the code regions in question running up to 20 times and the application as a whole run up to 4.5 times faster.\",\"PeriodicalId\":38836,\"journal\":{\"name\":\"Meta: Avaliacao\",\"volume\":\"13 1\",\"pages\":\"71-80\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Meta: Avaliacao\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HUSTProtools51951.2020.00015\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"Social Sciences\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Meta: Avaliacao","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HUSTProtools51951.2020.00015","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"Social Sciences","Score":null,"Total":0}
Empirical Modeling of Spatially Diverging Performance
A common simplification made when modeling the performance of a parallel program is the assumption that the performance behavior of all processes or threads is largely uniform. Empirical performance-modeling tools such as Extra-P exploit this common pattern to make their modeling process more noise resilient, mitigating the effect of outliers by summarizing performance measurements of individual functions across all processes. While the underlying assumption does not equally hold for all applications, knowing the qualitative differences in how the performance of individual processes changes as execution parameters are varied can reveal important performance bottlenecks such as malicious patterns of load imbalance. A challenge for empirical modeling tools, however, arises from the fact that the behavioral class of a process may depend on the process configuration, letting process ranks migrate between classes as the number of processes grows. In this paper, we introduce a novel approach to the problem of modeling of spatially diverging performance based on a certain type of process clustering. We apply our technique to identify a previously unknown performance bottleneck in the BoSSS fluid-dynamics code. Removing it made the code regions in question running up to 20 times and the application as a whole run up to 4.5 times faster.