Toward an Analytical Performance Model to Select between GPU and CPU Execution
Artem Chikin, J. N. Amaral, Karim Ali, Ettore Tiotto
2019 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 2019
DOI: 10.1109/IPDPSW.2019.00068 (https://doi.org/10.1109/IPDPSW.2019.00068)
Citations: 5
Abstract
Automating device selection in heterogeneous computing platforms requires modelling performance both on CPUs and on accelerators. This work argues that a hybrid analytical performance modelling approach is a practical way to build fast and efficient methods to select an appropriate target for a given computation kernel. The target selection problem has been addressed in the literature; however, there has been a strong emphasis on building empirical models with machine learning techniques. We argue that the applicability of such solutions is often limited in production systems. This paper focuses on the issue of building a selector to decide whether an OpenMP loop nest should be executed on a CPU or on a GPU. To this end, it offers a comprehensive comparative evaluation of GPU kernel performance on devices from multiple generations of architectures. The idea is to underscore the need for accurate analytical performance models and to provide insights into the evolution of GPU accelerators. This work also highlights a drawback of existing approaches to modelling GPU performance: the lack of accurate modelling of memory coalescing characteristics. To address it, we examine a novel application of an inter-thread difference analysis that can further improve analytical models. Finally, this work presents an initial study of an OpenMP runtime framework for offloading target selection.
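To make the selection problem concrete, the following is a minimal sketch in C of how a runtime decision could drive an OpenMP `target` region. It is not the model proposed in the paper. The `kernel_desc` and `device_desc` structures, the roofline-style time estimate, and the stride-based `coalescing_efficiency` factor are all hypothetical simplifications introduced here for illustration. Only the `if(target: ...)` clause is standard OpenMP (4.5 and later): when its condition is false, the region runs on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel summary; every field is an illustrative assumption. */
typedef struct {
    double flops;          /* floating-point operations in the loop nest */
    double bytes;          /* bytes read and written by the loop nest */
    double transfer_bytes; /* bytes copied between host and device */
    size_t stride_elems;   /* element distance between adjacent threads' accesses */
} kernel_desc;

/* Hypothetical device summary (peak rates, not measured values). */
typedef struct {
    double flops_per_s;
    double mem_bytes_per_s;
    double link_bytes_per_s; /* host-device link bandwidth */
} device_desc;

/* Toy coalescing factor: adjacent threads with unit (or zero) stride share a
 * memory transaction; larger strides spread one warp over more transactions,
 * capped at one transaction per element. */
double coalescing_efficiency(size_t stride_elems, size_t elems_per_transaction)
{
    assert(elems_per_transaction > 0);
    if (stride_elems <= 1)
        return 1.0;
    if (stride_elems >= elems_per_transaction)
        return 1.0 / (double)elems_per_transaction;
    return 1.0 / (double)stride_elems;
}

/* Roofline-style estimate: the slower of compute and memory time,
 * plus an optional host-device transfer cost. */
double estimate_time(const kernel_desc *k, const device_desc *d,
                     double mem_efficiency, int needs_transfer)
{
    double compute = k->flops / d->flops_per_s;
    double memory = k->bytes / (d->mem_bytes_per_s * mem_efficiency);
    double t = compute > memory ? compute : memory;
    if (needs_transfer)
        t += k->transfer_bytes / d->link_bytes_per_s;
    return t;
}

/* Returns 1 when the toy model predicts the GPU to be faster. */
int should_offload(const kernel_desc *k, const device_desc *cpu,
                   const device_desc *gpu, size_t elems_per_transaction)
{
    double eff = coalescing_efficiency(k->stride_elems, elems_per_transaction);
    return estimate_time(k, gpu, eff, 1) < estimate_time(k, cpu, 1.0, 0);
}

/* The same loop runs on the device or on the host depending on use_gpu. */
void saxpy(size_t n, float a, const float *x, float *y, int use_gpu)
{
    #pragma omp target teams distribute parallel for \
        map(to: x[0:n]) map(tofrom: y[0:n]) if(target: use_gpu)
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}
```

A caller would fill in a `kernel_desc` for the loop nest, call `should_offload` with descriptions of the two devices, and pass the result as `use_gpu`. Compiled without OpenMP support, the pragma is ignored and the loop simply runs serially on the host, so the sketch stays usable on machines without an accelerator.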