Title: A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs
Authors: Ali Karami, S. A. Mirsoleimani, F. Khunjush
Venue: The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013)
Published: 2013-10-01
DOI: 10.1109/CADS.2013.6714232 (https://doi.org/10.1109/CADS.2013.6714232)
Citations: 30
Abstract
Understanding the performance bottlenecks of applications in high-performance computing can lead to dramatic improvements in application performance. A key problem in GPU programming is finding these bottlenecks and resolving them to reach the best possible performance. Bottlenecks in GPU architectures stem from a variety of factors, such as memory access latency, branch divergence, utilization, and the amount of available parallelism. Simple profiling cannot reveal the relations between these bottlenecks. In this paper, we propose a statistical performance model that not only helps find bottlenecks but also shows the relations between them, which is not possible with a profiler alone. The OpenCL programming standard can be used on a variety of platforms (e.g., CPUs and GPUs); therefore, a program written for one platform can be ported to other platforms with minimal effort. For this reason, we selected the OpenCL programming standard to design our performance model for NVIDIA GPUs. We first measure the values of the GPU's performance counters for the selected benchmarks. Then, using the measured values and applying a regression model and principal component analysis, we develop a model that shows how different GPU parameters account for application performance bottlenecks. Our results show that the proposed model can predict application behavior with 91% accuracy. Moreover, the proposed model is able to characterize unknown applications based on their performance similarities with an existing database of benchmarks in order to predict their likely performance bottlenecks.
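The two statistical steps named in the abstract, principal component analysis followed by regression, can be illustrated with a minimal sketch. This is not the authors' model: the counter names, the six data rows, and the timings below are synthetic values invented for illustration, and the sketch assumes a single principal component and a simple linear fit, which the abstract does not specify.

```python
import math

# Hypothetical per-kernel measurements (synthetic, not from the paper).
# Columns: global load transactions, divergent branches, achieved occupancy.
counters = [
    [120.0, 4.0, 0.90],
    [340.0, 10.0, 0.75],
    [560.0, 15.0, 0.60],
    [800.0, 22.0, 0.45],
    [150.0, 6.0, 0.85],
    [700.0, 19.0, 0.50],
]
times_ms = [1.1, 2.9, 4.8, 6.9, 1.4, 6.0]  # synthetic kernel run times


def standardize(rows):
    """Center each column and scale it to unit (population) variance."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
        for j in range(m)
    ]
    return [[(r[j] - means[j]) / stds[j] for j in range(m)] for r in rows]


def covariance(z):
    """Covariance matrix of already-centered rows."""
    n, m = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(m)] for i in range(m)]


def first_component(cov, iters=200):
    """Power iteration: unit-length dominant eigenvector of a symmetric matrix."""
    m = len(cov)
    v = [1.0 / math.sqrt(m)] * m
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(m)) for i in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


z = standardize(counters)
loadings = first_component(covariance(z))          # weight of each counter in PC1
scores = [sum(l * v for l, v in zip(loadings, row)) for row in z]  # PC1 per kernel
intercept, slope, r2 = fit_line(scores, times_ms)  # regress run time on PC1

print("PC1 loadings:", [round(l, 3) for l in loadings])
print("R^2 of time ~ PC1:", round(r2, 3))
```

In this sketch the loadings indicate which counters move together (here, memory traffic and divergence rise as occupancy falls), while the regression relates that combined factor to run time. A real study would use measured counters, retain as many components as the data requires, and validate the fit on held-out benchmarks.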