A statistical performance prediction model for OpenCL kernels on NVIDIA GPUs

The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013). Pub Date: 2013-10-01. DOI: 10.1109/CADS.2013.6714232

Ali Karami, S. A. Mirsoleimani, F. Khunjush


Citations: 30

Abstract

Understanding the performance bottlenecks of applications in high-performance computing can lead to dramatic performance improvements. A key problem in GPU programming, for example, is finding performance bottlenecks and resolving them to reach the best possible performance. Bottlenecks in GPU architectures stem from a variety of factors, such as memory access latency, branch divergence, utilization, and the amount of available parallelism. Simple profiling cannot reveal the relations between these bottlenecks. In this paper, we propose a statistical performance model that not only helps find bottlenecks but also shows the relations between them, which is not possible with a profiler alone. The OpenCL programming standard can be used on a variety of platforms (e.g., CPUs and GPUs); therefore, a program written for one platform can be ported to other platforms with minimal effort. For this reason, we selected OpenCL to design our performance model for NVIDIA GPUs. We first measure the values of a GPU's performance counters for the selected benchmarks. Then, using these measurements and applying a regression model and principal component analysis, we develop a model that shows how different GPU parameters account for application performance bottlenecks. Our results show that the proposed model can predict application behavior with 91% accuracy. Moreover, the proposed model can characterize unknown applications based on their performance similarities to an existing database of benchmarks in order to predict their likely performance bottlenecks.
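The combination of principal component analysis and regression mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the data are synthetic, the counter labels (`mem_latency`, `branch_divergence`, `utilization`, `parallelism`) are hypothetical stand-ins for measured GPU performance counters, and ordinary least squares on the principal component scores is assumed as the regression step.

```python
import numpy as np

def pca(X, n_components):
    """Standardize the columns of X and project onto the top principal components."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant counters
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardized data.
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    scores = Z @ components.T
    return scores, components, explained[:n_components]

def fit_linear(scores, y):
    """Ordinary least squares with an intercept term."""
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict(scores, coef):
    A = np.column_stack([np.ones(len(scores)), scores])
    return A @ coef

# Synthetic stand-in for a benchmark database (assumption, not measured data):
# 40 kernels, 4 hypothetical counters driven by 2 hidden factors.
rng = np.random.default_rng(0)
counters = ["mem_latency", "branch_divergence", "utilization", "parallelism"]
n_kernels = 40
latent = rng.normal(size=(n_kernels, 2))
loadings = np.array([[1.0, 0.8, 0.0, 0.1],
                     [0.0, 0.2, 1.0, -0.9]])
X = latent @ loadings + 0.05 * rng.normal(size=(n_kernels, len(counters)))
# Hypothetical execution time that depends on the same hidden factors.
y = 10.0 + 3.0 * latent[:, 0] + 1.0 * latent[:, 1] + 0.1 * rng.normal(size=n_kernels)

scores, components, explained = pca(X, n_components=2)
coef = fit_linear(scores, y)
y_hat = predict(scores, coef)
r2 = 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

print("variance explained by 2 components:", explained.sum())
print("R^2 of regression on component scores:", r2)
for name, row in zip(counters, components.T):
    print(name, "loadings:", row)
```

In a sketch like this, the per-counter loadings indicate which counters move together, which is one way a statistical model can expose relations between bottleneck factors that individual profiler readings do not show.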


