Performance Evaluation and Optimization Mechanisms for Inter-operable Graphics and Computation on GPUs

Proceedings of Workshop on General Purpose Processing Using GPUs; Pub Date: 2014-03-01; DOI: 10.1145/2588768.2576784

Yash Ukidave, Xiang Gong, D. Kaeli

Citations: 4

Abstract

Graphics Processing Units (GPUs) are established as the primary accelerators for graphics rendering in the gaming domain, and they have been widely accepted as the computing platform of choice in many scientific and high-performance computing domains. Applications across a range of domains use the parallelism offered by GPUs to process compute and graphics simultaneously. Programming standards such as OpenCL and OpenGL have been leveraged to achieve compute-graphics interoperability within the same application. However, given the increasing compute and graphics demands of emerging scientific visualization and immersive gaming applications, efficiency degrades because of the continual switching between compute and graphics, which swaps their associated runtime environments in and out. We need to better understand how to tune this interoperable environment so that compute and graphics can run both efficiently and simultaneously. At present, each of these domains is evaluated in isolation. In this paper, we evaluate the performance and efficiency of the OpenCL-OpenGL (CL-GL) interoperability (interop) mode. We explore different methods to improve the execution performance of CL-GL interop-based applications, and we propose a slot-based rendering mechanism for CL-GL interop to increase application efficiency. To evaluate CL-GL and our slot-based scheme, we study five scientific applications that use OpenCL for compute and OpenGL for graphics rendering. Our study covers two AMD Radeon discrete GPUs and one shared-memory AMD APU as test platforms. We demonstrate that leveraging the CL-GL interop interface results in a 2.2X performance increase, and that our slot-based rendering provides a 60% increase in performance by delivering a 24% improvement in L2 cache hit rate on GPUs and APUs.
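
The two headline figures are stated in different units (a multiplicative factor and a percentage), so it may help to put them on a common scale. The short sketch below does only that arithmetic. It assumes that "performance" means throughput, i.e. the inverse of run time, which the abstract does not state explicitly; the function names are illustrative and do not come from the paper. The two figures are converted independently, since the abstract does not give their baselines or say that they compound.

```python
def percent_increase_to_speedup(percent: float) -> float:
    """Convert an 'N% increase in performance' into a multiplicative speedup."""
    return 1.0 + percent / 100.0


def relative_runtime(speedup: float) -> float:
    """Fraction of the baseline run time left after a given speedup.

    Assumption (not stated in the abstract): performance is throughput,
    so run time scales as 1 / speedup.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


# Figures quoted in the abstract, each relative to its own baseline.
interop_speedup = 2.2                              # CL-GL interop interface
slot_speedup = percent_increase_to_speedup(60.0)   # slot-based rendering

for label, s in [("CL-GL interop", interop_speedup),
                 ("slot-based rendering", slot_speedup)]:
    print(f"{label}: {s:.2f}x speedup, "
          f"{relative_runtime(s) * 100:.1f}% of baseline run time")
```

Under that assumption, a 2.2X increase corresponds to roughly 45% of the baseline run time, and a 60% increase (1.6X) to 62.5% of it.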
