Data handling inefficiencies between CUDA, 3D rendering, and system memory

IEEE International Symposium on Workload Characterization (IISWC'10). Pub Date: 2010-12-02. DOI: 10.1109/IISWC.2010.5648828

Brian Gordon, S. Sohoni, D. Chandler

Citations: 13

Abstract

While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API that can transfer data to and from system code and can also access data used by 3D rendering APIs. In an application that relies on both GPU programming APIs to accelerate 3D modeling and an easily parallelized algorithm, the hidden inefficiencies of NVIDIA's data handling with CUDA become apparent. First, CUDA uses the CPU's store units to copy data between the graphics card and system memory instead of using a more efficient method such as DMA. Second, data exchanged between the two GPU-based APIs travels through the main processor instead of staying on the GPU. As a result, a non-GPGPU implementation of the program runs faster than the same program using GPGPU.
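The trade-off the abstract describes can be illustrated with a rough break-even model: offloading only helps when the copy time plus the GPU kernel time is shorter than the CPU-only time. The sketch below is not taken from the paper. The function names and every number in it are hypothetical, chosen only to show how a slow effective transfer path can erase the gain from a fast kernel.

```python
def gpu_total_time(bytes_moved, link_bandwidth, kernel_time):
    """Seconds for one offloaded step: host<->GPU copies plus kernel execution.

    bytes_moved    -- total bytes copied in both directions (hypothetical)
    link_bandwidth -- effective transfer rate in bytes per second (hypothetical)
    kernel_time    -- GPU compute time in seconds (hypothetical)
    """
    return bytes_moved / link_bandwidth + kernel_time


def offload_pays_off(cpu_time, bytes_moved, link_bandwidth, kernel_time):
    """True when the GPU path, including transfers, beats the CPU-only path."""
    return gpu_total_time(bytes_moved, link_bandwidth, kernel_time) < cpu_time


# All values below are illustrative assumptions, not measurements from the paper.
cpu_time = 0.010               # CPU-only implementation: 10 ms per step
kernel_time = 0.001            # GPU kernel alone: 1 ms per step
bytes_moved = 64 * 1024 ** 2   # 64 MiB copied per step

slow_link = 2e9    # assumed CPU-driven copy path, 2 GB/s effective
fast_link = 12e9   # assumed more efficient path, 12 GB/s effective

print(offload_pays_off(cpu_time, bytes_moved, slow_link, kernel_time))
print(offload_pays_off(cpu_time, bytes_moved, fast_link, kernel_time))
```

With these assumed numbers the kernel is ten times faster than the CPU code, yet the slow path loses overall because the copy time dominates. Keeping data on the GPU between the compute and rendering APIs corresponds to shrinking `bytes_moved` in this model.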
