Data handling inefficiencies between CUDA, 3D rendering, and system memory

Brian Gordon, S. Sohoni, D. Chandler
{"title":"Data handling inefficiencies between CUDA, 3D rendering, and system memory","authors":"Brian Gordon, S. Sohoni, D. Chandler","doi":"10.1109/IISWC.2010.5648828","DOIUrl":null,"url":null,"abstract":"While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API which can transfer data to and from system code, and which can also access data used by 3D rendering APIs. In an application that relies on both GPU programming APIs to accelerate 3D modeling and an easily parallelized algorithm, the hidden inefficiencies of nVidia's data handling with CUDA become apparent. First, CUDA uses the CPU's store units to copy data between the graphics card and system memory instead of using a more efficient method like DMA. Second, data exchanged between the two GPU-based APIs travels through the main processor instead of staying on the GPU. As a result, a non-GPGPU implementation of a program runs faster than the same program using GPGPU.","PeriodicalId":107589,"journal":{"name":"IEEE International Symposium on Workload Characterization (IISWC'10)","volume":"17 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Symposium on Workload Characterization (IISWC'10)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/IISWC.2010.5648828","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 13

Abstract

While GPGPU programming offers faster computation of highly parallelized code, the memory bandwidth between the system and the GPU can create a bottleneck that reduces the potential gains. CUDA is a prominent GPGPU API which can transfer data to and from system code, and which can also access data used by 3D rendering APIs. In an application that relies on both GPU programming APIs to accelerate 3D modeling and an easily parallelized algorithm, the hidden inefficiencies of nVidia's data handling with CUDA become apparent. First, CUDA uses the CPU's store units to copy data between the graphics card and system memory instead of using a more efficient method like DMA. Second, data exchanged between the two GPU-based APIs travels through the main processor instead of staying on the GPU. As a result, a non-GPGPU implementation of a program runs faster than the same program using GPGPU.
CUDA、3D渲染和系统内存之间的数据处理效率低下
虽然GPGPU编程为高度并行化的代码提供了更快的计算速度,但系统和GPU之间的内存带宽可能会造成瓶颈,从而降低潜在的收益。CUDA是一个突出的GPGPU API,它可以在系统代码之间传输数据,也可以访问3D渲染API使用的数据。在一个既依赖GPU编程api来加速3D建模又依赖易于并行化的算法的应用程序中,nVidia使用CUDA处理数据的隐藏效率低下变得显而易见。首先,CUDA使用CPU的存储单元在显卡和系统内存之间复制数据,而不是使用像DMA这样更有效的方法。其次,两个基于GPU的api之间交换的数据通过主处理器而不是停留在GPU上。因此,一个程序的非GPGPU实现比使用GPGPU的相同程序运行得更快。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信