GPU Programming Productivity in Different Abstraction Paradigms

ACM Transactions on Computing Education (TOCE), Pub Date: 2020-10-13, DOI: 10.1145/3418301

P. Daleiden, A. Stefik, Philip Merlin Uesbeck


Citations: 1

Abstract


Coprocessor architectures in High Performance Computing are prevalent in today’s scientific computing clusters and require specialized knowledge for proper utilization. Various alternative paradigms for parallel and offload computation exist, but little is known about the human-factors impact of using the different paradigms. Our study compared NVIDIA CUDA C/C++, as the control group, with the Thrust library, using computer science student participants from the University of Nevada, Las Vegas who had no previous exposure to Graphics Processing Unit (GPU) programming. The designers of Thrust claim that its higher level of abstraction enhances programmer productivity. The trial was conducted with 91 participants and was administered through our computerized testing platform. The study was narrowly focused on the basic steps of an offloaded computation problem and was not intended to be a comprehensive evaluation of the superiority of one approach or the other. Even so, we found evidence that although Thrust was designed for ease of use, its abstractions tended to be confusing to students and in several cases diminished productivity. Specifically, the Thrust abstractions for (i) memory allocation through a C++ Standard Template Library-style vector library call, (ii) memory transfers between the host and the GPU coprocessor through an overloaded assignment operator, and (iii) execution of an offloaded routine through a generic transform library call instead of a CUDA kernel routine all performed either equal to or worse than CUDA.
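To make the three abstractions named in the abstract concrete, here is a minimal sketch of the same offloaded computation written both ways. The task (scaling a vector by a constant) and the names `scale_kernel`, `Scale`, `scale_with_cuda`, and `scale_with_thrust` are illustrative assumptions and are not taken from the study's materials. The comments mark where (i) allocation, (ii) transfer, and (iii) execution occur in each version.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/transform.h>

// Hypothetical task (not from the study): multiply every element by a constant.
// Error checking of CUDA calls is omitted to keep the sketch short.

// ---- CUDA C/C++ version ----
__global__ void scale_kernel(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

std::vector<float> scale_with_cuda(std::vector<float> host, float factor) {
    int n = static_cast<int>(host.size());
    if (n == 0) return host;
    size_t bytes = n * sizeof(float);

    float* dev = nullptr;
    cudaMalloc(&dev, bytes);                                      // (i) explicit allocation
    cudaMemcpy(dev, host.data(), bytes, cudaMemcpyHostToDevice);  // (ii) explicit transfer

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    scale_kernel<<<blocks, threads>>>(dev, n, factor);            // (iii) kernel launch

    cudaMemcpy(host.data(), dev, bytes, cudaMemcpyDeviceToHost);  // (ii) copy result back
    cudaFree(dev);
    return host;
}

// ---- Thrust version ----
struct Scale {
    float factor;
    __host__ __device__ float operator()(float x) const { return x * factor; }
};

std::vector<float> scale_with_thrust(const std::vector<float>& host, float factor) {
    thrust::host_vector<float> h(host.begin(), host.end());

    thrust::device_vector<float> d(h.size());   // (i) STL-style vector allocates device memory
    d = h;                                      // (ii) overloaded assignment copies host to device

    // (iii) generic transform call takes the place of a hand-written kernel
    thrust::transform(d.begin(), d.end(), d.begin(), Scale{factor});

    h = d;                                      // (ii) overloaded assignment copies device to host
    return std::vector<float>(h.begin(), h.end());
}

int main() {
    std::vector<float> input = {1.0f, 2.0f, 3.0f, 4.0f};
    std::vector<float> a = scale_with_cuda(input, 2.0f);
    std::vector<float> b = scale_with_thrust(input, 2.0f);
    assert(a == b);  // both versions should compute the same result
    for (size_t i = 0; i < input.size(); ++i)
        std::printf("%g -> cuda %g, thrust %g\n", input[i], a[i], b[i]);
    return 0;
}
```

Compiled with `nvcc`, the sketch shows what the Thrust abstractions hide: in the Thrust version, `d = h` looks like ordinary assignment but performs a host-to-device copy, and no kernel or launch configuration appears in the source at all. Whether this makes the code easier or harder for novices to write is the question the study examines; the sketch does not settle it.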

