Multicore/GPGPU Portable Computational Kernels via Multidimensional Arrays

2011 IEEE International Conference on Cluster Computing. Pub Date: 2011-09-26. DOI: 10.1109/CLUSTER.2011.47

H. C. Edwards, Daniel Sunderland, Chris Amsler, Sam P. Mish

Citations: 4

Abstract

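Large, complex scientific and engineering application codes have a significant investment in the computational kernels that implement their mathematical models. Porting these kernels to the collection of modern many-core accelerator devices is a major challenge, because these devices have diverse programming models, application programming interfaces (APIs), and performance requirements. The Trilinos-Kokkos array programming model provides a library-based approach to implementing computational kernels that are performance-portable to CPU-multicore and GPGPU accelerator devices. This programming model is based on three fundamental concepts: (1) there exist one or more many-core compute devices, each with its own memory space; (2) data-parallel kernels are executed via parallel-for and parallel-reduce operations; and (3) kernels operate on multidimensional arrays. Kernel execution performance, especially on NVIDIA® GPGPU devices, is extremely dependent on data access patterns. The optimal data access pattern can differ from one many-core device to another, potentially leading to a different implementation of each computational kernel specialized for each device. The Trilinos-Kokkos programming model supports performance-portable kernels by separating data access patterns from computational kernels through a multidimensional array API. Through this API, device-specific mappings of multi-indices to device memory are introduced into a computational kernel via compile-time polymorphism, i.e., without modification of the kernel.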
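The central idea above, a kernel written once against a multi-index API whose mapping to memory is selected at compile time, can be illustrated with a short C++ sketch. This is a minimal, self-contained sketch under stated assumptions, not the Trilinos-Kokkos API: the names `LayoutRowMajor`, `LayoutColMajor`, `Array2D`, `parallel_for`, `parallel_reduce`, and `fill_and_sum` are hypothetical, and the two dispatch functions are serial stand-ins for a device back end. Because the layout is a template parameter, the same `fill_and_sum` kernel compiles against either index mapping without being edited.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout policies (illustrative names, NOT the Trilinos-Kokkos API).
// Each maps a 2-D multi-index (i, j) of an n0 x n1 array to a linear offset.
struct LayoutRowMajor {  // j varies fastest: each row i is contiguous (CPU-friendly)
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t /*n0*/, std::size_t n1) {
    return i * n1 + j;
  }
};

struct LayoutColMajor {  // i varies fastest: for a fixed j, i and i+1 are adjacent
  static std::size_t offset(std::size_t i, std::size_t j,
                            std::size_t n0, std::size_t /*n1*/) {
    return i + j * n0;
  }
};

// Hypothetical 2-D array: the layout is a template parameter, so the
// multi-index mapping is fixed at compile time and can be inlined.
template <class Layout>
class Array2D {
 public:
  Array2D(std::size_t n0, std::size_t n1) : n0_(n0), n1_(n1), data_(n0 * n1) {}

  double& operator()(std::size_t i, std::size_t j) {
    assert(i < n0_ && j < n1_);  // bounds check in debug builds
    return data_[Layout::offset(i, j, n0_, n1_)];
  }

  std::size_t extent(int dim) const { return dim == 0 ? n0_ : n1_; }

 private:
  std::size_t n0_, n1_;
  std::vector<double> data_;
};

// Serial stand-ins for parallel-for / parallel-reduce (hypothetical); a real
// device back end would distribute the index range [0, n) across threads.
template <class F>
void parallel_for(std::size_t n, const F& body) {
  for (std::size_t i = 0; i < n; ++i) body(i);
}

template <class F>
double parallel_reduce(std::size_t n, const F& body) {
  double result = 0.0;
  for (std::size_t i = 0; i < n; ++i) body(i, result);  // body accumulates into result
  return result;
}

// The kernel is written once against the multi-index API; it never
// names a layout, so switching layouts requires no kernel edits.
template <class Layout>
double fill_and_sum(Array2D<Layout>& a) {
  const std::size_t n0 = a.extent(0), n1 = a.extent(1);
  parallel_for(n0, [&](std::size_t i) {  // one work item per index i
    for (std::size_t j = 0; j < n1; ++j) a(i, j) = static_cast<double>(i * n1 + j);
  });
  return parallel_reduce(n0, [&](std::size_t i, double& update) {
    for (std::size_t j = 0; j < n1; ++j) update += a(i, j);
  });
}
```

For example, `fill_and_sum` applied to a 3 × 4 `Array2D<LayoutRowMajor>` or `Array2D<LayoutColMajor>` returns 66 in both cases (the sum of 0 through 11), even though each element sits at a different offset in memory. The rationale for the two layouts, assumed here rather than taken from the paper, is the commonly cited one: a CPU thread iterating over `j` benefits from contiguous rows, whereas on a GPU, consecutive threads `i` reading the same `j` coalesce when `i` varies fastest. The later Kokkos library expresses a comparable choice through the layout parameter of `Kokkos::View` (e.g., `Kokkos::LayoutRight`, `Kokkos::LayoutLeft`).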
