Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms

Chao Liu, J. Bhimani, M. Leeser
{"title":"Using High Level GPU Tasks to Explore Memory and Communications Options on Heterogeneous Platforms","authors":"Chao Liu, J. Bhimani, M. Leeser","doi":"10.1145/3085158.3086160","DOIUrl":null,"url":null,"abstract":"Heterogeneous computing platforms that use GPUs for acceleration are becoming prevalent. Developing parallel applications for GPU platforms and optimizing GPU related applications for good performance is important. In this work, we develop a set of applications based on a high level task design, which ensures a well defined structure for portability improvement. Together with the GPU task implementation, we utilize a uniform interface to allocate and manage memory blocks that are used by both host and device. In this way we can choose the appropriate types of memory for host/device communication easily and flexibly in GPU tasks. Through asynchronous task execution and CUDA streams, we can explore concurrent GPU kernels for performance improvement when running multiple tasks. We developed a test benchmark set containing nine different kernel applications. Through tests we can learn that pinned memory can improve host/device data transfer for GPU platforms. The performance of unified memory differs a lot on different GPU architectures and is not a good choice if performance is the main focus. The multiple task tests show that applications based on our GPU tasks can effectively make use of the concurrent kernel ability of modern GPUs for better resource utilization.","PeriodicalId":425891,"journal":{"name":"Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications","volume":"99 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-06-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2017 Workshop on Software Engineering Methods for Parallel and High Performance Applications","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3085158.3086160","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

Heterogeneous computing platforms that use GPUs for acceleration are becoming prevalent, so developing parallel applications for GPU platforms and optimizing them for good performance is important. In this work, we develop a set of applications based on a high-level task design, which ensures a well-defined structure that improves portability. Together with the GPU task implementation, we use a uniform interface to allocate and manage memory blocks shared by host and device. In this way, the appropriate type of memory for host/device communication can be chosen easily and flexibly within GPU tasks. Through asynchronous task execution and CUDA streams, we explore concurrent GPU kernels for improved performance when running multiple tasks. We developed a benchmark set containing nine different kernel applications. The tests show that pinned memory improves host/device data transfer on GPU platforms. The performance of unified memory varies considerably across GPU architectures and is not a good choice when performance is the main concern. The multi-task tests show that applications built on our GPU tasks can effectively exploit the concurrent-kernel capability of modern GPUs for better resource utilization.
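The abstract refers to three CUDA mechanisms: pinned host memory, unified (managed) memory, and streams for concurrent kernels. The sketch below is a minimal, self-contained CUDA illustration of those mechanisms only; it is not the authors' task framework, and the kernel `scale`, the buffer size, and all variable names are assumptions introduced here for illustration.

```cuda
// Minimal sketch (assumptions, not the paper's implementation): pinned vs.
// unified memory, plus two independent launches overlapped via streams.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);

    // Option 1: pinned (page-locked) host memory enables truly asynchronous
    // host/device transfers that can overlap with kernel execution.
    float *h_pinned, *d_buf;
    cudaMallocHost((void **)&h_pinned, bytes);
    cudaMalloc((void **)&d_buf, bytes);
    for (int i = 0; i < n; ++i) h_pinned[i] = 1.0f;

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Asynchronous copy and kernel launch, all enqueued in stream s0.
    cudaMemcpyAsync(d_buf, h_pinned, bytes, cudaMemcpyHostToDevice, s0);
    scale<<<(n + 255) / 256, 256, 0, s0>>>(d_buf, 2.0f, n);
    cudaMemcpyAsync(h_pinned, d_buf, bytes, cudaMemcpyDeviceToHost, s0);

    // Option 2: unified (managed) memory needs no explicit copies; its
    // performance depends on the GPU architecture's page-migration behavior.
    float *u_buf;
    cudaMallocManaged((void **)&u_buf, bytes);
    for (int i = 0; i < n; ++i) u_buf[i] = 1.0f;

    // Independent work in a second stream can run concurrently with s0.
    scale<<<(n + 255) / 256, 256, 0, s1>>>(u_buf, 2.0f, n);

    cudaDeviceSynchronize();
    printf("pinned: %f  unified: %f\n", h_pinned[0], u_buf[0]);

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    cudaFree(u_buf);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    return 0;
}
```

In this sketch, pinned memory allows cudaMemcpyAsync to overlap transfers with kernel work, while managed memory removes explicit copies at the cost of architecture-dependent page migration, which is consistent with the abstract's observation that unified-memory performance varies across GPU generations.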