{"title":"IVM: a task-based shared memory programming model and runtime system to enable uniform access to CPU-GPU clusters","authors":"Kittisak Sajjapongse, Ruidong Gu, M. Becchi","doi":"10.1145/2903150.2903174","DOIUrl":null,"url":null,"abstract":"GPUs have been widely used to accelerate a variety of applications from different domains and have become part of high-performance computing clusters. Yet, the use of GPUs within distributed applications still faces significant challenges in terms of programmability and performance portability. The use of popular programming models for distributed applications (such as MPI, SHMEM, and Charm++) in combination with GPU programming frameworks (such as CUDA and OpenCL) exposes to the programmer disjoint memory address spaces and provides a non-uniform view of compute resources (i.e., CPUs and GPUs). In addition, these programming models often perform static assignment of tasks to compute resources and require significant programming effort to embed dynamic scheduling and load balancing mechanisms within the application. In this work, we propose a programming framework called Inter-node Virtual Memory (IVM) that provides the programmer with a uniform view of compute resources and memory spaces within a CPU-GPU cluster, and a mechanism to easily incorporate load balancing within the application. We compare MPI, Charm++ and IVM on four distributed GPU applications. Our experimental results show that, while the main goal of IVM is programmer productivity, the use of the load balancing mechanisms offered by this framework can also lead to performance gains over existing frameworks.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2903150.2903174","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
GPUs have been widely used to accelerate a variety of applications from different domains and have become part of high-performance computing clusters. Yet, the use of GPUs within distributed applications still faces significant challenges in terms of programmability and performance portability. Using popular programming models for distributed applications (such as MPI, SHMEM, and Charm++) in combination with GPU programming frameworks (such as CUDA and OpenCL) exposes disjoint memory address spaces to the programmer and provides a non-uniform view of compute resources (i.e., CPUs and GPUs). In addition, these programming models often statically assign tasks to compute resources and require significant programming effort to embed dynamic scheduling and load balancing mechanisms within the application. In this work, we propose a programming framework called Inter-node Virtual Memory (IVM) that provides the programmer with a uniform view of compute resources and memory spaces within a CPU-GPU cluster, along with a mechanism to easily incorporate load balancing within the application. We compare MPI, Charm++, and IVM on four distributed GPU applications. Our experimental results show that, while the main goal of IVM is to improve programmer productivity, the load balancing mechanisms offered by this framework can also lead to performance gains over existing frameworks.
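To make the programmability gap described above concrete, the sketch below shows the kind of boilerplate an MPI + CUDA version of a distributed GPU application must carry: the programmer explicitly stages data across disjoint address spaces (host memory, device memory, and a remote rank) and statically decides which rank does what. This example is not taken from the paper; the kernel, data sizes, and rank roles are invented for illustration, and only standard MPI and CUDA runtime calls are used.

```cuda
// Illustrative MPI + CUDA baseline (not from the paper): three disjoint address
// spaces, all managed by hand, with a static assignment of work to ranks.
#include <mpi.h>
#include <cuda_runtime.h>
#include <vector>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);        // host (CPU) address space
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));     // separate device (GPU) address space

    // Explicit staging: host -> device, compute on the GPU, device -> host.
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Explicit communication between the address spaces of different nodes,
    // with the producer/consumer roles fixed at programming time.
    if (rank == 0) {
        MPI_Send(host.data(), n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(host.data(), n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(dev);
    MPI_Finalize();
    return 0;
}
```

A framework such as IVM aims to hide this kind of per-address-space bookkeeping behind a uniform view of memory and compute resources and to let the runtime, rather than the programmer, rebalance tasks across CPUs and GPUs; the specific API IVM exposes for this is described in the paper itself, not in the sketch above.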