{"title":"IVM: a task-based shared memory programming model and runtime system to enable uniform access to CPU-GPU clusters","authors":"Kittisak Sajjapongse, Ruidong Gu, M. Becchi","doi":"10.1145/2903150.2903174","DOIUrl":null,"url":null,"abstract":"GPUs have been widely used to accelerate a variety of applications from different domains and have become part of high-performance computing clusters. Yet, the use of GPUs within distributed applications still faces significant challenges in terms of programmability and performance portability. The use of popular programming models for distributed applications (such as MPI, SHMEM, and Charm++) in combination with GPU programming frameworks (such as CUDA and OpenCL) exposes to the programmer disjoint memory address spaces and provides a non-uniform view of compute resources (i.e., CPUs and GPUs). In addition, these programming models often perform static assignment of tasks to compute resources and require significant programming effort to embed dynamic scheduling and load balancing mechanisms within the application. In this work, we propose a programming framework called Inter-node Virtual Memory (IVM) that provides the programmer with a uniform view of compute resources and memory spaces within a CPU-GPU cluster, and a mechanism to easily incorporate load balancing within the application. We compare MPI, Charm++ and IVM on four distributed GPU applications. Our experimental results show that, while the main goal of IVM is programmer productivity, the use of the load balancing mechanisms offered by this framework can also lead to performance gains over existing frameworks.","PeriodicalId":226569,"journal":{"name":"Proceedings of the ACM International Conference on Computing Frontiers","volume":"38 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-05-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ACM International Conference on Computing Frontiers","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2903150.2903174","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 1
Abstract
GPUs have been widely used to accelerate a variety of applications from different domains and have become part of high-performance computing clusters. Yet, the use of GPUs within distributed applications still faces significant challenges in terms of programmability and performance portability. Using popular programming models for distributed applications (such as MPI, SHMEM, and Charm++) in combination with GPU programming frameworks (such as CUDA and OpenCL) exposes disjoint memory address spaces to the programmer and provides a non-uniform view of compute resources (i.e., CPUs and GPUs). In addition, these programming models often statically assign tasks to compute resources and require significant programming effort to embed dynamic scheduling and load balancing mechanisms within the application. In this work, we propose a programming framework called Inter-node Virtual Memory (IVM) that provides the programmer with a uniform view of compute resources and memory spaces within a CPU-GPU cluster, along with a mechanism to easily incorporate load balancing within the application. We compare MPI, Charm++, and IVM on four distributed GPU applications. Our experimental results show that, while the main goal of IVM is to improve programmer productivity, the load balancing mechanisms offered by this framework can also lead to performance gains over existing frameworks.
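To make the programmability gap described above concrete, the sketch below shows the kind of boilerplate an MPI + CUDA version of a distributed GPU application must carry: the programmer explicitly stages data across disjoint address spaces (host memory, device memory, and a remote rank) and statically decides which rank does what. This example is not taken from the paper; the kernel, data sizes, and rank roles are invented for illustration, and only standard MPI and CUDA runtime calls are used.

```cuda
// Illustrative MPI + CUDA baseline (not from the paper): three disjoint address
// spaces, all managed by hand, with a static assignment of work to ranks.
#include <mpi.h>
#include <cuda_runtime.h>
#include <vector>

__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int n = 1 << 20;
    std::vector<float> host(n, 1.0f);        // host (CPU) address space
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));     // separate device (GPU) address space

    // Explicit staging: host -> device, compute on the GPU, device -> host.
    cudaMemcpy(dev, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    scale<<<(n + 255) / 256, 256>>>(dev, n, 2.0f);
    cudaMemcpy(host.data(), dev, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Explicit communication between the address spaces of different nodes,
    // with the producer/consumer roles fixed at programming time.
    if (rank == 0) {
        MPI_Send(host.data(), n, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(host.data(), n, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    cudaFree(dev);
    MPI_Finalize();
    return 0;
}
```

A framework such as IVM aims to hide this kind of per-address-space bookkeeping behind a uniform view of memory and compute resources and to let the runtime, rather than the programmer, rebalance tasks across CPUs and GPUs; the specific API IVM exposes for this is described in the paper itself, not in the sketch above.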