A virtual memory based runtime to support multi-tenancy in clusters with GPUs

M. Becchi, Kittisak Sajjapongse, I. Graves, A. Procter, Vignesh T. Ravi, S. Chakradhar
{"title":"A virtual memory based runtime to support multi-tenancy in clusters with GPUs","authors":"M. Becchi, Kittisak Sajjapongse, I. Graves, A. Procter, Vignesh T. Ravi, S. Chakradhar","doi":"10.1145/2287076.2287090","DOIUrl":null,"url":null,"abstract":"Graphics Processing Units (GPUs) are increasingly becoming part of HPC clusters. Nevertheless, cloud computing services and resource management frameworks targeting heterogeneous clusters including GPUs are still in their infancy. Further, GPU software stacks (e.g., CUDA driver and runtime) currently provide very limited support to concurrency.\n In this paper, we propose a runtime system that provides abstraction and sharing of GPUs, while allowing isolation of concurrent applications. A central component of our runtime is a memory manager that provides a virtual memory abstraction to the applications. Our runtime is flexible in terms of scheduling policies, and allows dynamic (as opposed to programmer-defined) binding of applications to GPUs. In addition, our framework supports dynamic load balancing, dynamic upgrade and downgrade of GPUs, and is resilient to their failures. Our runtime can be deployed in combination with VM-based cloud computing services to allow virtualization of heterogeneous clusters, or in combination with HPC cluster resource managers to form an integrated resource management infrastructure for heterogeneous clusters. Experiments conducted on a three-node cluster show that our GPU sharing scheme allows up to a 28% and a 50% performance improvement over serialized execution on short- and long-running jobs, respectively. Further, dynamic inter-node load balancing leads to an additional 18-20% performance benefit.","PeriodicalId":330072,"journal":{"name":"IEEE International Symposium on High-Performance Parallel Distributed Computing","volume":"83 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"63","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE International Symposium on High-Performance Parallel Distributed Computing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2287076.2287090","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 63

Abstract

Graphics Processing Units (GPUs) are increasingly becoming part of HPC clusters. Nevertheless, cloud computing services and resource management frameworks targeting heterogeneous clusters including GPUs are still in their infancy. Further, GPU software stacks (e.g., CUDA driver and runtime) currently provide very limited support for concurrency. In this paper, we propose a runtime system that provides abstraction and sharing of GPUs, while allowing isolation of concurrent applications. A central component of our runtime is a memory manager that provides a virtual memory abstraction to the applications. Our runtime is flexible in terms of scheduling policies, and allows dynamic (as opposed to programmer-defined) binding of applications to GPUs. In addition, our framework supports dynamic load balancing, dynamic upgrade and downgrade of GPUs, and is resilient to their failures. Our runtime can be deployed in combination with VM-based cloud computing services to allow virtualization of heterogeneous clusters, or in combination with HPC cluster resource managers to form an integrated resource management infrastructure for heterogeneous clusters. Experiments conducted on a three-node cluster show that our GPU sharing scheme allows up to a 28% and a 50% performance improvement over serialized execution on short- and long-running jobs, respectively. Further, dynamic inter-node load balancing leads to an additional 18-20% performance benefit.
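The central idea of the abstract, a virtual memory abstraction that decouples an application's GPU allocations from any particular physical device, can be illustrated with a minimal sketch. The code below is not the paper's implementation: the names (VGpuMemoryManager, VBuffer, reserve, bind, upload, release) are hypothetical, and the sketch simply assumes a host with the CUDA runtime installed. It shows deferred binding, where an allocation is reserved as a virtual handle and only materialized on a physical GPU when a scheduling decision is made.

```cpp
// Illustrative sketch only (hypothetical API, not the runtime described in the paper).
// A toy "virtual GPU memory" manager that defers binding an allocation to a
// physical GPU until a scheduler picks one.
#include <cuda_runtime.h>
#include <cstdio>
#include <vector>

struct VBuffer {
    size_t size;    // requested size in bytes
    int    device;  // physical GPU id, -1 until bound
    void*  dptr;    // device pointer, valid only after binding
};

class VGpuMemoryManager {
public:
    // Reserve a virtual buffer; no device memory is allocated yet, so the
    // scheduler remains free to choose (or later change) the physical GPU.
    int reserve(size_t size) {
        buffers_.push_back({size, -1, nullptr});
        return static_cast<int>(buffers_.size()) - 1;
    }

    // Bind the virtual buffer to a concrete GPU and allocate backing memory.
    bool bind(int handle, int device) {
        VBuffer& b = buffers_[handle];
        if (cudaSetDevice(device) != cudaSuccess) return false;
        if (cudaMalloc(&b.dptr, b.size) != cudaSuccess) return false;
        b.device = device;
        return true;
    }

    // Copy host data into the (now materialized) device buffer.
    bool upload(int handle, const void* src, size_t bytes) {
        VBuffer& b = buffers_[handle];
        if (b.dptr == nullptr) return false;
        cudaSetDevice(b.device);
        return cudaMemcpy(b.dptr, src, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    }

    // Free the backing memory; the handle could later be re-bound elsewhere.
    void release(int handle) {
        VBuffer& b = buffers_[handle];
        if (b.dptr != nullptr) {
            cudaSetDevice(b.device);
            cudaFree(b.dptr);
            b.dptr = nullptr;
            b.device = -1;
        }
    }

private:
    std::vector<VBuffer> buffers_;
};

int main() {
    int ngpus = 0;
    cudaGetDeviceCount(&ngpus);
    if (ngpus == 0) { std::printf("no GPU available\n"); return 0; }

    VGpuMemoryManager mm;
    std::vector<float> host(1 << 20, 1.0f);

    int h = mm.reserve(host.size() * sizeof(float)); // virtual handle, no GPU chosen yet
    if (!mm.bind(h, 0)) {                            // scheduler decision happens here
        std::printf("bind failed\n");
        return 1;
    }
    mm.upload(h, host.data(), host.size() * sizeof(float));
    mm.release(h);
    return 0;
}
```

Deferring the physical binding in this way is what allows a scheduler to choose, and later change, the GPU that backs an application, which is the kind of dynamic binding and load balancing the abstract describes.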