Multi-tenancy on GPGPU-based servers

D. Sengupta, Raghavendra Belapure, K. Schwan
DOI: 10.1145/2465829.2465830
Published in: Virtualization Technologies in Distributed Computing, 2013-06-18
Cited by: 30

Abstract

While GPUs have become prominent both in high performance computing and in online or cloud services, they still appear as explicitly selected 'devices' rather than as first class schedulable entities that can be efficiently shared by diverse server applications. To combat the consequent likely under-utilization of GPUs when used in modern server or cloud settings, we propose 'Rain', a system level abstraction for GPU "hyperthreading" that makes it possible to efficiently utilize GPUs without compromising fairness among multiple tenant applications. Rain uses a multi-level GPU scheduler that decomposes the scheduling problem into a combination of load balancing and per-device scheduling. Implemented by overriding applications' standard GPU selection calls, Rain operates without the need for application modification, making possible GPU scheduling methods that include prioritizing certain jobs, guaranteeing fair shares of GPU resources, and/or favoring jobs with least attained GPU services. GPU multi-tenancy via Rain is evaluated with server workloads using a wide variety of CUDA SDK and Rodinia suite benchmarks, on a multi-GPU, multi-core machine typifying future high end server machines. Averaged over ten applications, GPU multi-tenancy on a smaller scale server platform results in application speedups of up to 1.73x compared to their traditional implementation with NVIDIA's CUDA runtime. Averaged over 25 pairs of short and long running applications, on an emulated larger scale server machine, multi-tenancy results in system throughput improvements of up to 6.71x, and in 43% and 29.3% improvements in fairness compared to using the CUDA runtime and a naïve fair-share scheduler.
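The abstract describes Rain's multi-level scheduler: a top-level load balancer routes jobs across GPUs, while a per-device scheduler can favor jobs with the least attained GPU service. A minimal sketch of that decomposition is shown below; the class names, the `est_cost` parameter, and the accounting interface are illustrative assumptions, not Rain's actual API.

```python
from collections import defaultdict

class LeastAttainedServiceScheduler:
    """Per-device level: dispatch the ready job that has received the
    least GPU service so far (one of the policies the abstract names)."""

    def __init__(self):
        self.attained = defaultdict(float)  # job_id -> GPU time consumed so far
        self.ready = set()                  # jobs waiting on this device

    def submit(self, job_id):
        self.ready.add(job_id)

    def pick_next(self):
        # Favor the job with the least attained GPU service.
        return min(self.ready, key=lambda j: self.attained[j])

    def account(self, job_id, gpu_time):
        # Called after a kernel completes to charge its GPU time.
        self.attained[job_id] += gpu_time

class LoadBalancer:
    """Top level: route each incoming job to the least-loaded GPU,
    then hand it to that device's local scheduler."""

    def __init__(self, n_gpus):
        self.load = [0.0] * n_gpus
        self.devices = [LeastAttainedServiceScheduler() for _ in range(n_gpus)]

    def route(self, job_id, est_cost):
        # est_cost is a hypothetical per-job load estimate.
        gpu = min(range(len(self.load)), key=lambda g: self.load[g])
        self.load[gpu] += est_cost
        self.devices[gpu].submit(job_id)
        return gpu
```

In the real system, Rain sits at this decision point transparently: it overrides applications' standard GPU selection calls, so the balancer's device choice replaces the device the application would have selected itself, with no application modification.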