Compiler-assisted scheduling for multi-instance GPUs

Proceedings of the 14th Workshop on General Purpose Processing Using GPU
Published: 2022-04-03
DOI: 10.1145/3530390.3532734

C. Porter, Chao Chen, S. Pande

Citations: 0

Abstract

NVIDIA's Multi-Instance GPU (MIG) feature allows users to partition a GPU's compute and memory into independent hardware instances. MIG guarantees full isolation among co-executing kernels on the device, which boosts security and prevents degradation caused by performance interference. Despite the benefits of isolation, certain workloads do not necessarily need such guarantees, and in fact enforcing such isolation can negatively impact the throughput of a group of processes. In this work we aim to relax the isolation property for certain types of jobs, and to show how this can dramatically boost throughput across a mixed workload consisting of jobs that demand isolation and others that do not. The number of MIG partitions is hardware-limited but configurable, and state-of-the-art workload managers cannot safely take advantage of unused and wasted resources inside a given partition. We show how a compiler and runtime system working in tandem can be used to pack jobs into partitions when isolation is not necessary. Using this technique we improve overall utilization of the device while still reaping the benefits of MIG's isolation properties. Our experimental results on NVIDIA A30s with a throughput-oriented workload show an average of 1.45x throughput improvement and 2.93x increase in GPU memory utilization over the Slurm workload manager. The presented framework is fully automatic and requires no changes to user code. Based on these results, we believe our scheme is a practical and strong advancement over state-of-the-art techniques currently employed for MIG.
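The abstract does not spell out the packing policy itself. As a rough illustration only, the Python sketch below models the general idea under stated assumptions. Each job carries a hypothetical peak device-memory estimate (`mem_gb`, the kind of figure a compiler pass might supply) and a flag saying whether it requires isolation. Isolation-demanding jobs each claim an empty partition to themselves, and the remaining jobs are packed first-fit decreasing into partitions not held exclusively. First-fit decreasing is a stand-in chosen for simplicity, not the authors' algorithm. The partition sizes and the names `Job`, `Partition`, and `schedule` are hypothetical. A real co-location decision would also have to weigh compute (SM) contention, not just memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    name: str
    mem_gb: float          # hypothetical peak device-memory estimate (assumed, e.g. from compile-time analysis)
    needs_isolation: bool  # True -> the job must run alone in its MIG partition


@dataclass
class Partition:
    name: str
    capacity_gb: float     # illustrative partition memory size, not a real MIG profile query
    exclusive: bool = False
    jobs: List[Job] = field(default_factory=list)

    def free_gb(self) -> float:
        # Memory not yet claimed by jobs placed in this partition.
        return self.capacity_gb - sum(j.mem_gb for j in self.jobs)


def schedule(jobs: List[Job], partitions: List[Partition]) -> List[Job]:
    """Place jobs onto partitions in place; return the jobs that must wait."""
    pending: List[Job] = []

    # Pass 1: each isolation-demanding job takes the smallest empty partition that fits it.
    for job in (j for j in jobs if j.needs_isolation):
        slot = min(
            (p for p in partitions if not p.jobs and p.capacity_gb >= job.mem_gb),
            key=lambda p: p.capacity_gb,
            default=None,
        )
        if slot is None:
            pending.append(job)
        else:
            slot.jobs.append(job)
            slot.exclusive = True  # nothing else may be co-located here

    # Pass 2: remaining jobs, largest first, go into the first non-exclusive partition with room.
    shared = sorted((j for j in jobs if not j.needs_isolation),
                    key=lambda j: j.mem_gb, reverse=True)
    for job in shared:
        slot = next((p for p in partitions
                     if not p.exclusive and p.free_gb() >= job.mem_gb), None)
        if slot is None:
            pending.append(job)
        else:
            slot.jobs.append(job)

    return pending


if __name__ == "__main__":
    # Four equally sized partitions (illustrative sizes) and a mixed workload.
    parts = [Partition(f"slice-{i}", 6.0) for i in range(4)]
    jobs = [Job("secure", 5.0, True), Job("a", 2.0, False), Job("b", 3.0, False),
            Job("c", 1.5, False), Job("d", 4.0, False)]
    left = schedule(jobs, parts)
    for p in parts:
        print(p.name, [j.name for j in p.jobs])
    # slice-0 ['secure']
    # slice-1 ['d', 'a']
    # slice-2 ['b', 'c']
    # slice-3 []
    print("waiting:", [j.name for j in left])  # waiting: []
```

In this toy run, placing one job per partition would need five partitions for five jobs, so one job would wait. Packing the four jobs that do not need isolation leaves a partition free while the isolated job still runs alone.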

Source journal: Proceedings of the 14th Workshop on General Purpose Processing Using GPU

Self-citation rate: 0.00%