Compiler-assisted scheduling for multi-instance GPUs

C. Porter, Chao Chen, S. Pande
{"title":"Compiler-assisted scheduling for multi-instance GPUs","authors":"C. Porter, Chao Chen, S. Pande","doi":"10.1145/3530390.3532734","DOIUrl":null,"url":null,"abstract":"NVIDIA's Multi-Instance GPU (MIG) feature allows users to partition a GPU's compute and memory into independent hardware instances. MIG guarantees full isolation among co-executing kernels on the device, which boosts security and prevents performance interference-related degradation. Despite the benefits of isolation, however, certain workloads do not necessarily need such guarantees, and in fact enforcing such isolation can negatively impact the throughput of a group of processes. In this work we aim to relax the isolation property for certain types of jobs, and to show how this can dramatically boost throughput across a mixed workload consisting of jobs that demand isolation and others that do not. The number of MIG partitions is hardware-limited but configurable, and state-of-the-art workload managers cannot safely take advantage of unused and wasted resources inside a given partition. We show how a compiler and runtime system working in tandem can be used to pack jobs into partitions when isolation is not necessary. Using this technique we improve overall utilization of the device while still reaping the benefits of MIG's isolation properties. Our experimental results on NVIDIA A30s with a throughput-oriented workload show an average of 1.45x throughput improvement and 2.93x increase in GPU memory utilization over the Slurm workload manager. The presented framework is fully automatic and requires no changes to user code. Based on these results, we believe our scheme is a practical and strong advancement over state-of-the-art techniques currently employed for MIG.","PeriodicalId":442986,"journal":{"name":"Proceedings of the 14th Workshop on General Purpose Processing Using GPU","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 14th Workshop on General Purpose Processing Using GPU","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3530390.3532734","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

NVIDIA's Multi-Instance GPU (MIG) feature allows users to partition a GPU's compute and memory into independent hardware instances. MIG guarantees full isolation among co-executing kernels on the device, which boosts security and prevents performance interference-related degradation. Despite the benefits of isolation, however, certain workloads do not necessarily need such guarantees, and in fact enforcing such isolation can negatively impact the throughput of a group of processes. In this work we aim to relax the isolation property for certain types of jobs, and to show how this can dramatically boost throughput across a mixed workload consisting of jobs that demand isolation and others that do not. The number of MIG partitions is hardware-limited but configurable, and state-of-the-art workload managers cannot safely take advantage of unused and wasted resources inside a given partition. We show how a compiler and runtime system working in tandem can be used to pack jobs into partitions when isolation is not necessary. Using this technique we improve overall utilization of the device while still reaping the benefits of MIG's isolation properties. Our experimental results on NVIDIA A30s with a throughput-oriented workload show an average of 1.45x throughput improvement and 2.93x increase in GPU memory utilization over the Slurm workload manager. The presented framework is fully automatic and requires no changes to user code. Based on these results, we believe our scheme is a practical and strong advancement over state-of-the-art techniques currently employed for MIG.
多实例gpu的编译器辅助调度
NVIDIA的多实例GPU (MIG)功能允许用户将GPU的计算和内存划分为独立的硬件实例。MIG保证了设备上共同执行的内核之间的完全隔离,从而提高了安全性并防止了与性能干扰相关的降级。尽管隔离有好处,但是某些工作负载并不一定需要这样的保证,而且实际上强制这样的隔离可能会对一组进程的吞吐量产生负面影响。在这项工作中,我们的目标是放宽某些类型作业的隔离属性,并展示这如何显著提高混合工作负载(包括需要隔离的作业和不需要隔离的作业)的吞吐量。MIG分区的数量是受硬件限制的,但是是可配置的,最先进的工作负载管理器不能安全地利用给定分区内未使用和浪费的资源。我们将展示如何使用编译器和运行时系统在不需要隔离时将作业打包到分区中。使用这种技术,我们提高了设备的总体利用率,同时仍然获得了MIG隔离特性的好处。我们在NVIDIA a30上的实验结果显示,与Slurm工作负载管理器相比,吞吐量提高了1.45倍,GPU内存利用率提高了2.93倍。所提供的框架是全自动的,不需要更改用户代码。基于这些结果,我们相信我们的方案比米格目前使用的最先进的技术是一个实用和强大的进步。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信