Exploration of GPU sharing policies under GEMM workloads

Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems Pub Date : 2020-05-25 DOI:10.1145/3378678.3391887

Ioannis Oroutzoglou, Dimosthenis Masouros, Konstantina Koliogeorgi, S. Xydis, D. Soudris

{"title":"Exploration of GPU sharing policies under GEMM workloads","authors":"Ioannis Oroutzoglou, Dimosthenis Masouros, Konstantina Koliogeorgi, S. Xydis, D. Soudris","doi":"10.1145/3378678.3391887","DOIUrl":null,"url":null,"abstract":"Lately, cloud computing has seen explosive growth, due to the flexibility and scalability it offers. The ever-increasing computational demands, especially from the machine learning domain, have forced cloud operators to enhance their infrastructure with acceleration devices, such as General-Purpose (GP)GPUs or FPGAs. Even though multi-tenancy has been widely examined for conventional CPUs, this is not the case for accelerators. Current solutions support \"one accelerator per user\" schemes, which can lead to both under-utilization and starvation of available resources. In this work, we analyze the potentials of GPU sharing inside data-center environments. We investigate how several architectural features affect the performance of GPUs under different multi-tenant stressing scenarios. We compare CUDA MPS with the native, default CUDA scheduler and also with Vinetalk, a research framework providing GPU sharing capabilities. Experimental results show that NVIDIA's MPS achieves the best performance in multi-application scenarios, specifically up to X4.5 and X11.2 compared to native CUDA scheduler and Vinetalk respectively.","PeriodicalId":383191,"journal":{"name":"Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems","volume":"604 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-05-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3378678.3391887","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

Abstract

Lately, cloud computing has seen explosive growth, due to the flexibility and scalability it offers. The ever-increasing computational demands, especially from the machine learning domain, have forced cloud operators to enhance their infrastructure with acceleration devices, such as General-Purpose (GP)GPUs or FPGAs. Even though multi-tenancy has been widely examined for conventional CPUs, this is not the case for accelerators. Current solutions support "one accelerator per user" schemes, which can lead to both under-utilization and starvation of available resources. In this work, we analyze the potentials of GPU sharing inside data-center environments. We investigate how several architectural features affect the performance of GPUs under different multi-tenant stressing scenarios. We compare CUDA MPS with the native, default CUDA scheduler and also with Vinetalk, a research framework providing GPU sharing capabilities. Experimental results show that NVIDIA's MPS achieves the best performance in multi-application scenarios, specifically up to X4.5 and X11.2 compared to native CUDA scheduler and Vinetalk respectively.

查看原文本刊更多论文

GEMM工作负载下GPU共享策略的探索

最近，由于云计算提供的灵活性和可伸缩性，它出现了爆炸式的增长。不断增长的计算需求，特别是来自机器学习领域的需求，迫使云运营商使用加速设备(如通用(GP) gpu或fpga)来增强其基础设施。尽管对传统cpu的多租户已经进行了广泛的研究，但加速器的情况并非如此。当前的解决方案支持“每个用户一个加速器”方案，这可能导致可用资源利用率不足和缺乏。在这项工作中，我们分析了在数据中心环境中GPU共享的潜力。我们研究了几种架构特性在不同的多租户压力场景下如何影响gpu的性能。我们将CUDA MPS与本地默认CUDA调度器以及提供GPU共享功能的研究框架Vinetalk进行了比较。实验结果表明，与原生CUDA调度器和Vinetalk相比，NVIDIA的MPS在多应用场景下实现了最佳性能，特别是高达X4.5和X11.2。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Proceedings of the 23th International Workshop on Software and Compilers for Embedded Systems

自引率

0.00%

发文量