Hierarchical Scheduling in on-demand GPU-as-a-Service Systems

2020 22nd International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC). Pub Date: 2020-08-28. DOI: 10.1109/SYNASC51798.2020.00030

Federica Filippini, M. Lattuada, A. Jahani, M. Ciavotta, D. Ardagna, E. Amaldi


Citations: 2

Abstract

Deep learning (DL) methods have recently gained popularity. Training this class of models is, however, compute-intensive, and GPUs are frequently used to boost performance. Although the costs of GPU-based systems are gradually decreasing due to high demand, they are still prohibitive: in public clouds, the per-time-unit price of GPU-powered virtual machines (VMs) is 5-8x higher than that of CPU-only VMs. While the cloud remains the most cost-effective and flexible deployment option, operation costs can be reduced, in large settings, by rightsizing and sharing resources among multiple processes. This work addresses the online problem of joint capacity planning and job scheduling with due dates and proposes alternative matheuristic solution methods. Our objective is to optimize operation costs by: i) rightsizing the VM capacities at each node, ii) partitioning the set of GPUs among multiple concurrent jobs on the same VM, and iii) determining a due-date-aware job schedule. The effectiveness of the proposed hierarchical approach, coupled with an appropriate Mixed Integer Linear Programming formulation, is validated against first-principle methods through simulation. The experiments show that the cost efficiency of GPU-based systems can be improved by 50-70%. Finally, scalability analyses show that the proposed approach can solve problem instances with up to 100 nodes in less than one minute on average, making it suitable for practical scenarios.
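To make decisions ii) and iii) concrete, the following is a minimal sketch of a simple earliest-due-date greedy rule that partitions the GPUs of one VM among concurrent jobs. It is not the matheuristic or the MILP formulation of the paper, and it is only a guess at the kind of simple rule a baseline might use. The names Job, min_gpus_to_meet_due, edf_partition and total_tardiness are hypothetical, and the ideal linear speedup (runtime = GPU-hours / number of GPUs) is an assumption made only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    gpu_hours: float  # total work; ASSUMES ideal linear speedup across GPUs
    due: float        # due date in hours from now; must be > 0


def min_gpus_to_meet_due(job: Job, max_gpus: int) -> int:
    """Fewest GPUs that finish the job by its due date, capped at max_gpus."""
    needed = math.ceil(job.gpu_hours / job.due)
    return min(max(needed, 1), max_gpus)


def edf_partition(jobs: list[Job], total_gpus: int) -> dict[str, int]:
    """Greedy earliest-due-date partitioning of one VM's GPUs (illustrative only).

    Jobs are visited in order of increasing due date; each receives the
    fewest GPUs that would meet its due date, limited by what is still free.
    Jobs that get no GPUs are left out and wait for a later scheduling round.
    """
    allocation: dict[str, int] = {}
    free = total_gpus
    for job in sorted(jobs, key=lambda j: j.due):
        if free == 0:
            break
        gpus = min(min_gpus_to_meet_due(job, total_gpus), free)
        allocation[job.name] = gpus
        free -= gpus
    return allocation


def total_tardiness(jobs: list[Job], allocation: dict[str, int]) -> float:
    """Sum of hours by which the scheduled jobs finish after their due dates."""
    late = 0.0
    for job in jobs:
        gpus = allocation.get(job.name)
        if gpus is None:
            continue  # job not scheduled in this round
        runtime = job.gpu_hours / gpus
        late += max(0.0, runtime - job.due)
    return late


if __name__ == "__main__":
    jobs = [Job("a", 8, 4), Job("b", 6, 2), Job("c", 10, 10)]
    plan = edf_partition(jobs, total_gpus=4)
    print(plan, total_tardiness(jobs, plan))
```

In this toy instance the most urgent job takes three of the four GPUs, the next job runs late on the single remaining GPU, and the third job is deferred. A greedy rule cannot trade a larger or smaller VM against lateness, which is the trade-off the joint capacity planning and scheduling formulation is meant to address.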

