Optimizing on-demand GPUs in the Cloud for Deep Learning Applications Training

A. Jahani, M. Lattuada, M. Ciavotta, D. Ardagna, E. Amaldi, Li Zhang
2019 4th International Conference on Computing, Communications and Security (ICCCS), published 2019-10-01. DOI: 10.1109/CCCS.2019.8888151 (https://doi.org/10.1109/CCCS.2019.8888151)
Citations: 5

Abstract

Deep learning (DL) methods have recently gained popularity and are now used in commonplace applications such as voice and face recognition. Despite the growing popularity of DL and of the associated hardware acceleration techniques, GPU-based systems remain very costly. Moreover, while the cloud is a cost-effective and flexible solution, in large settings operating costs can be reduced further by carefully managing and fostering resource sharing. This work addresses the online joint problem of capacity planning for virtual machines (VMs) and scheduling of DL training jobs, and proposes a Mixed Integer Linear Programming (MILP) formulation. In particular, DL jobs are assumed to have a deadline, multiple VM types are available from a cloud provider catalog, and each VM may have multiple GPUs. Our solutions optimize operating costs by (i) right-sizing the VM capacities; (ii) partitioning the set of GPUs among multiple concurrent jobs running on the same VM; and (iii) determining a deadline-aware job schedule. Our approach is evaluated using an ad-hoc simulator and a prototype environment and compared against first-principle approaches, resulting in a cost reduction of 45-80%.
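The three decisions named in the abstract (VM sizing, GPU partitioning, deadline feasibility) can be illustrated with a toy exhaustive search. This is only a sketch and not the paper's MILP formulation: the catalog, prices, job sizes, the `cheapest_plan` helper, and the linear-speedup runtime model are all assumptions made for illustration, and all jobs are assumed to start together on a single VM.

```python
from itertools import product

# Hypothetical catalog: VM type -> (number of GPUs, price per hour).
# Values are illustrative only, not taken from any provider.
VM_CATALOG = {"small": (1, 1.0), "medium": (2, 1.8), "large": (4, 3.2)}

# Hypothetical jobs: name -> (work in GPU-hours, deadline in hours).
JOBS = {"job_a": (4.0, 4.0), "job_b": (2.0, 2.0)}


def cheapest_plan(catalog, jobs):
    """Search every VM type and GPU partition for the cheapest feasible plan.

    Returns (cost, vm_type, {job: gpus}) or None if no plan meets all deadlines.
    """
    names = list(jobs)
    best = None
    for vm, (gpus, price) in catalog.items():
        # Each job gets at least one GPU; a partition may leave GPUs idle.
        for split in product(range(1, gpus + 1), repeat=len(names)):
            if sum(split) > gpus:
                continue  # partition needs more GPUs than the VM has
            # Assumed linear speedup: runtime = work / assigned GPUs.
            runtimes = [jobs[n][0] / g for n, g in zip(names, split)]
            if any(t > jobs[n][1] for n, t in zip(names, runtimes)):
                continue  # at least one deadline would be missed
            # The VM is paid for until the last job finishes.
            cost = price * max(runtimes)
            if best is None or cost < best[0]:
                best = (cost, vm, dict(zip(names, split)))
    return best


if __name__ == "__main__":
    print(cheapest_plan(VM_CATALOG, JOBS))
```

With these made-up numbers the larger VM wins because both jobs finish sooner when they share its GPUs, which is the kind of trade-off between VM size and sharing that the abstract describes. Exhaustive search only works for tiny instances; the paper relies on a MILP formulation instead.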