Exploring the Cost-benefit of AWS EC2 GPU Instances for Deep Learning Applications

Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing
Pub date: 2019-12-02 | DOI: 10.1145/3344341.3368814

E. Malta, S. Avila, E. Borin

Citations: 3

Abstract

Deep learning is a subfield of machine learning based on artificial neural networks. Thanks to increased data availability and computational power, such as Graphics Processing Units (GPUs), training deep networks, a time-consuming process, has become feasible. Cloud computing is an excellent option for acquiring the computational power to train these models, since it provides elastic products under a pay-per-use model. Amazon Web Services (AWS), for instance, offers GPU-based virtual machine instances in its catalog that differ in GPU type, number of GPUs, and price per hour. The challenge is determining which instance is best suited to a specific deep learning problem. This paper presents the runtime and cost implications of running two different deep learning problems on AWS GPU-based instances and, based on these case studies, proposes a methodology that analyzes instances for deep learning algorithms using the information provided by the Keras framework. Our experimental results indicate that, despite their higher price per hour, the instances with NVIDIA V100 GPUs (p3) are faster and usually less expensive to use than the instances with NVIDIA K80 GPUs (p2) for the problems we analyzed. The results also indicate that the performance of both applications did not scale well with the number of GPUs, and that increasing the batch size to improve scalability may affect the final model accuracy. Finally, the proposed methodology provides accurate cost and runtime estimates for the tested applications on different AWS instances at a small cost.
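
The cost trade-off in the abstract reduces to simple arithmetic: total cost is the hourly price multiplied by the runtime, so an instance that costs more per hour can still be cheaper overall if it finishes proportionally faster. The Python sketch below illustrates that arithmetic, plus a basic parallel-efficiency check for multi-GPU scaling. It is not the authors' methodology: the class and function names are hypothetical, the per-epoch times are assumed to come from a short profiling run (for example, the per-epoch durations Keras prints during training), and all prices and timings are made-up placeholders rather than AWS prices or results from the paper.

```python
from dataclasses import dataclass


@dataclass
class InstanceProfile:
    """Measurements from a short profiling run on one instance type.

    Hypothetical structure; the values used below are placeholders,
    not real AWS prices or measurements from the paper.
    """
    name: str
    price_per_hour: float     # on-demand price in USD/hour (placeholder)
    seconds_per_epoch: float  # e.g. averaged from per-epoch timings of a short run


def estimate_run(profile: InstanceProfile, epochs: int) -> tuple[float, float]:
    """Extrapolate total runtime (hours) and cost (USD) for a full training run.

    Assumes every epoch takes about as long as the profiled ones, which
    ignores warm-up, checkpointing and data-loading variance.
    """
    hours = profile.seconds_per_epoch * epochs / 3600.0
    return hours, hours * profile.price_per_hour


def scaling_efficiency(t_one_gpu: float, t_n_gpus: float, n_gpus: int) -> float:
    """Observed speedup divided by the ideal (linear) speedup n_gpus.

    1.0 means perfect scaling; lower values mean sub-linear scaling.
    """
    speedup = t_one_gpu / t_n_gpus
    return speedup / n_gpus


if __name__ == "__main__":
    # Hypothetical numbers chosen only to illustrate the arithmetic.
    candidates = [
        InstanceProfile("slower-cheaper", price_per_hour=1.0, seconds_per_epoch=600.0),
        InstanceProfile("faster-pricier", price_per_hour=3.0, seconds_per_epoch=150.0),
    ]
    for p in candidates:
        hours, cost = estimate_run(p, epochs=100)
        print(f"{p.name}: {hours:.2f} h, ${cost:.2f}")

    # Epoch time drops from 100 s on 1 GPU to 40 s on 4 GPUs (placeholder values).
    print(f"4-GPU efficiency: {scaling_efficiency(100.0, 40.0, 4):.3f}")
```

With these placeholder numbers, the instance that is 4x faster at 3x the hourly price finishes 100 epochs for about $12.50 instead of about $16.67, and the 4-GPU run reaches an efficiency of 0.625, i.e. well below linear scaling.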