Efficient Multi-GPU Memory Management for Deep Learning Acceleration

2018 IEEE 3rd International Workshops on Foundations and Applications of Self* Systems (FAS*W). Pub Date: 2018-09-01. DOI: 10.1109/FAS-W.2018.00023

Youngrang Kim, J. Lee, Jik-Soo Kim, Hyunseung Jei, Hongchan Roh

Citations: 10

Abstract

In this paper, we propose a new optimized memory management scheme that improves overall GPU memory utilization in multi-GPU systems for accelerating deep learning applications. We extend NVIDIA's vDNN concept (a hybrid utilization of GPU and CPU memories) to a multi-GPU environment by effectively addressing PCIe bus contention. In addition, we designed and implemented an intelligent prefetching algorithm (from CPU memory to GPU) that can achieve the highest processing throughput while sustaining a large mini-batch size. For evaluation, we implemented our memory usage optimization scheme on TensorFlow, the well-known machine learning library from Google, and performed extensive experiments on a multi-GPU testbed. Our evaluation results show that the proposed scheme can increase the mini-batch size by up to 60% and improve the training throughput by up to 46.6% in a multi-GPU system.
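To make the memory trade-off behind the hybrid GPU/CPU idea concrete, the following is a minimal toy model, not the authors' algorithm. It assumes that activations are offloaded to host memory after the forward pass and that, during the backward pass, only the current layer's activation plus a fixed number of prefetched layers are resident on the GPU. The function names, the `prefetch_depth` parameter, and the per-layer sizes are all hypothetical, and the model ignores transfer time and PCIe bus contention, which are the problems the paper itself addresses.

```python
from typing import Sequence


def peak_without_offload(activation_mb: Sequence[int]) -> int:
    """Every layer's activation stays on the GPU until backward uses it."""
    return sum(activation_mb)


def peak_with_offload(activation_mb: Sequence[int], prefetch_depth: int = 1) -> int:
    """Peak GPU-resident activation memory during the backward pass.

    Toy assumption: while backward runs on layer i, only that layer's
    activation plus the `prefetch_depth` layers that backward will visit
    next (i-1, i-2, ...) are resident; all others sit in host memory.
    """
    if prefetch_depth < 0:
        raise ValueError("prefetch_depth must be non-negative")
    peaks = (
        sum(activation_mb[max(0, i - prefetch_depth): i + 1])
        for i in range(len(activation_mb))
    )
    return max(peaks, default=0)


if __name__ == "__main__":
    sizes = [400, 300, 200, 100]  # hypothetical per-layer activation sizes in MB
    print(peak_without_offload(sizes))
    print(peak_with_offload(sizes, prefetch_depth=1))
```

With the hypothetical sizes above, the baseline keeps 1000 MB resident, while prefetching one layer ahead peaks at 700 MB. In this toy model a larger `prefetch_depth` leaves more room to hide transfer latency but returns less memory for a larger mini-batch.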




