{"title":"Implementing CUDA Unified Memory in the PyTorch Framework","authors":"Jake Choi, H. Yeom, Yoonhee Kim","doi":"10.1109/ACSOS-C52956.2021.00029","DOIUrl":null,"url":null,"abstract":"Popular deep learning frameworks like PyTorch utilize GPUs heavily for training, and suffer from out-of-memory (OOM) problems if memory is not managed properly. In this paper, we propose a modification that utilizes CUDA Unified Memory (UM) to expand GPU memory to the available host memory space so that practicality for the programmer can increase, and OOM memory errors will not result for any workload. We also pinpoint performance issues that result from our modifications to the framework, and outline future plans like reducing redundant memory copies, prefetching, and memory advising techniques to improve upon our design. Our implementation shows that PyTorch UM performance overheads are minimal when the data footprint is below GPU memory capacity.","PeriodicalId":268224,"journal":{"name":"2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACSOS-C52956.2021.00029","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Popular deep learning frameworks like PyTorch rely heavily on GPUs for training and suffer from out-of-memory (OOM) errors if memory is not managed properly. In this paper, we propose a modification that uses CUDA Unified Memory (UM) to extend GPU memory into the available host memory space, improving usability for the programmer and preventing OOM errors for any workload. We also pinpoint performance issues that result from our modifications to the framework, and outline future work, such as reducing redundant memory copies, prefetching, and memory-advising techniques, to improve our design. Our implementation shows that the performance overhead of PyTorch with UM is minimal when the data footprint is below GPU memory capacity.
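To make the mechanism concrete, the sketch below (illustrative only, not the paper's PyTorch allocator modification) shows the CUDA Unified Memory calls such a design builds on: allocating through cudaMallocManaged so pages can migrate between host and device on demand, plus the cudaMemAdvise and cudaMemPrefetchAsync hints the abstract mentions as future improvements.

```cuda
// Minimal Unified Memory sketch: a managed buffer is accessible from both CPU
// and GPU, so it can be oversubscribed beyond GPU capacity without an OOM error.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float* data, size_t n, float factor) {
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

int main() {
    const size_t n = 1 << 26;               // ~256 MB of floats (example size)
    const size_t bytes = n * sizeof(float);
    int device = 0;
    cudaGetDevice(&device);

    float* buf = nullptr;
    // Managed allocation instead of cudaMalloc: pages migrate automatically
    // between host and device as they are touched.
    cudaMallocManaged(&buf, bytes);

    for (size_t i = 0; i < n; ++i) buf[i] = 1.0f;   // first touch on the host

    // Optional hints: prefer GPU residency and prefetch before the kernel
    // to avoid demand page faults on first device access.
    cudaMemAdvise(buf, bytes, cudaMemAdviseSetPreferredLocation, device);
    cudaMemPrefetchAsync(buf, bytes, device, 0);

    scale<<<(unsigned)((n + 255) / 256), 256>>>(buf, n, 2.0f);

    // Prefetch back to the host before the CPU reads the result.
    cudaMemPrefetchAsync(buf, bytes, cudaCpuDeviceId, 0);
    cudaDeviceSynchronize();

    printf("buf[0] = %f\n", buf[0]);         // expect 2.0
    cudaFree(buf);
    return 0;
}
```

In a framework like PyTorch, the analogous change is routing the caching allocator's device allocations through managed memory rather than plain device memory; the prefetch and advise hints then become the tuning knobs for reducing the migration overhead the paper reports.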