Implementing CUDA Unified Memory in the PyTorch Framework

Jake Choi, H. Yeom, Yoonhee Kim
{"title":"Implementing CUDA Unified Memory in the PyTorch Framework","authors":"Jake Choi, H. Yeom, Yoonhee Kim","doi":"10.1109/ACSOS-C52956.2021.00029","DOIUrl":null,"url":null,"abstract":"Popular deep learning frameworks like PyTorch utilize GPUs heavily for training, and suffer from out-of-memory (OOM) problems if memory is not managed properly. In this paper, we propose a modification that utilizes CUDA Unified Memory (UM) to expand GPU memory to the available host memory space so that practicality for the programmer can increase, and OOM memory errors will not result for any workload. We also pinpoint performance issues that result from our modifications to the framework, and outline future plans like reducing redundant memory copies, prefetching, and memory advising techniques to improve upon our design. Our implementation shows that PyTorch UM performance overheads are minimal when the data footprint is below GPU memory capacity.","PeriodicalId":268224,"journal":{"name":"2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-09-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2021 IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ACSOS-C52956.2021.00029","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Popular deep learning frameworks like PyTorch rely heavily on GPUs for training, and suffer from out-of-memory (OOM) failures if memory is not managed properly. In this paper, we propose a modification that uses CUDA Unified Memory (UM) to extend GPU memory into the available host memory space, so that practicality for the programmer increases and OOM errors do not occur for any workload. We also pinpoint the performance issues that result from our modifications to the framework, and outline future plans, such as reducing redundant memory copies, prefetching, and memory-advising techniques, to improve upon our design. Our implementation shows that the performance overhead of PyTorch UM is minimal when the data footprint is below GPU memory capacity.
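The core change the abstract describes happens at the allocation layer: where PyTorch's CUDA caching allocator would request device memory with cudaMalloc, a managed allocation via cudaMallocManaged lets tensor data spill into host memory on demand instead of failing with an OOM error. The following is a minimal standalone CUDA sketch of that idea, including the memory-advising and prefetching hints the authors list as future work. The CHECK macro, buffer size, and surrounding structure are illustrative assumptions for this sketch, not code from the paper's actual PyTorch modification; only the CUDA runtime calls themselves are standard API.

```cuda
// Minimal sketch: a device allocation replaced with CUDA Unified Memory,
// plus the advise/prefetch hints the paper lists as planned optimizations.
#include <cuda_runtime.h>
#include <cstdio>

// Simple error-checking helper (illustrative, not from the paper).
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err = (call);                                     \
        if (err != cudaSuccess) {                                     \
            fprintf(stderr, "CUDA error %s at %s:%d\n",               \
                    cudaGetErrorString(err), __FILE__, __LINE__);     \
            return 1;                                                 \
        }                                                             \
    } while (0)

int main() {
    int device = 0;
    CHECK(cudaSetDevice(device));

    size_t bytes = 1ull << 30;  // 1 GiB buffer, stand-in for tensor storage
    float* data = nullptr;

    // Before: a plain device allocation fails with cudaErrorMemoryAllocation
    // (surfacing as PyTorch's OOM error) once GPU memory is exhausted:
    //   CHECK(cudaMalloc(&data, bytes));
    //
    // After: a managed allocation is bounded by host memory instead; pages
    // migrate between host and device on demand.
    CHECK(cudaMallocManaged(&data, bytes));

    // Optional hints: declare the GPU as the pages' preferred location and
    // prefetch them there before a kernel touches the buffer, avoiding
    // page-fault-driven migration on first access.
    CHECK(cudaMemAdvise(data, bytes, cudaMemAdviseSetPreferredLocation, device));
    CHECK(cudaMemPrefetchAsync(data, bytes, device, /*stream=*/0));
    CHECK(cudaDeviceSynchronize());

    // ... launch training kernels on `data` as with any device pointer ...

    CHECK(cudaFree(data));
    return 0;
}
```

Without the prefetch, the first kernel to touch `data` would fault the pages over one at a time, which is consistent with the kind of overhead the paper pinpoints once the data footprint exceeds GPU memory capacity.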