SnuRHAC
Jaehoon Jung, Daeyoung Park, Gangwon Jo, Jungho Park, Jaejin Lee
Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing
Published: 2021-06-21
DOI: 10.1145/3431379.3460647 (https://doi.org/10.1145/3431379.3460647)
Citations: 2
Abstract
This paper proposes a framework called SnuRHAC, which presents the multiple GPUs in a cluster as a single GPU. Under SnuRHAC, a CUDA program written for a single GPU can use multiple GPUs in a cluster without any source code modification. SnuRHAC automatically distributes the workload across the GPUs in the cluster and manages data across the nodes. To manage data efficiently, SnuRHAC extends CUDA Unified Memory (UM) and exploits its page fault mechanism. We also propose two prefetching techniques to fully exploit UM and maximize performance. Static prefetching allows SnuRHAC to prefetch data by statically analyzing CUDA kernels. Dynamic prefetching complements static prefetching. SnuRHAC forces an application to run on a single GPU if it is not suitable for multiple GPUs. We evaluate the performance of SnuRHAC using 18 benchmark applications from various sources. The evaluation results show that SnuRHAC significantly improves ease of programming and that its performance in the cluster environment scales to a degree that depends on the application's characteristics.
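
The abstract does not say how SnuRHAC splits a kernel across GPUs or how its static analysis picks the data to prefetch. The Python sketch below is only an illustration of the general idea, under assumptions that are not taken from the paper: a one-dimensional grid of thread blocks is divided into contiguous, near-equal ranges (one per GPU), each thread `i` accesses only element `i` of one array, and pages are 4096 bytes. The names `Slice`, `split_grid`, and `pages_to_prefetch` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

PAGE_SIZE = 4096  # bytes; assumed page granularity for this sketch only


@dataclass(frozen=True)
class Slice:
    """A contiguous range of thread blocks assigned to one GPU."""
    gpu: int
    first_block: int  # inclusive
    last_block: int   # exclusive


def split_grid(num_blocks: int, num_gpus: int) -> list[Slice]:
    """Divide a 1-D grid into contiguous, near-equal block ranges."""
    base, extra = divmod(num_blocks, num_gpus)
    slices, start = [], 0
    for gpu in range(num_gpus):
        # The first `extra` GPUs take one additional block each.
        size = base + (1 if gpu < extra else 0)
        slices.append(Slice(gpu, start, start + size))
        start += size
    return slices


def pages_to_prefetch(s: Slice, block_dim: int, elem_size: int) -> range:
    """Pages touched by a slice, assuming thread i accesses only element i."""
    if s.first_block == s.last_block:
        return range(0)  # empty slice touches no pages
    first_byte = s.first_block * block_dim * elem_size
    last_byte = s.last_block * block_dim * elem_size - 1
    return range(first_byte // PAGE_SIZE, last_byte // PAGE_SIZE + 1)


if __name__ == "__main__":
    for s in split_grid(num_blocks=10, num_gpus=4):
        print(s, list(pages_to_prefetch(s, block_dim=256, elem_size=4)))
```

In this toy setup, neighboring slices can map to the same page. Such a page would have to be visible to more than one GPU, which is the kind of situation where page-fault-driven data management across nodes comes into play.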