GPUdmm:使用动态内存管理的高性能、内存无关的GPU架构

2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA) Pub Date : 2014-06-19 DOI:10.1109/HPCA.2014.6835963

Youngsok Kim, Jaewon Lee, Jae-Eon Jo, Jangwoo Kim

{"title":"GPUdmm:使用动态内存管理的高性能、内存无关的GPU架构","authors":"Youngsok Kim, Jaewon Lee, Jae-Eon Jo, Jangwoo Kim","doi":"10.1109/HPCA.2014.6835963","DOIUrl":null,"url":null,"abstract":"GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programmability, programmers should be able to place only the data frequently accessed by GPU on GPU memory while overlapping CPU-GPU data transfers and GPU executions as much as possible. However, current GPU architectures and programming models blindly place entire data on GPU memory, requiring a significantly large GPU memory size. Otherwise, they must trigger unnecessary CPU-GPU data transfers due to an insufficient GPU memory size. In this paper, we propose GPUdmm, a novel GPU architecture to enable high-performance and memory-oblivious GPU programming. First, GPUdmm uses GPU memory as a cache of CPU memory to provide programmers a view of the CPU memory-sized programming space. Second, GPUdmm achieves high performance by exploiting data locality and dynamically transferring data between CPU and GPU memories while effectively overlapping CPU-GPU data transfers and GPU executions. Third, GPUdmm can further reduce unnecessary CPU-GPU data transfers by exploiting simple programmer hints. Our carefully designed and validated experiments (e.g., PCIe/DMA timing) against representative benchmarks show that GPUdmm can achieve up to five times higher performance for the same GPU memory size, or reduce the GPU memory size requirement by up to 75% while maintaining the same performance.","PeriodicalId":164587,"journal":{"name":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","volume":"5 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management\",\"authors\":\"Youngsok Kim, Jaewon Lee, Jae-Eon Jo, Jangwoo Kim\",\"doi\":\"10.1109/HPCA.2014.6835963\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programmability, programmers should be able to place only the data frequently accessed by GPU on GPU memory while overlapping CPU-GPU data transfers and GPU executions as much as possible. However, current GPU architectures and programming models blindly place entire data on GPU memory, requiring a significantly large GPU memory size. Otherwise, they must trigger unnecessary CPU-GPU data transfers due to an insufficient GPU memory size. In this paper, we propose GPUdmm, a novel GPU architecture to enable high-performance and memory-oblivious GPU programming. First, GPUdmm uses GPU memory as a cache of CPU memory to provide programmers a view of the CPU memory-sized programming space. Second, GPUdmm achieves high performance by exploiting data locality and dynamically transferring data between CPU and GPU memories while effectively overlapping CPU-GPU data transfers and GPU executions. Third, GPUdmm can further reduce unnecessary CPU-GPU data transfers by exploiting simple programmer hints. Our carefully designed and validated experiments (e.g., PCIe/DMA timing) against representative benchmarks show that GPUdmm can achieve up to five times higher performance for the same GPU memory size, or reduce the GPU memory size requirement by up to 75% while maintaining the same performance.\",\"PeriodicalId\":164587,\"journal\":{\"name\":\"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)\",\"volume\":\"5 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-06-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/HPCA.2014.6835963\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/HPCA.2014.6835963","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 24

摘要

GPU程序员要忍受程序员管理的GPU内存，因为性能和可编程性都严重依赖于GPU内存分配和CPU-GPU数据传输机制。为了提高性能和可编程性，程序员应该能够只将GPU经常访问的数据放在GPU内存上，同时尽可能多地重叠CPU-GPU数据传输和GPU执行。然而，当前的GPU架构和编程模型盲目地将整个数据放在GPU内存上，这需要很大的GPU内存。否则，由于GPU内存不足，会触发不必要的CPU-GPU数据传输。在本文中，我们提出了一种新的GPU架构GPUdmm，以实现高性能和内存无关的GPU编程。首先，GPUdmm使用GPU内存作为CPU内存的缓存，为程序员提供CPU内存大小的编程空间的视图。其次，GPUdmm通过利用数据局部性和在CPU和GPU存储器之间动态传输数据来实现高性能，同时有效地重叠CPU-GPU数据传输和GPU执行。第三，GPUdmm可以通过利用简单的程序员提示进一步减少不必要的CPU-GPU数据传输。我们经过精心设计和验证的实验(例如，PCIe/DMA计时)针对代表性基准测试表明，GPUdmm可以在相同GPU内存大小的情况下实现高达五倍的性能提升，或者在保持相同性能的同时将GPU内存大小要求降低高达75%。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

GPUdmm: A high-performance and memory-oblivious GPU architecture using dynamic memory management

GPU programmers suffer from programmer-managed GPU memory because both performance and programmability heavily depend on GPU memory allocation and CPU-GPU data transfer mechanisms. To improve performance and programmability, programmers should be able to place only the data frequently accessed by GPU on GPU memory while overlapping CPU-GPU data transfers and GPU executions as much as possible. However, current GPU architectures and programming models blindly place entire data on GPU memory, requiring a significantly large GPU memory size. Otherwise, they must trigger unnecessary CPU-GPU data transfers due to an insufficient GPU memory size. In this paper, we propose GPUdmm, a novel GPU architecture to enable high-performance and memory-oblivious GPU programming. First, GPUdmm uses GPU memory as a cache of CPU memory to provide programmers a view of the CPU memory-sized programming space. Second, GPUdmm achieves high performance by exploiting data locality and dynamically transferring data between CPU and GPU memories while effectively overlapping CPU-GPU data transfers and GPU executions. Third, GPUdmm can further reduce unnecessary CPU-GPU data transfers by exploiting simple programmer hints. Our carefully designed and validated experiments (e.g., PCIe/DMA timing) against representative benchmarks show that GPUdmm can achieve up to five times higher performance for the same GPU memory size, or reduce the GPU memory size requirement by up to 75% while maintaining the same performance.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA)

自引率

0.00%

发文量