GPU Direct I/O with HDF5

J. Ravi, S. Byna, Q. Koziol
{"title":"GPU直接I/O与HDF5","authors":"J. Ravi, S. Byna, Q. Koziol","doi":"10.1109/PDSW51947.2020.00010","DOIUrl":null,"url":null,"abstract":"Exascale HPC systems are being designed with accelerators, such as GPUs, to accelerate parts of applications. In machine learning workloads as well as large-scale simulations that use GPUs as accelerators, the CPU (or host) memory is currently used as a buffer for data transfers between GPU (or device) memory and the file system. If the CPU does not need to operate on the data, then this is sub-optimal because it wastes host memory by reserving space for duplicated data. Furthermore, this “bounce buffer” approach wastes CPU cycles spent on transferring data. A new technique, NVIDIA GPUDirect Storage (GDS), can eliminate the need to use the host memory as a bounce buffer. Thereby, it becomes possible to transfer data directly between the device memory and the file system. This direct data path shortens latency by omitting the extra copy and enables higher-bandwidth. To take full advantage of GDS in existing applications, it is necessary to provide support with existing I/O libraries, such as HDF5 and MPI-IO, which are heavily used in applications. In this paper, we describe our effort of integrating GDS with HDF5, the top I/O library at NERSC and at DOE leadership computing facilities. We design and implement this integration using a HDF5 Virtual File Driver (VFD). The GDS VFD provides a file system abstraction to the application that allows HDF5 applications to perform I/O without the need to move data between CPUs and GPUs explicitly. We compare performance of the HDF5 GDS VFD with explicit data movement approaches and demonstrate superior performance with the GDS method.","PeriodicalId":142923,"journal":{"name":"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)","volume":"16 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"GPU Direct I/O with HDF5\",\"authors\":\"J. Ravi, S. Byna, Q. Koziol\",\"doi\":\"10.1109/PDSW51947.2020.00010\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Exascale HPC systems are being designed with accelerators, such as GPUs, to accelerate parts of applications. In machine learning workloads as well as large-scale simulations that use GPUs as accelerators, the CPU (or host) memory is currently used as a buffer for data transfers between GPU (or device) memory and the file system. If the CPU does not need to operate on the data, then this is sub-optimal because it wastes host memory by reserving space for duplicated data. Furthermore, this “bounce buffer” approach wastes CPU cycles spent on transferring data. A new technique, NVIDIA GPUDirect Storage (GDS), can eliminate the need to use the host memory as a bounce buffer. Thereby, it becomes possible to transfer data directly between the device memory and the file system. This direct data path shortens latency by omitting the extra copy and enables higher-bandwidth. To take full advantage of GDS in existing applications, it is necessary to provide support with existing I/O libraries, such as HDF5 and MPI-IO, which are heavily used in applications. In this paper, we describe our effort of integrating GDS with HDF5, the top I/O library at NERSC and at DOE leadership computing facilities. We design and implement this integration using a HDF5 Virtual File Driver (VFD). 
The GDS VFD provides a file system abstraction to the application that allows HDF5 applications to perform I/O without the need to move data between CPUs and GPUs explicitly. We compare performance of the HDF5 GDS VFD with explicit data movement approaches and demonstrate superior performance with the GDS method.\",\"PeriodicalId\":142923,\"journal\":{\"name\":\"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)\",\"volume\":\"16 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PDSW51947.2020.00010\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE/ACM Fifth International Parallel Data Systems Workshop (PDSW)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PDSW51947.2020.00010","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Exascale HPC systems are being designed with accelerators, such as GPUs, to accelerate parts of applications. In machine learning workloads as well as large-scale simulations that use GPUs as accelerators, the CPU (or host) memory is currently used as a buffer for data transfers between GPU (or device) memory and the file system. If the CPU does not need to operate on the data, this is sub-optimal because it wastes host memory by reserving space for duplicated data. Furthermore, this “bounce buffer” approach wastes CPU cycles on transferring data. A new technique, NVIDIA GPUDirect Storage (GDS), eliminates the need to use host memory as a bounce buffer, making it possible to transfer data directly between device memory and the file system. This direct data path shortens latency by omitting the extra copy and enables higher bandwidth. To take full advantage of GDS in existing applications, it is necessary to provide support in existing I/O libraries, such as HDF5 and MPI-IO, which are heavily used in applications. In this paper, we describe our effort to integrate GDS with HDF5, the top I/O library at NERSC and at DOE leadership computing facilities. We design and implement this integration using an HDF5 Virtual File Driver (VFD). The GDS VFD provides a file system abstraction to the application that allows HDF5 applications to perform I/O without explicitly moving data between CPUs and GPUs. We compare the performance of the HDF5 GDS VFD with explicit data movement approaches and demonstrate superior performance with the GDS method.
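To make the contrast concrete, the following is a minimal C sketch (not taken from the paper) of the two data paths the abstract describes: a conventional write that stages device data through a host bounce buffer, and a direct write through NVIDIA's cuFile API, the user-level interface to GPUDirect Storage. File names, the payload size, and the omission of error handling are choices made for illustration only.

/* A minimal sketch (not from the paper) contrasting the two data paths the
 * abstract describes. Error handling and partial-write handling are omitted.
 * Assumes CUDA and the cuFile library from NVIDIA GPUDirect Storage are
 * installed; link with -lcudart -lcufile. */
#define _GNU_SOURCE          /* for O_DIRECT on Linux */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <cuda_runtime.h>
#include <cufile.h>

#define NBYTES (64UL << 20)  /* 64 MiB payload resident in GPU memory */

/* Conventional path: stage device data through a host "bounce buffer".
 * Host memory is reserved for a duplicate copy, and the CPU spends cycles
 * driving the second transfer. */
static void write_via_bounce_buffer(const void *dev_buf, const char *path)
{
    void *host_buf = malloc(NBYTES);
    cudaMemcpy(host_buf, dev_buf, NBYTES, cudaMemcpyDeviceToHost);
    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC, 0644);
    write(fd, host_buf, NBYTES);
    close(fd);
    free(host_buf);
}

/* GDS path: register the file with cuFile and write straight from device
 * memory, so no host buffer and no device-to-host memcpy are needed. */
static void write_via_gds(const void *dev_buf, const char *path)
{
    cuFileDriverOpen();

    int fd = open(path, O_CREAT | O_WRONLY | O_TRUNC | O_DIRECT, 0644);
    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    cuFileHandleRegister(&handle, &descr);
    cuFileBufRegister(dev_buf, NBYTES, 0);   /* optional: pre-register the GPU buffer */

    cuFileWrite(handle, dev_buf, NBYTES, /*file_offset=*/0, /*dev_offset=*/0);

    cuFileBufDeregister(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
}

int main(void)
{
    void *dev_buf;
    cudaMalloc(&dev_buf, NBYTES);
    cudaMemset(dev_buf, 0xAB, NBYTES);

    write_via_bounce_buffer(dev_buf, "bounce.bin");
    write_via_gds(dev_buf, "gds.bin");

    cudaFree(dev_buf);
    return 0;
}

In the second function no host allocation and no cudaMemcpy appear; that removed copy is exactly the latency and host-memory cost the abstract attributes to the bounce-buffer approach.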
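At the library level, the paper places this capability behind an HDF5 Virtual File Driver so that applications keep using the normal HDF5 calls. The sketch below illustrates what such usage could look like; the header name H5FDgds.h and the driver-selection call H5Pset_fapl_gds() with its alignment, block-size, and cache-size arguments are assumptions made for illustration rather than the paper's documented interface, while the remaining calls are the standard HDF5 and CUDA runtime APIs.

/* Hypothetical sketch of HDF5 I/O through a GDS-enabled Virtual File Driver.
 * The header and H5Pset_fapl_gds() below are assumed names for illustration;
 * the paper's VFD defines the actual registration interface. */
#include <hdf5.h>
#include <cuda_runtime.h>
#include "H5FDgds.h"   /* assumed header exposing the GDS VFD */

#define N 1048576      /* one million floats */

int main(void)
{
    /* Data lives only in device memory; no host staging buffer is allocated. */
    float *d_data;
    cudaMalloc((void **)&d_data, N * sizeof(float));
    cudaMemset(d_data, 0, N * sizeof(float));

    /* Select the GDS VFD on a file access property list (call name assumed). */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_gds(fapl, /*alignment=*/4096, /*block_size=*/4096,
                    /*cbuf_size=*/16777216);

    hid_t file  = H5Fcreate("gds_demo.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
    hsize_t dims[1] = {N};
    hid_t space = H5Screate_simple(1, dims, NULL);
    hid_t dset  = H5Dcreate2(file, "data", H5T_NATIVE_FLOAT, space,
                             H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* The device pointer is handed straight to H5Dwrite; the VFD issues the
     * direct GPU-to-storage transfer, so no explicit cudaMemcpy appears here. */
    H5Dwrite(dset, H5T_NATIVE_FLOAT, H5S_ALL, H5S_ALL, H5P_DEFAULT, d_data);

    H5Dclose(dset);
    H5Sclose(space);
    H5Fclose(file);
    H5Pclose(fapl);
    cudaFree(d_data);
    return 0;
}

The application hands the device pointer directly to H5Dwrite; whether and how data crosses the host becomes the driver's concern, which is the file system abstraction the abstract refers to.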