Evaluating Unified Memory Performance in HIP

2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). Pub Date: 2022-05-01. DOI: 10.1109/IPDPSW55747.2022.00096

Zheming Jin, J. Vetter


Citations: 1

Abstract


Heterogeneous unified memory management between a CPU and a GPU is a major challenge in GPU computing. Unified memory (UM) has recently gained support from software and hardware components on AMD computing platforms, which could simplify memory management. In this paper, we seek a better understanding of UM by evaluating the performance of UM programs on an AMD MI100 GPU. Specifically, we compare data migration using UM against other data transfer techniques in terms of overall application performance, assess the impact of three commonly used optimization techniques on the kernel execution time of a vector add sample, and compare the performance and productivity of selected benchmarks with and without UM. The performance overhead associated with UM is not trivial, but UM can improve programming productivity by reducing the lines of code in scientific applications. We aim to present early results and feedback on UM performance to the vendor.
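As a rough illustration of the comparison the abstract describes, the sketch below contrasts a vector add written with HIP managed (unified) memory against the same kernel with explicit device allocations and copies. It is not the paper's benchmark code. The function names `add_managed` and `add_explicit`, the block size, and the input values are assumptions made for this example, and the three optimization techniques the abstract mentions are not named there, so none is shown. The `#else` branch is a host-only stand-in so the sketch also builds without ROCm; it exercises no GPU path.

```cpp
#include <cstddef>
#include <vector>

#if defined(__HIPCC__)
#include <hip/hip_runtime.h>

// One thread per element: c[i] = a[i] + b[i].
__global__ void vec_add(const float* a, const float* b, float* c, std::size_t n) {
    std::size_t i = static_cast<std::size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

static void launch(const float* a, const float* b, float* c, std::size_t n) {
    const unsigned block = 256;  // assumed block size, not taken from the paper
    const unsigned grid = static_cast<unsigned>((n + block - 1) / block);
    vec_add<<<grid, block>>>(a, b, c, n);
}

// Unified memory: one pointer per array, usable on host and device.
// The runtime migrates the data; no hipMemcpy calls appear.
bool add_managed(std::size_t n) {
    const std::size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr, *c = nullptr;
    bool ok = hipMallocManaged((void**)&a, bytes) == hipSuccess &&
              hipMallocManaged((void**)&b, bytes) == hipSuccess &&
              hipMallocManaged((void**)&c, bytes) == hipSuccess;
    if (ok) {
        for (std::size_t i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }  // host writes
        launch(a, b, c, n);                                                // device computes
        ok = hipDeviceSynchronize() == hipSuccess;  // wait before the host reads c
        for (std::size_t i = 0; ok && i < n; ++i) ok = (c[i] == 3.0f);
    }
    (void)hipFree(a); (void)hipFree(b); (void)hipFree(c);
    return ok;
}

// Explicit transfers: separate host and device buffers plus hipMemcpy.
bool add_explicit(std::size_t n) {
    const std::size_t bytes = n * sizeof(float);
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n, 0.0f);
    float *da = nullptr, *db = nullptr, *dc = nullptr;
    bool ok = hipMalloc((void**)&da, bytes) == hipSuccess &&
              hipMalloc((void**)&db, bytes) == hipSuccess &&
              hipMalloc((void**)&dc, bytes) == hipSuccess &&
              hipMemcpy(da, ha.data(), bytes, hipMemcpyHostToDevice) == hipSuccess &&
              hipMemcpy(db, hb.data(), bytes, hipMemcpyHostToDevice) == hipSuccess;
    if (ok) {
        launch(da, db, dc, n);
        // A blocking copy on the default stream also waits for the kernel.
        ok = hipMemcpy(hc.data(), dc, bytes, hipMemcpyDeviceToHost) == hipSuccess;
        for (std::size_t i = 0; ok && i < n; ++i) ok = (hc[i] == 3.0f);
    }
    (void)hipFree(da); (void)hipFree(db); (void)hipFree(dc);
    return ok;
}

#else
// Host-only stand-ins so the sketch builds with a plain C++ compiler.
// They run no GPU code and say nothing about UM performance.
static bool add_on_host(std::size_t n) {
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
    for (std::size_t i = 0; i < n; ++i) c[i] = a[i] + b[i];
    for (std::size_t i = 0; i < n; ++i) if (c[i] != 3.0f) return false;
    return true;
}
bool add_managed(std::size_t n)  { return add_on_host(n); }
bool add_explicit(std::size_t n) { return add_on_host(n); }
#endif
```

Built with hipcc on a system that supports managed memory, both functions can be called from a small `main` with the same element count and should return `true`. The managed version needs no staging buffers or copy calls, which is the kind of line-count reduction the abstract refers to. Timing is deliberately left out, since the abstract reports no figures to compare against.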
