Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches

Taylor L. Groves, Benjamin Brock, Yuxin Chen, K. Ibrahim, Lenny Oliker, N. Wright, Samuel Williams, K. Yelick
Published in: 2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS), November 2020. DOI: 10.1109/PMBS51919.2020.00016
Citations: 3

Abstract

Network communication on GPU-based systems is a significant roadblock for many applications with small but frequent messaging requirements. A common question for application developers is how to reduce these overheads and achieve the best communication performance on GPUs. This work examines device-initiated versus host-initiated inter-node GPU communication using NVSHMEM. We derive basic communication model parameters for single-message and batched communication, then validate our model against distributed GEMM benchmarks. We use the model to estimate the performance benefits for applications transitioning from CPUs to GPUs, for both fixed-size and scaled workloads, and provide general guidelines for reducing communication overheads. Our findings show that, on the system evaluated, the host-initiated approach generally outperforms the device-initiated approach.
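The distinction between single-message and batched communication can be illustrated with the classic latency-bandwidth (Hockney, or alpha-beta) cost model. This is a minimal sketch under that assumption, not the paper's own model or its fitted parameters: the class `AlphaBeta`, its methods, and the numeric values (2 µs startup latency, 10 GB/s bandwidth) are all hypothetical and chosen only to show the shape of the trade-off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlphaBeta:
    """Hockney-style cost model: T(n) = alpha + n / beta.

    alpha: per-message startup latency in seconds (hypothetical value).
    beta:  sustained bandwidth in bytes per second (hypothetical value).
    """
    alpha: float
    beta: float

    def single(self, nbytes: int) -> float:
        # One message pays the full startup cost plus its transfer time.
        return self.alpha + nbytes / self.beta

    def unbatched(self, count: int, nbytes: int) -> float:
        # `count` independent messages each pay the startup latency.
        return count * self.single(nbytes)

    def batched(self, count: int, nbytes: int) -> float:
        # Aggregating `count` messages into one transfer pays alpha once.
        return self.alpha + count * nbytes / self.beta

    def half_bandwidth_size(self) -> float:
        # Message size n_1/2 at which transfer time equals startup latency,
        # i.e. the size at which half of peak bandwidth is achieved.
        return self.alpha * self.beta


# Hypothetical parameters for illustration only; NOT measured values from the paper.
link = AlphaBeta(alpha=2e-6, beta=10e9)  # 2 us latency, 10 GB/s

small, count = 8, 1000
print(f"unbatched: {link.unbatched(count, small) * 1e6:.1f} us")
print(f"batched:   {link.batched(count, small) * 1e6:.1f} us")
print(f"n_1/2:     {link.half_bandwidth_size():.0f} bytes")
```

With these made-up numbers, sending 1,000 eight-byte messages individually is dominated by per-message startup latency, while a single aggregated transfer pays that latency only once; `half_bandwidth_size` gives a rough threshold below which per-message overhead dominates. Under the same assumption, host-initiated and device-initiated paths could be compared by fitting separate `alpha` and `beta` values to each.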