Taylor L. Groves, Benjamin Brock, Yuxin Chen, K. Ibrahim, Lenny Oliker, N. Wright, Samuel Williams, K. Yelick
{"title":"GPU通信中的性能权衡:主机和设备启动方法的研究","authors":"Taylor L. Groves, Benjamin Brock, Yuxin Chen, K. Ibrahim, Lenny Oliker, N. Wright, Samuel Williams, K. Yelick","doi":"10.1109/PMBS51919.2020.00016","DOIUrl":null,"url":null,"abstract":"Network communication on GPU-based systems is a significant roadblock for many applications with small but frequent messaging requirements. One common question for application developers is, \"How can they reduce the overheads and achieve the best communication performance on GPUs?\" This work examines device initiated versus host initiated inter-node GPU communication using NVSHMEM. We derive basic communication model parameters for single message and batched communication before validating our model against distributed GEMM benchmarks. We use our model to estimate performance benefits for applications transitioning from CPUs to GPUS for fixed-size and scaled workloads and provide general guidelines for reducing communication overheads. Our findings show that the host-initiated approach generally outperforms the device-initiated approach for the system evaluated.","PeriodicalId":383727,"journal":{"name":"2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)","volume":"37 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches\",\"authors\":\"Taylor L. Groves, Benjamin Brock, Yuxin Chen, K. Ibrahim, Lenny Oliker, N. Wright, Samuel Williams, K. Yelick\",\"doi\":\"10.1109/PMBS51919.2020.00016\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Network communication on GPU-based systems is a significant roadblock for many applications with small but frequent messaging requirements. One common question for application developers is, \\\"How can they reduce the overheads and achieve the best communication performance on GPUs?\\\" This work examines device initiated versus host initiated inter-node GPU communication using NVSHMEM. We derive basic communication model parameters for single message and batched communication before validating our model against distributed GEMM benchmarks. We use our model to estimate performance benefits for applications transitioning from CPUs to GPUS for fixed-size and scaled workloads and provide general guidelines for reducing communication overheads. Our findings show that the host-initiated approach generally outperforms the device-initiated approach for the system evaluated.\",\"PeriodicalId\":383727,\"journal\":{\"name\":\"2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)\",\"volume\":\"37 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/PMBS51919.2020.00016\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2020 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/PMBS51919.2020.00016","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Performance Trade-offs in GPU Communication: A Study of Host and Device-initiated Approaches
Network communication on GPU-based systems is a significant roadblock for many applications with small but frequent messaging requirements. One common question for application developers is, "How can they reduce the overheads and achieve the best communication performance on GPUs?" This work examines device initiated versus host initiated inter-node GPU communication using NVSHMEM. We derive basic communication model parameters for single message and batched communication before validating our model against distributed GEMM benchmarks. We use our model to estimate performance benefits for applications transitioning from CPUs to GPUS for fixed-size and scaled workloads and provide general guidelines for reducing communication overheads. Our findings show that the host-initiated approach generally outperforms the device-initiated approach for the system evaluated.