Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov
{"title":"以 GPU 为中心的通信格局","authors":"Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov","doi":"arxiv-2409.09874","DOIUrl":null,"url":null,"abstract":"n recent years, GPUs have become the preferred accelerators for HPC and ML\napplications due to their parallelism and fast memory bandwidth. While GPUs\nboost computation, inter-GPU communication can create scalability bottlenecks,\nespecially as the number of GPUs per node and cluster grows. Traditionally, the\nCPU managed multi-GPU communication, but advancements in GPU-centric\ncommunication now challenge this CPU dominance by reducing its involvement,\ngranting GPUs more autonomy in communication tasks, and addressing mismatches\nin multi-GPU communication and computation. This paper provides a landscape of GPU-centric communication, focusing on\nvendor mechanisms and user-level library supports. It aims to clarify the\ncomplexities and diverse options in this field, define the terminology, and\ncategorize existing approaches within and across nodes. The paper discusses\nvendor-provided mechanisms for communication and memory management in multi-GPU\nexecution and reviews major communication libraries, their benefits,\nchallenges, and performance insights. Then, it explores key research paradigms,\nfuture outlooks, and open research questions. 
By extensively describing\nGPU-centric communication techniques across the software and hardware stacks,\nwe provide researchers, programmers, engineers, and library designers insights\non how to exploit multi-GPU systems at their best.","PeriodicalId":501291,"journal":{"name":"arXiv - CS - Performance","volume":"31 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"The Landscape of GPU-Centric Communication\",\"authors\":\"Didem Unat, Ilyas Turimbetov, Mohammed Kefah Taha Issa, Doğan Sağbili, Flavio Vella, Daniele De Sensi, Ismayil Ismayilov\",\"doi\":\"arxiv-2409.09874\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"n recent years, GPUs have become the preferred accelerators for HPC and ML\\napplications due to their parallelism and fast memory bandwidth. While GPUs\\nboost computation, inter-GPU communication can create scalability bottlenecks,\\nespecially as the number of GPUs per node and cluster grows. Traditionally, the\\nCPU managed multi-GPU communication, but advancements in GPU-centric\\ncommunication now challenge this CPU dominance by reducing its involvement,\\ngranting GPUs more autonomy in communication tasks, and addressing mismatches\\nin multi-GPU communication and computation. This paper provides a landscape of GPU-centric communication, focusing on\\nvendor mechanisms and user-level library supports. It aims to clarify the\\ncomplexities and diverse options in this field, define the terminology, and\\ncategorize existing approaches within and across nodes. The paper discusses\\nvendor-provided mechanisms for communication and memory management in multi-GPU\\nexecution and reviews major communication libraries, their benefits,\\nchallenges, and performance insights. Then, it explores key research paradigms,\\nfuture outlooks, and open research questions. 
By extensively describing\\nGPU-centric communication techniques across the software and hardware stacks,\\nwe provide researchers, programmers, engineers, and library designers insights\\non how to exploit multi-GPU systems at their best.\",\"PeriodicalId\":501291,\"journal\":{\"name\":\"arXiv - CS - Performance\",\"volume\":\"31 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-15\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - CS - Performance\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.09874\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - CS - Performance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.09874","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
In recent years, GPUs have become the preferred accelerators for HPC and ML
applications due to their parallelism and fast memory bandwidth. While GPUs
boost computation, inter-GPU communication can create scalability bottlenecks,
especially as the number of GPUs per node and cluster grows. Traditionally, the
CPU managed multi-GPU communication, but advancements in GPU-centric
communication now challenge this CPU dominance by reducing its involvement,
granting GPUs more autonomy in communication tasks, and addressing mismatches
between multi-GPU communication and computation. This paper maps the landscape of GPU-centric communication, focusing on
vendor mechanisms and user-level library support. It aims to clarify the
complexities and diverse options in this field, define the terminology, and
categorize existing approaches within and across nodes. The paper discusses
vendor-provided mechanisms for communication and memory management in multi-GPU
execution and reviews major communication libraries, their benefits,
challenges, and performance insights. Then, it explores key research paradigms,
future outlooks, and open research questions. By extensively describing
GPU-centric communication techniques across the software and hardware stacks,
we give researchers, programmers, engineers, and library designers insights
into how to best exploit multi-GPU systems.
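To make the idea of GPU-centric communication concrete, the sketch below shows one of the vendor mechanisms such a survey covers: a direct device-to-device copy over CUDA peer access, which moves data across the interconnect (NVLink or PCIe) without staging it through host memory. This is a minimal illustration, not a technique taken from the paper; the device ids and buffer size are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Hedged sketch: direct GPU 0 -> GPU 1 copy via peer access.
// Assumes a node with at least two CUDA devices; error checks elided.
int main() {
    const size_t bytes = 1 << 20;  // illustrative 1 MiB payload
    float *src = nullptr, *dst = nullptr;

    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 1, 0);  // can GPU 1 reach GPU 0?

    cudaSetDevice(0);
    cudaMalloc(&src, bytes);
    if (canAccess) cudaDeviceEnablePeerAccess(1, 0);  // map GPU 1's memory

    cudaSetDevice(1);
    cudaMalloc(&dst, bytes);

    // Device-to-device copy: the driver routes this over the interconnect
    // rather than bouncing the data through a host buffer.
    cudaMemcpyPeerAsync(dst, /*dstDevice=*/1, src, /*srcDevice=*/0, bytes);
    cudaDeviceSynchronize();

    cudaFree(dst);
    cudaSetDevice(0);
    cudaFree(src);
    return 0;
}
```

Contrast this with the traditional CPU-managed path, where the host issues a device-to-host copy followed by a host-to-device copy; peer copies remove that staging step, and GPU-initiated mechanisms (e.g., NVSHMEM-style one-sided puts) go further by removing the CPU from the critical path entirely.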