Assessment of NVSHMEM for High Performance Computing

Int. J. Netw. Comput. | Pub Date: 2021-01-08 | DOI: 10.15803/IJNC.11.1_78

C. Hsu, N. Imam


Citations: 0

Abstract

High performance computing (HPC) has been a driving force behind important tasks such as scientific discovery and deep learning. It tends to achieve performance through greater concurrency and heterogeneity, with the underlying complexity of richer topologies managed through software abstraction. In this paper, we present our assessment of NVSHMEM, an experimental programming library that supports the Partitioned Global Address Space (PGAS) programming model for NVIDIA GPU clusters. NVSHMEM offers several concrete advantages. First, it reduces overhead and software complexity by allowing communication and computation to be interleaved rather than separated into distinct phases. Second, it implements the OpenSHMEM specification to provide efficient fine-grained one-sided communication, eliminating the overheads of tag matching, wildcards, and unexpected messages, which compound as concurrency increases. Third, it offers ease of use by abstracting away the low-level configuration operations required to enable low-overhead communication and direct loads and stores across processes. We evaluated NVSHMEM in terms of usability, functionality, and scalability by running two math kernels (matrix multiplication and a Jacobi solver) and one full application (Horovod) on the 27,648-GPU Summit supercomputer. Our exercise of NVSHMEM at scale contributed to making it more robust and preparing it for production release.
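To make the one-sided communication pattern concrete, the following is a minimal single-process sketch in plain Python of a 1-D Jacobi solver partitioned across several processing elements (PEs). It is a toy simulation only: it does not call NVSHMEM or OpenSHMEM, and the names `SymmetricHeap`, `put`, `pgas_jacobi`, and `serial_jacobi` are hypothetical, chosen for illustration. Each PE owns a chunk of the grid plus two halo cells, and after updating its chunk it writes its edge values directly into its neighbours' halo cells, with no matching receive on the target side.

```python
def serial_jacobi(u, iters):
    """Reference solver: Jacobi sweeps on one list with fixed end values."""
    u = list(u)
    for _ in range(iters):
        new = u[:]
        for i in range(1, len(u) - 1):
            new[i] = 0.5 * (u[i - 1] + u[i + 1])
        u = new
    return u


class SymmetricHeap:
    """Hypothetical stand-in for a symmetric heap: one equal-sized buffer per PE."""

    def __init__(self, n_pes, size):
        self.buf = [[0.0] * size for _ in range(n_pes)]

    def put(self, pe, index, value):
        # One-sided write: only the initiator acts; the target PE posts no receive.
        self.buf[pe][index] = value


def pgas_jacobi(u, n_pes, iters):
    """Jacobi sweeps with the interior of u split evenly across n_pes simulated PEs."""
    interior_len = len(u) - 2
    assert interior_len % n_pes == 0, "interior must divide evenly among PEs"
    chunk = interior_len // n_pes

    # Local layout on every PE: [left halo | chunk owned cells | right halo].
    heap = SymmetricHeap(n_pes, chunk + 2)
    for pe in range(n_pes):
        start = pe * chunk  # global index of this PE's left halo cell
        heap.buf[pe][:] = u[start:start + chunk + 2]

    for _ in range(iters):
        # Compute phase: each PE reads only its own local buffer.
        new_chunks = []
        for pe in range(n_pes):
            loc = heap.buf[pe]
            new_chunks.append(
                [0.5 * (loc[i - 1] + loc[i + 1]) for i in range(1, chunk + 1)]
            )
        # (A real multi-process run would need a barrier here.)
        for pe in range(n_pes):
            heap.buf[pe][1:chunk + 1] = new_chunks[pe]
            # Push edge values straight into the neighbours' halo cells.
            if pe > 0:
                heap.put(pe - 1, chunk + 1, new_chunks[pe][0])
            if pe < n_pes - 1:
                heap.put(pe + 1, 0, new_chunks[pe][-1])
        # (A second barrier would go here before the next sweep.)

    result = [u[0]]
    for pe in range(n_pes):
        result.extend(heap.buf[pe][1:chunk + 1])
    result.append(u[-1])
    return result


if __name__ == "__main__":
    grid = [1.0] + [0.0] * 16 + [2.0]
    print(pgas_jacobi(grid, n_pes=4, iters=50) == serial_jacobi(grid, 50))
```

In this sketch the partitioned run performs the same arithmetic as the serial reference, so the two results can be compared directly. On a real GPU cluster the puts would be issued from inside the compute kernel, which is how communication and computation can be interleaved rather than split into separate phases.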

