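Enabling Efficient Inter-Node Message Passing and Remote Memory Access Via a uGNI Based Light-Weight Network Substrate for Cray Interconnects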

2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). Pub date: 2018-05-01. DOI: 10.1109/CCGRID.2018.00006

U. Wickramasinghe, A. Lumsdaine


Citations: 1

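Abstract

Today's cutting-edge network hardware offers extremely low-latency, high-bandwidth transactions for higher-level communication substrates. The Cray XC/XE family of network fabrics (Cray Aries and Gemini, respectively) supports high-performance remote memory access (RMA) operations and a plethora of transaction modes that optimize communication through lower-level interfaces such as uGNI and DMAPP. However, enabling efficient one-sided communication for higher-level substrates is difficult because of barriers presented by the programming model itself, as well as miscellaneous synchronization bottlenecks in the runtime layers. We present an efficient programming model based on a distributed memory allocator for RMA, together with a communication substrate based on readers and writers for inter-node message passing and RMA operations. We aim to maximize performance by introducing a scalable RMA event notification scheme and synchronization protocols that fully leverage the Aries/Gemini fabric. Micro-benchmark results show that our library outperforms Cray MPI-3.0-based RMA one-sided operations by 1.5X, and by up to 6X in certain cases, while performing comparably or better on others.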
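The abstract's core idea, a one-sided write whose arrival the target learns about through an event notification rather than a matching receive, can be illustrated with a small in-process simulation. This is a hypothetical sketch under stated assumptions, not the authors' library and not the uGNI API: `RemoteRegion`, `rma_put`, and `poll_events` are invented names, a Python `bytearray` stands in for a registered remote memory segment, and a `deque` stands in for a hardware completion queue.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class RemoteRegion:
    """Hypothetical stand-in for a registered memory segment on a target node."""
    buf: bytearray
    # Queue the target polls; models a remote event-notification channel.
    events: deque = field(default_factory=deque)


def rma_put(region: RemoteRegion, offset: int, data: bytes, tag: int) -> None:
    """One-sided write: copy data into the target buffer, then post an event.

    The copy happens before the event is enqueued, so a target that observes
    the event can safely read the bytes (put-then-notify ordering).
    """
    if offset < 0 or offset + len(data) > len(region.buf):
        raise ValueError("put out of bounds")
    region.buf[offset:offset + len(data)] = data
    region.events.append((tag, offset, len(data)))


def poll_events(region: RemoteRegion) -> list:
    """Target side: drain pending notifications without posting a receive."""
    delivered = []
    while region.events:
        tag, off, n = region.events.popleft()
        delivered.append((tag, bytes(region.buf[off:off + n])))
    return delivered


if __name__ == "__main__":
    target = RemoteRegion(bytearray(16))
    rma_put(target, 0, b"hello", tag=1)
    rma_put(target, 8, b"world", tag=2)
    print(poll_events(target))  # [(1, b'hello'), (2, b'world')]
```

Posting the event only after the copy mirrors the put-then-notify ordering such a substrate has to guarantee; on real Aries/Gemini hardware that ordering would come from the fabric's completion semantics, not from Python statement order.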




Source venue

2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
