Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures

Mingzhe Li, Xiaoyi Lu, H. Subramoni, D. Panda
Proceedings of the 25th European MPI Users' Group Meeting, published 2018-09-23. DOI: 10.1145/3236367.3236371 (https://doi.org/10.1145/3236367.3236371)
Citations: 1

Abstract

Intel Knights Landing (KNL) and IBM POWER architectures are becoming widely deployed on modern supercomputing systems due to their powerful components. The MPI Remote Memory Access (RMA) model, which provides one-sided communication semantics, has been seen as an attractive approach for developing High-Performance Data Analytics (HPDA) applications, such as graph processing, that have irregular communication characteristics. To take advantage of the large number of hardware threads offered by KNL and POWER, HPDA applications and the MPI RMA runtime need to be redesigned for optimal performance. In this paper, we propose multi-threading and lock-free designs in the MPI runtime as well as in the Graph500 application on KNL and POWER architectures. At the micro-benchmark level, our proposed runtime-level designs reduce the latency of uni-directional MPI_Put and MPI_Get by up to 3X compared to Intel MPI and Spectrum MPI. At the application level, with 1,024 processes on 32 KNL nodes, our proposed design outperforms the Intel MPI library by 32%. With 512 processes on eight POWER nodes, our proposed design outperforms the Spectrum MPI library by 19%. To the best of our knowledge, this is the first paper to design and evaluate MPI RMA-based graph processing applications on KNL and POWER architectures.
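The abstract does not spell out how the lock-free designs work, so the following is only a generic illustration of the idea and not the authors' implementation. It sketches one common lock-free pattern in multi-threaded graph traversal: several threads race to mark a vertex as visited in a shared bitmap, and an atomic fetch-or guarantees that exactly one of them wins, without any mutex. The names `claim_vertex`, `worker`, and `count_claims_parallel` are hypothetical, and the use of C11 atomics with POSIX threads is an assumption made for the example.

```c
/* Illustrative sketch only: a lock-free "visited" bitmap of the kind often
 * used in multi-threaded BFS. This is NOT code from the paper; all names
 * here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD 64u

/* Atomically set the bit for vertex v. Returns true only for the single
 * caller that flipped the bit from 0 to 1, i.e. the thread that "owns" v. */
bool claim_vertex(_Atomic uint64_t *bitmap, size_t v)
{
    uint64_t mask = (uint64_t)1 << (v % BITS_PER_WORD);
    uint64_t old = atomic_fetch_or_explicit(&bitmap[v / BITS_PER_WORD], mask,
                                            memory_order_relaxed);
    return (old & mask) == 0;
}

struct worker_args {
    _Atomic uint64_t *bitmap;
    size_t num_vertices;
    size_t claimed; /* written only by the owning thread */
};

/* Each worker tries to claim every vertex and counts its own successes. */
void *worker(void *p)
{
    struct worker_args *a = p;
    for (size_t v = 0; v < a->num_vertices; v++) {
        if (claim_vertex(a->bitmap, v)) {
            a->claimed++;
        }
    }
    return NULL;
}

/* Run num_threads workers over the same bitmap and return the total number
 * of successful claims across all threads. */
size_t count_claims_parallel(size_t num_vertices, int num_threads)
{
    assert(num_threads > 0);
    size_t words = (num_vertices + BITS_PER_WORD - 1) / BITS_PER_WORD;
    if (words == 0) {
        words = 1;
    }
    _Atomic uint64_t *bitmap = malloc(words * sizeof *bitmap);
    pthread_t *tids = malloc((size_t)num_threads * sizeof *tids);
    struct worker_args *args = malloc((size_t)num_threads * sizeof *args);
    assert(bitmap != NULL && tids != NULL && args != NULL);

    for (size_t w = 0; w < words; w++) {
        atomic_init(&bitmap[w], 0);
    }
    for (int t = 0; t < num_threads; t++) {
        args[t].bitmap = bitmap;
        args[t].num_vertices = num_vertices;
        args[t].claimed = 0;
        int rc = pthread_create(&tids[t], NULL, worker, &args[t]);
        assert(rc == 0);
        (void)rc;
    }

    size_t total = 0;
    for (int t = 0; t < num_threads; t++) {
        pthread_join(tids[t], NULL); /* join makes args[t].claimed visible */
        total += args[t].claimed;
    }
    free(args);
    free(tids);
    free(bitmap);
    return total;
}
```

Calling `count_claims_parallel(n, t)` should return `n` for any thread count, because each vertex is claimed exactly once no matter how the threads interleave. A real Graph500 kernel would pair such a claim with writing the parent of the vertex and with MPI RMA operations for remote vertices, which this sketch leaves out.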