Sliding Substitution of Failed Nodes

A. Hori, Kazumi Yoshinaga, T. Hérault, Aurélien Bouteiller, G. Bosilca, Y. Ishikawa

Proceedings of the 22nd European MPI Users' Group Meeting, 2015-09-21
DOI: 10.1145/2802658.2802670 (https://doi.org/10.1145/2802658.2802670)
Citations: 10

Abstract

This paper considers three questions: how spare nodes should be allocated, how they should be substituted for faulty nodes, and how much such a substitution affects communication performance. The third question arises because a node substitution modifies the rank mapping, which can incur additional message collisions. In a stencil computation, ranks are mapped onto a Cartesian network in a straightforward way that incurs no message collisions. Once a substitution has occurred, however, this node-rank mapping may be destroyed. These questions must therefore be answered in a way that minimizes the degradation of communication performance. The paper proposes several spare-node allocation and node-substitution methods, analyzes them, and compares them in terms of communication performance following the substitution. It shows that when a failure occurs, peer-to-peer (P2P) communication on the K computer can be slowed by a factor of three and collective performance can be cut in half. On BG/Q, P2P performance can be slowed by a factor of five and collective performance by a factor of ten. These slowdowns can be reduced by using an appropriate substitution method.
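The effect of a substitution on the rank mapping can be illustrated with a small sketch. It is not the paper's method or measurement: it assumes a hypothetical 2D mesh with one spare column, uses Manhattan hop count between stencil neighbors as a rough proxy for communication cost (it does not model message collisions or routing around the failed node), and the "sliding" variant is only one plausible reading of the title. All function names are made up for illustration.

```python
# Illustrative sketch only: assumed 2D mesh, one spare node per row (column `cols`).

def initial_mapping(rows, cols):
    """Map rank r*cols + c to node (r, c); column index `cols` holds the spares."""
    return {r * cols + c: (r, c) for r in range(rows) for c in range(cols)}

def direct_substitute(mapping, cols, failed_rank):
    """Move only the failed rank onto the spare node of its row."""
    new = dict(mapping)
    row, _ = mapping[failed_rank]
    new[failed_rank] = (row, cols)
    return new

def sliding_substitute(mapping, cols, failed_rank):
    """Shift the failed rank and every rank to its right one node toward the spare
    (an assumed interpretation of 'sliding', not confirmed by the abstract)."""
    new = dict(mapping)
    row, failed_col = mapping[failed_rank]
    for rank, (r, c) in mapping.items():
        if r == row and c >= failed_col:
            new[rank] = (r, c + 1)
    return new

def max_neighbor_hops(mapping, rows, cols):
    """Largest Manhattan distance between nodes hosting 5-point-stencil neighbors."""
    worst = 0
    for r in range(rows):
        for c in range(cols):
            a = mapping[r * cols + c]
            for dr, dc in ((0, 1), (1, 0)):  # right and down neighbors in rank space
                nr, nc = r + dr, c + dc
                if nr < rows and nc < cols:
                    b = mapping[nr * cols + nc]
                    worst = max(worst, abs(a[0] - b[0]) + abs(a[1] - b[1]))
    return worst
```

For a 4 x 4 rank grid in which the node hosting rank 5 fails, this metric is 1 hop before the failure, 4 hops after direct replacement by the row's spare, and 2 hops after sliding. This toy metric only shows why the choice of substitution method matters; the slowdown factors quoted in the abstract come from the paper's own measurements.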