A Scalable Method for Signalling Dynamic Reconfiguration Events with OpenSM

Wei Lin Guay, Sven-Arne Reinemo
Published in: 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (2011-05-23)
DOI: 10.1109/CCGrid.2011.48 (https://doi.org/10.1109/CCGrid.2011.48)
Citations: 3

Abstract

Rerouting around faulty components, on-the-fly policy changes, and migration of jobs all require reconfiguration of data structures in the Queue Pairs residing in the hosts of an InfiniBand cluster. In addition to a proper implementation at the host, the subnet manager needs a scalable method for signalling reconfiguration events to the hosts. In this paper we propose and evaluate three different implementations for signalling dynamic reconfiguration events with OpenSM. Through our evaluation we demonstrate a scalable solution for signalling host-side reconfiguration events in an InfiniBand network, based on an example in which dynamic network reconfiguration combined with a topology-agnostic routing function is used to avoid malfunctioning components. Through measurements on our test cluster and an analytical study we show that our best proposal reduces reconfiguration latency by more than 90% and in certain situations eliminates it completely. Furthermore, the processing overhead in the subnet manager is shown to be minimal.