蜻蜓+应用干扰建模与分析

Yao Kang, Xin Wang, Neil McGlohon, M. Mubarak, Sudheer Chunduri, Z. Lan
{"title":"蜻蜓+应用干扰建模与分析","authors":"Yao Kang, Xin Wang, Neil McGlohon, M. Mubarak, Sudheer Chunduri, Z. Lan","doi":"10.1145/3316480.3325517","DOIUrl":null,"url":null,"abstract":"Dragonfly class of networks are considered as promising interconnects for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource sharing design. Event-driven network simulators are indispensable tools for navigating complex system design. In this study, we quantitatively evaluate a variety of application communication interactions on a 3,456-node Dragonfly+ system by using the CODES toolkit. This study looks at the impact of communication interference from a user's perspective. Specifically, for a given application submitted by a user, we examine how this application will behave with the existing workload running in the system under different job placement policies. Our simulation study considers hundreds of experiment configurations including four target applications with representative communication patterns under a variety of network traffic conditions. Our study shows that intra-job interference can cause severe performance degradation for communication-intensive applications. Inter-job interference can generally be reduced for applications with one-to-one or one-to-many communication patterns through job isolation. Application with one-to-all communication pattern is resilient to network interference.","PeriodicalId":398793,"journal":{"name":"Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","volume":"70 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-05-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Modeling and Analysis of Application Interference on Dragonfly+\",\"authors\":\"Yao Kang, Xin Wang, Neil McGlohon, M. Mubarak, Sudheer Chunduri, Z. Lan\",\"doi\":\"10.1145/3316480.3325517\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Dragonfly class of networks are considered as promising interconnects for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource sharing design. Event-driven network simulators are indispensable tools for navigating complex system design. In this study, we quantitatively evaluate a variety of application communication interactions on a 3,456-node Dragonfly+ system by using the CODES toolkit. This study looks at the impact of communication interference from a user's perspective. Specifically, for a given application submitted by a user, we examine how this application will behave with the existing workload running in the system under different job placement policies. Our simulation study considers hundreds of experiment configurations including four target applications with representative communication patterns under a variety of network traffic conditions. Our study shows that intra-job interference can cause severe performance degradation for communication-intensive applications. Inter-job interference can generally be reduced for applications with one-to-one or one-to-many communication patterns through job isolation. Application with one-to-all communication pattern is resilient to network interference.\",\"PeriodicalId\":398793,\"journal\":{\"name\":\"Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation\",\"volume\":\"70 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-05-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3316480.3325517\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2019 ACM SIGSIM Conference on Principles of Advanced Discrete Simulation","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3316480.3325517","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

蜻蜓网络被认为是下一代超级计算机有前途的互连网络。虽然蜻蜓+网络提供了比原始蜻蜓设计更多的路径多样性,但由于其分层架构和资源共享设计,它们仍然容易出现性能变化。事件驱动网络模拟器是导航复杂系统设计不可缺少的工具。在这项研究中,我们使用CODES工具包对3456个节点的Dragonfly+系统上的各种应用程序通信交互进行了定量评估。这项研究从用户的角度来看通信干扰的影响。具体来说,对于用户提交的给定应用程序,我们将检查该应用程序在不同的工作安置策略下如何与系统中运行的现有工作负载一起运行。我们的仿真研究考虑了数百种实验配置,包括在各种网络流量条件下具有代表性通信模式的四种目标应用程序。我们的研究表明,在通信密集型应用中,工作内部干扰会导致严重的性能下降。对于具有一对一或一对多通信模式的应用程序,通过作业隔离通常可以减少作业间干扰。采用一对所有通信模式的应用具有较强的抗网络干扰能力。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Modeling and Analysis of Application Interference on Dragonfly+
Dragonfly class of networks are considered as promising interconnects for next-generation supercomputers. While Dragonfly+ networks offer more path diversity than the original Dragonfly design, they are still prone to performance variability due to their hierarchical architecture and resource sharing design. Event-driven network simulators are indispensable tools for navigating complex system design. In this study, we quantitatively evaluate a variety of application communication interactions on a 3,456-node Dragonfly+ system by using the CODES toolkit. This study looks at the impact of communication interference from a user's perspective. Specifically, for a given application submitted by a user, we examine how this application will behave with the existing workload running in the system under different job placement policies. Our simulation study considers hundreds of experiment configurations including four target applications with representative communication patterns under a variety of network traffic conditions. Our study shows that intra-job interference can cause severe performance degradation for communication-intensive applications. Inter-job interference can generally be reduced for applications with one-to-one or one-to-many communication patterns through job isolation. Application with one-to-all communication pattern is resilient to network interference.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信