Performance Evaluation of Adaptive Routing on Dragonfly-based Production Systems

Sudheer Chunduri, K. Harms, Taylor L. Groves, P. Mendygral, Justs Zarins, M. Weiland, Yasaman Ghadar
Published in: 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2021-05-01
DOI: 10.1109/IPDPS49936.2021.00042 (https://doi.org/10.1109/IPDPS49936.2021.00042)
Citations: 2

Abstract

Application performance in production environments can be sensitive to network congestion. The Cray Aries interconnect routes each network packet adaptively and independently, based on the load or congestion the packet encounters as it traverses the network. Software can select a different routing policy for each posted message, adjusting the bias between minimal and non-minimal routes. We extensively evaluate how sensitive application performance and whole-system performance are to the choice of routing bias, under both production and controlled conditions. We show that the default routing bias used in Aries-based systems is often sub-optimal, and that a higher bias towards minimal routes reduces both the congestion effects on the application and the overall congestion on the network. This routing scheme improves the mean performance of most production applications (by up to 12%) and reduces run-to-run variability. Our study prompted two supercomputing facilities (ALCF and NERSC) to change the default routing mode on their Aries-based systems. We present the substantial improvement in overall congestion management and interconnect performance measured in production after this change.
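To make the idea of a "bias towards minimal routes" concrete, the following is a minimal toy sketch of a biased adaptive routing decision. It is not the Aries algorithm: the cost model (hop count times an estimated queue depth), the names `PathEstimate` and `choose_route`, and the `minimal_bias` parameter are all assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathEstimate:
    """Hypothetical per-packet view of one candidate path."""
    hops: int          # number of links the packet would traverse
    queue_depth: int   # assumed congestion estimate for that path


def choose_route(minimal: PathEstimate,
                 non_minimal: PathEstimate,
                 minimal_bias: int) -> str:
    """Pick a path for one packet in this toy model.

    The bias is added to the non-minimal cost, so a larger
    minimal_bias makes the minimal path win more often even
    when it looks more congested.
    """
    minimal_cost = minimal.hops * minimal.queue_depth
    non_minimal_cost = non_minimal.hops * non_minimal.queue_depth + minimal_bias
    return "minimal" if minimal_cost <= non_minimal_cost else "non_minimal"


if __name__ == "__main__":
    short_but_busy = PathEstimate(hops=2, queue_depth=10)   # cost 20
    long_but_idle = PathEstimate(hops=4, queue_depth=3)     # cost 12 + bias
    for bias in (0, 10):
        print(bias, choose_route(short_but_busy, long_but_idle, bias))
```

In this sketch, raising the bias keeps the packet on the shorter path under the same load estimates. A packet that stays on the minimal path occupies fewer links, which is one plausible reading of why a stronger minimal bias could lower overall network congestion as the abstract reports.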