Efficient Congestion Management for High-Speed Interconnects using Adaptive Routing

José Rocher-González, J. Escudero-Sahuquillo, P. García, F. Quiles, Gaspar Mora
Published in: 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)
Publication date: 2019-05-14
DOI: 10.1109/CCGRID.2019.00036 (https://doi.org/10.1109/CCGRID.2019.00036)
Citations: 1

Abstract

The interconnection network is the central element of high-performance computing (HPC) clusters and datacenters, where thousands of end nodes must communicate quickly and reliably. Network performance depends on several design choices, such as the topology, the routing algorithm, and the switch architecture. Highly efficient routing algorithms, either deterministic or adaptive, have been proposed to balance traffic flows in cost-effective network topologies, but their performance drops when congestion and its negative effects, such as head-of-line (HoL) blocking, appear. In particular, when congestion is intense and persistent, HoL blocking may dramatically degrade the performance of adaptive routing algorithms, since they may spread congested traffic flows across all the available routes. In addition, as we have shown in previous studies, this spreading of congested flows may spoil the performance of the static queuing schemes used to reduce HoL blocking by separating flows into different queues at switch buffers. Because these schemes rely on a static criterion defined before traffic is injected into the network, they cannot prevent congested and non-congested flows from sharing queues when paired with adaptive routing. In this paper, we propose combining existing static queuing schemes with dynamic allocation of virtual channels (VCs) to isolate adaptively routed flows in a single VC, which limits the impact of congestion spreading across several routes. Specifically, adapted flows are moved to a special adapted-flow channel (AFC) so that they do not interact with flows mapped to other VCs by the static queuing scheme. This prevents the HoL blocking that adaptively routed flows could cause to non-adaptive flows, even if congested flows have been spread across several routes. Meanwhile, the static queuing scheme reduces, without interference, the HoL blocking that may appear among non-adaptive flows. To evaluate our proposal, we conducted extensive simulation experiments modeling large interconnection networks based on the fat-tree topology. The results show that our approach efficiently and significantly reduces the impact of HoL blocking in interconnection networks that use adaptive routing and queuing schemes when congestion appears.
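The VC selection idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything concrete in it is an assumption made for illustration: the number of VCs, the choice of the last VC as the AFC, a destination-modulo rule standing in for the static queuing scheme, and the rule that a packet stays in the AFC once it has been adapted. The names `Packet`, `static_vc`, and `select_vc` are hypothetical.

```python
from dataclasses import dataclass

NUM_VCS = 4               # assumed number of virtual channels per port
AFC = NUM_VCS - 1         # assumption: the last VC is reserved as the adapted-flow channel
STATIC_VCS = NUM_VCS - 1  # remaining VCs are managed by the static queuing scheme


@dataclass
class Packet:
    src: int
    dst: int
    adapted: bool = False  # becomes True once the packet leaves its deterministic route


def static_vc(dst: int) -> int:
    """Assumed static criterion: destination modulo the number of static VCs."""
    return dst % STATIC_VCS


def select_vc(pkt: Packet, deterministic_port: int, chosen_port: int) -> int:
    """Pick the VC a packet uses on its next hop.

    The packet is marked as adapted when the routing decision differs from
    the deterministic one. Adapted packets go to the AFC (and, by assumption,
    remain there); all other packets follow the static queuing scheme.
    """
    if chosen_port != deterministic_port:
        pkt.adapted = True
    return AFC if pkt.adapted else static_vc(pkt.dst)
```

With these assumed values, a packet for destination 7 that follows its deterministic route is queued in VC 1 (7 mod 3). If the adaptive routing function picks a different output port at some switch, the packet moves to VC 3, the AFC, and no longer shares a queue with flows placed by the static scheme.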