Local Fast Failover Routing With Low Stretch

Comput. Commun. Rev. Pub Date: 2018-04-27. DOI: 10.1145/3211852.3211858

Klaus-Tycho Foerster, Y. Pignolet, S. Schmid, Gilles Trédan

Citations: 24

Abstract

Network failures are frequent and disruptive, and can significantly reduce throughput even in highly connected and regular networks such as datacenters. Many modern networks support some kind of local fast failover to quickly reroute flows that encounter link failures onto new paths. Employing such mechanisms is known to be non-trivial, however, because conditional failover rules can depend only on local failure information.

Recent years have brought important insights into how to design failover schemes that provide high resiliency, but existing approaches share a shortcoming: the resulting failover routes may be unnecessarily long, i.e., they have a large stretch compared to the original route length. This is a serious drawback, as long routes entail higher latencies and introduce additional load, which may cause the rerouted flows to interfere with existing flows and harm throughput.

This paper presents the first deterministic local fast failover algorithms providing provable resiliency and failover route lengths, even in the presence of many concurrent failures. We present stretch-optimal failover algorithms for different network topologies, including multi-dimensional grids, hypercubes and Clos networks, as they are frequently deployed in the context of HPC clusters and datacenters. We show that the computed failover routes are optimal in the sense that no failover algorithm can provide shorter paths for a given number of link failures.
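To make the notions of a local failover rule and of stretch concrete, the following is a minimal sketch on a 2D grid. It is not the algorithm from the paper: the forwarding rule (prefer the live neighbor closest to the destination, avoid bouncing back to the previous hop), the function names, and the additive reading of stretch (failover hops minus failure-free hops) are all assumptions made for illustration. The rule is local in the sense that a node consults only the status of its own incident links, the neighbor the packet arrived from, and the destination.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Node = Tuple[int, int]
Link = FrozenSet[Node]


def manhattan(a: Node, b: Node) -> int:
    """Hop distance between two nodes in a failure-free grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def neighbors(v: Node, width: int, height: int) -> List[Node]:
    """Grid neighbors of v that lie inside the width x height grid."""
    x, y = v
    cand = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in cand if 0 <= a < width and 0 <= b < height]


def next_hop(v: Node, prev: Optional[Node], dst: Node,
             failed: Set[Link], width: int, height: int) -> Optional[Node]:
    """Toy local rule (an assumption, not the paper's scheme).

    Only links incident to v are inspected, so the decision depends
    purely on local failure information.
    """
    ranked = sorted(neighbors(v, width, height),
                    key=lambda u: (manhattan(u, dst), u))
    alive = [u for u in ranked if frozenset((v, u)) not in failed]
    forward = [u for u in alive if u != prev]
    if forward:
        return forward[0]
    # Bounce back only when no other live link exists.
    return alive[0] if alive else None


def route(src: Node, dst: Node, failed: Set[Link],
          width: int, height: int, max_hops: int = 64) -> Optional[List[Node]]:
    """Forward hop by hop; return the path, or None on a dead end or loop."""
    path, prev, v = [src], None, src
    while v != dst:
        if len(path) > max_hops:
            return None  # hop budget exhausted: treat as a forwarding loop
        u = next_hop(v, prev, dst, failed, width, height)
        if u is None:
            return None
        prev, v = v, u
        path.append(v)
    return path


# Usage: 3x3 grid, one failed link on the direct route.
failed_links = {frozenset({(1, 0), (2, 0)})}
detour = route((0, 0), (2, 0), failed_links, 3, 3)
additive_stretch = (len(detour) - 1) - manhattan((0, 0), (2, 0))
```

In this example the failure-free route takes 2 hops, while the detour around the failed link takes 4, giving an additive stretch of 2 under the definition assumed here. A naive rule like this one carries no resiliency or stretch guarantee and can loop under other failure patterns, which is exactly the gap the paper's provable schemes address.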
