Maintaining Predictable Traffic Engineering Performance Under Controller Failures for Software-Defined WANs

Songshi Dou; Zehua Guo
Journal: IEEE Journal on Selected Areas in Communications, vol. 43, no. 2, pp. 524-536
Published: 2025-01-13
DOI: 10.1109/JSAC.2025.3528814
URL: https://ieeexplore.ieee.org/document/10839029/
Citations: 0

Abstract

Many new cloud services and applications have emerged recently. They account for a large share of traffic in Wide Area Networks (WANs) and generate traffic with diverse Quality of Service (QoS) requirements. Software-Defined Wide Area Networks (SD-WANs) offer a promising opportunity to improve the performance of these applications through flexible network management. However, SD-WANs are managed by controllers, and unpredictable controller failures can undermine that flexibility. Switches previously controlled by the failed controllers go offline, and flows traversing these offline switches lose path programmability, that is, the ability to be routed on the available forwarding paths. As a result, these offline flows cannot be routed or rerouted to accommodate traffic variations, which leads to severe performance degradation. Traffic Engineering (TE) is a prevalent network application that aims to provide differentiated QoS for these cloud services and applications. When controllers fail, however, TE performance cannot be guaranteed because flexible network management is lost. Existing recovery solutions reassign offline switches to other active controllers to recover the degraded path programmability, but higher path programmability does not necessarily yield satisfactory TE performance. In this paper, we propose Ares to provide predictable TE performance under controller failures. We formulate an optimization problem that aims to maintain predictable TE performance by jointly considering fine-grained flow-controller reassignment and flow rerouting. Because the problem is proven to be NP-hard, we further propose a heuristic algorithm to solve it efficiently. Specifically, when controller failures occur, Ares updates real-time network information with traffic traces and failure status to calculate flow-controller reassignment and flow rerouting policies. Ares then reassigns and reroutes offline flows to maintain predictable TE performance. Extensive simulations on two real-world topologies with traffic traces show that our problem formulation achieves load balancing performance comparable to the optimal TE solution without controller failures, and that Ares improves average load balancing performance by up to 35.79% with low computation time compared with the state-of-the-art solution.
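The abstract does not give the Ares formulation or its heuristic, so the following is only an illustrative sketch of the general idea of jointly reassigning and rerouting offline flows. It assumes a toy model that is not taken from the paper: each flow has a fixed set of candidate paths, each active controller can take on a fixed number of extra flows, and load balancing is measured by maximum link utilization. All names (`Flow`, `greedy_recover`, `spare_control`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Flow:
    name: str
    demand: float           # traffic volume, same unit as link capacity
    path: tuple             # current path, as a tuple of link ids
    candidates: tuple       # alternative paths, each a tuple of link ids
    offline: bool = False   # True if the flow lost its controller


def max_utilization(load, capacity):
    """Maximum link utilization, used here as the load-balancing metric."""
    return max(load[l] / capacity[l] for l in capacity)


def greedy_recover(flows, capacity, spare_control):
    """Greedily reassign and reroute offline flows, largest demand first.

    spare_control maps each active controller to the number of extra
    flows it can still manage (an assumed capacity model).
    Returns (decisions, mlu), where decisions maps a recovered flow's
    name to (controller, chosen path) and mlu is the final maximum
    link utilization.
    """
    load = {l: 0.0 for l in capacity}
    for f in flows:
        for l in f.path:
            load[l] += f.demand

    spare = dict(spare_control)
    decisions = {}
    offline = sorted((f for f in flows if f.offline), key=lambda f: -f.demand)
    for f in offline:
        # Pick the active controller with the most spare capacity.
        ctrl = max(spare, key=spare.get, default=None)
        if ctrl is None or spare[ctrl] <= 0:
            continue  # no control capacity left: flow keeps its old path

        # Take the flow off its current path before evaluating options.
        for l in f.path:
            load[l] -= f.demand

        def cost(path):
            trial = dict(load)
            for l in path:
                trial[l] += f.demand
            return max_utilization(trial, capacity)

        # Staying on the current path is always one of the options.
        best = min((f.path,) + tuple(f.candidates), key=cost)
        for l in best:
            load[l] += f.demand
        f.path = best
        spare[ctrl] -= 1
        decisions[f.name] = (ctrl, best)
    return decisions, max_utilization(load, capacity)


# Toy instance: three parallel links, two offline flows sharing link "a".
capacity = {"a": 10, "b": 10, "c": 10}
flows = [
    Flow("f1", 6, ("a",), (("b",), ("c",)), offline=True),
    Flow("f2", 5, ("a",), (("b",),), offline=True),
    Flow("f3", 4, ("b",), ()),
]
decisions, mlu = greedy_recover(flows, capacity, {"c1": 1, "c2": 1})
print(decisions, mlu)
```

Processing flows in decreasing order of demand is a common greedy choice because the largest flows have the most influence on the bottleneck link. It carries no optimality guarantee, which is consistent with the abstract's point that the joint problem is NP-hard and needs a heuristic.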