Ravana: controller fault-tolerance in software-defined networking

Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research. Pub Date: 2015-06-17. DOI: 10.1145/2774993.2774996

N. Katta, Haoyu Zhang, M. Freedman, J. Rexford


Citations: 153

Abstract

Software-defined networking (SDN) offers greater flexibility than traditional distributed architectures, but at the risk of the controller becoming a single point of failure. Unfortunately, existing fault-tolerance techniques, such as replicated state machines, are insufficient to ensure correct network behavior under controller failures. The challenge is that, in addition to the application state of the controllers, the switches maintain hard state that must be handled consistently. Thus, it is necessary to incorporate switch state into the system model to correctly offer a "logically centralized" controller. We introduce Ravana, a fault-tolerant SDN controller platform that processes control messages transactionally and exactly once (at both the controllers and the switches). Ravana maintains these guarantees in the face of both controller and switch crashes. The key insight in Ravana is that replicated state machines can be extended with lightweight switch-side mechanisms to guarantee correctness, without involving the switches in an elaborate consensus protocol. Our prototype implementation of Ravana enables unmodified controller applications to execute in a fault-tolerant fashion. Experiments show that Ravana achieves high throughput with reasonable overhead, compared to a single controller, with a failover time under 100 ms.
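The exactly-once idea in the abstract can be illustrated with a toy model. The sketch below is not Ravana's protocol or code: it assumes, purely for illustration, that a switch buffers each event until it is acknowledged and ignores commands whose id it has already executed, and that controllers share a replicated event log (modeled here as a plain dict). All names (`Switch`, `Controller`, `run_failover_demo`, `crash_before_ack`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy switch (hypothetical model, not an OpenFlow implementation)."""
    pending: dict = field(default_factory=dict)  # event_id -> payload, kept until acked
    applied: set = field(default_factory=set)    # command ids already executed
    rules: list = field(default_factory=list)    # installed rules (the switch's hard state)

    def raise_event(self, event_id, payload):
        self.pending[event_id] = payload

    def unacked(self):
        # Events a (possibly new) master still has to process.
        return sorted(self.pending.items())

    def ack(self, event_id):
        self.pending.pop(event_id, None)

    def execute(self, cmd_id, rule):
        # Duplicate commands are dropped, so a replay cannot install a rule twice.
        if cmd_id in self.applied:
            return False
        self.applied.add(cmd_id)
        self.rules.append(rule)
        return True


@dataclass
class Controller:
    log: dict  # stand-in for a replicated event log shared by all replicas

    def handle(self, switch, event_id, payload, crash_before_ack=False):
        # Record the event once, even if it is delivered again after failover.
        self.log.setdefault(event_id, payload)
        switch.execute(cmd_id=event_id, rule=f"fwd:{payload}")
        if crash_before_ack:
            return  # simulate the master failing mid-transaction
        switch.ack(event_id)


def run_failover_demo():
    log = {}
    sw = Switch()
    sw.raise_event(1, "h1->h2")

    master = Controller(log)
    for eid, payload in sw.unacked():
        master.handle(sw, eid, payload, crash_before_ack=True)

    # A backup takes over and reprocesses whatever the switch still buffers.
    backup = Controller(log)
    for eid, payload in sw.unacked():
        backup.handle(sw, eid, payload)

    return sw.rules, sw.unacked(), list(log)
```

In this model the master crashes after sending its command but before acknowledging the event, so the backup sees the event again. The shared log keeps a single entry for it and the switch discards the repeated command, which is the behavior the abstract calls exactly-once processing at both the controllers and the switches.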
