vFFR: A Very Fast Failure Recovery Strategy Implemented in Devices With Programmable Data Plane
David Franco; Marivi Higuero; Ane Sanz; Juanjo Unzilla; Maider Huarte
IEEE Open Journal of the Communications Society, vol. 5, pp. 7121-7146, published 7 November 2024. DOI: 10.1109/OJCOMS.2024.3493417
IEEE Xplore: https://ieeexplore.ieee.org/document/10746495/
Abstract
The rapid emergence of new applications and services, and their increasing Quality of Service (QoS) demands, have a significant impact on the development of today’s communication networks. As a result, communication networks are constantly evolving towards new architectures, such as the 6th Generation (6G) of communication systems currently being studied in academia and research. One of the most critical aspects of designing communication networks is meeting strict delay and packet loss requirements. In this context, although link failure recovery has been widely addressed in the literature, link failures remain one of the main causes of packet loss and delay in the network. The failure recovery time of currently deployed technologies is still far from the sub-millisecond delay required in 6G networks. In distributed network architectures, the time needed to converge to a common network state after a link failure is excessive. In contrast, centralized architectures such as Software-Defined Networking (SDN) solve this convergence problem, but they still need to notify a centralized controller of the failure, which increases the recovery time. This paper proposes a very Fast Failure Recovery (vFFR) strategy that can recover from link failures on sub-millisecond timescales by reacting directly in the data plane of the network devices while keeping its state synchronized with the centralized controller. We first analyze current failure recovery strategies and classify them according to the techniques they use to reduce failure recovery time. We then describe the design of a vFFR strategy that combines three data plane recovery algorithms to reduce latency and packet loss under varying network conditions. Our vFFR strategy has been modeled in the P4 language and tested on an emulation platform to validate the three data plane recovery algorithms under different conditions.
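The abstract does not describe the three recovery algorithms themselves. As a rough illustration of the general mechanism it names (reacting to a local link failure in the data plane and only informing the controller afterwards), the sketch below models a generic fast-failover table in Python: each destination has precomputed egress ports in preference order, and a packet takes the first port whose link is up. This is not the authors' P4 code; the class and method names, the "first live port" rule, and the notification queue are assumptions made for illustration. On a programmable switch, comparable state would typically live in P4 tables and registers.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FastFailoverTable:
    """Hypothetical data-plane state for precomputed fast failover (illustrative only)."""

    # destination -> candidate egress ports, primary first, then backups
    routes: dict[str, list[int]]
    # port -> True while the local link on that port is up
    port_up: dict[int, bool]
    # link-status changes not yet reported to the controller
    pending_notifications: list[tuple[int, bool]] = field(default_factory=list)

    def on_link_status(self, port: int, up: bool) -> None:
        """Apply a local link-status change at once; report it to the controller later."""
        if self.port_up.get(port) != up:
            self.port_up[port] = up
            self.pending_notifications.append((port, up))

    def select_egress(self, dst: str) -> int | None:
        """Return the first live candidate port for dst, or None if all are down."""
        for port in self.routes.get(dst, []):
            if self.port_up.get(port, False):
                return port
        return None

    def drain_notifications(self) -> list[tuple[int, bool]]:
        """Hand queued events to the control plane so both views stay synchronized."""
        events, self.pending_notifications = self.pending_notifications, []
        return events


if __name__ == "__main__":
    table = FastFailoverTable(routes={"h2": [1, 2]}, port_up={1: True, 2: True})
    print(table.select_egress("h2"))      # 1 (primary path)
    table.on_link_status(1, up=False)     # local failure detected in the data plane
    print(table.select_egress("h2"))      # 2 (backup, no controller round trip)
    print(table.drain_notifications())    # [(1, False)]
```

The design choice the sketch highlights is ordering: forwarding switches to the backup port as soon as the local state changes, while controller synchronization happens asynchronously, so the controller round trip is removed from the recovery path.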
The results show that latency varies according to the alternate path selected by the recovery algorithm, and that the packet loss rate remains constant even when background traffic reaches 90% of the link capacity. In addition, the vFFR strategy has been implemented on Intel Tofino devices, achieving a failure recovery time below 500 μs and a total frame loss rate below 0.005% in all cases, including those with a 35 Gbps load.
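To put the 500 μs bound in perspective, the back-of-envelope calculation below estimates how much traffic a 35 Gbps load carries during one 500 μs window. It is only arithmetic on the two figures quoted in the abstract: it assumes the whole load crosses the failed link, a 1500-byte frame size, and no Ethernet preamble or inter-frame gap overhead. None of these assumptions comes from the paper, and the result is not a statement about how many frames vFFR actually lost.

```python
# Back-of-envelope: traffic carried during one recovery window.
# Assumptions (not from the paper): the full load crosses the failed link,
# frames are 1500 bytes, and preamble/inter-frame gap overhead is ignored.
LOAD_BPS = 35e9        # offered load quoted in the abstract, bits per second
RECOVERY_S = 500e-6    # recovery-time bound quoted in the abstract, seconds
FRAME_BYTES = 1500     # assumed frame size, bytes

bits_in_window = LOAD_BPS * RECOVERY_S
bytes_in_window = bits_in_window / 8
frames_in_window = bytes_in_window / FRAME_BYTES

print(f"{bytes_in_window:,.0f} bytes, about {frames_in_window:,.0f} frames")
```

Under these assumptions the window spans roughly 2.2 MB, or on the order of 1,500 full-size frames, which is the traffic a recovery mechanism must either reroute or lose during that interval.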
About the journal:
The IEEE Open Journal of the Communications Society (OJ-COMS) is an open access, all-electronic journal that publishes original, high-quality manuscripts on advances in the state of the art of telecommunications systems and networks. Papers published in IEEE OJ-COMS are indexed in Scopus. Submissions reporting new theoretical findings (including novel methods, concepts, and studies) and practical contributions (including experiments and development of prototypes) are welcome, as are survey and tutorial articles. IEEE OJ-COMS received its first impact factor, 7.9, in the 2023 Journal Citation Reports (JCR).
The IEEE Open Journal of the Communications Society covers science, technology, applications and standards for information organization, collection and transfer using electronic, optical and wireless channels and networks. Some specific areas covered include:
Systems and network architecture, control and management
Protocols, software, and middleware
Quality of service, reliability, and security
Modulation, detection, coding, and signaling
Switching and routing
Mobile and portable communications
Terminals and other end-user devices
Networks for content distribution and distributed computing
Communications-based distributed resources control