SQR:高可靠数据中心网络中链路故障的网络内丢包恢复

Ting Qu, Raj Joshi, M. Chan, B. Leong, Deke Guo, Zhong Liu
{"title":"SQR:高可靠数据中心网络中链路故障的网络内丢包恢复","authors":"Ting Qu, Raj Joshi, M. Chan, B. Leong, Deke Guo, Zhong Liu","doi":"10.1109/ICNP.2019.8888055","DOIUrl":null,"url":null,"abstract":"In datacenter networks, flows need to complete as quickly as possible because the flow completion time (FCT) directly impacts user experience, and thus revenue. Link failures can have a significant impact on short latency-sensitive flows because they increase their FCTs by several fold. Existing link failure management techniques cannot keep the FCTs low under link failures because they cannot completely eliminate packet loss during such failures. We observe that to completely mask the effect of packet loss and the resulting long recovery latency, the network has to be responsible for packet loss recovery instead of relying on end-to-end recovery. To this end, we propose Shared Queue Ring (SQR), an on-switch mechanism that completely eliminates packet loss during link failures by diverting the affected flows seamlessly to alternative paths. We implemented SQR on a Barefoot Tofino switch using the P4 programming language. Our evaluation on a hardware testbed shows that SQR can completely mask link failures and reduce tail FCT by up to 4 orders of magnitude for latency-sensitive workloads.","PeriodicalId":385397,"journal":{"name":"2019 IEEE 27th International Conference on Network Protocols (ICNP)","volume":"62 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"SQR: In-network Packet Loss Recovery from Link Failures for Highly Reliable Datacenter Networks\",\"authors\":\"Ting Qu, Raj Joshi, M. Chan, B. Leong, Deke Guo, Zhong Liu\",\"doi\":\"10.1109/ICNP.2019.8888055\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In datacenter networks, flows need to complete as quickly as possible because the flow completion time (FCT) directly impacts user experience, and thus revenue. Link failures can have a significant impact on short latency-sensitive flows because they increase their FCTs by several fold. Existing link failure management techniques cannot keep the FCTs low under link failures because they cannot completely eliminate packet loss during such failures. We observe that to completely mask the effect of packet loss and the resulting long recovery latency, the network has to be responsible for packet loss recovery instead of relying on end-to-end recovery. To this end, we propose Shared Queue Ring (SQR), an on-switch mechanism that completely eliminates packet loss during link failures by diverting the affected flows seamlessly to alternative paths. We implemented SQR on a Barefoot Tofino switch using the P4 programming language. Our evaluation on a hardware testbed shows that SQR can completely mask link failures and reduce tail FCT by up to 4 orders of magnitude for latency-sensitive workloads.\",\"PeriodicalId\":385397,\"journal\":{\"name\":\"2019 IEEE 27th International Conference on Network Protocols (ICNP)\",\"volume\":\"62 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 27th International Conference on Network Protocols (ICNP)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICNP.2019.8888055\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 27th International Conference on Network Protocols (ICNP)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICNP.2019.8888055","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 24

摘要

在数据中心网络中,流程需要尽可能快地完成,因为流程完成时间(FCT)直接影响用户体验,从而影响收入。链路故障会对短延迟敏感的流产生重大影响,因为它们会使fct增加数倍。现有的链路故障管理技术由于不能完全消除链路故障时的丢包现象,无法使链路故障时的fct保持在较低的水平。我们观察到,为了完全掩盖丢包的影响和由此产生的长恢复延迟,网络必须负责丢包恢复,而不是依赖于端到端恢复。为此,我们提出了共享队列环(SQR),这是一种开关机制,通过无缝地将受影响的流转移到替代路径,完全消除了链路故障期间的数据包丢失。我们使用P4编程语言在赤脚Tofino开关上实现SQR。我们在硬件测试平台上的评估表明,对于延迟敏感的工作负载,SQR可以完全掩盖链路故障,并将尾部FCT降低多达4个数量级。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
SQR: In-network Packet Loss Recovery from Link Failures for Highly Reliable Datacenter Networks
In datacenter networks, flows need to complete as quickly as possible because the flow completion time (FCT) directly impacts user experience, and thus revenue. Link failures can have a significant impact on short latency-sensitive flows because they increase their FCTs by several fold. Existing link failure management techniques cannot keep the FCTs low under link failures because they cannot completely eliminate packet loss during such failures. We observe that to completely mask the effect of packet loss and the resulting long recovery latency, the network has to be responsible for packet loss recovery instead of relying on end-to-end recovery. To this end, we propose Shared Queue Ring (SQR), an on-switch mechanism that completely eliminates packet loss during link failures by diverting the affected flows seamlessly to alternative paths. We implemented SQR on a Barefoot Tofino switch using the P4 programming language. Our evaluation on a hardware testbed shows that SQR can completely mask link failures and reduce tail FCT by up to 4 orders of magnitude for latency-sensitive workloads.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信