Packet-Level Telemetry in Large Datacenter Networks

Yibo Zhu, Nanxi Kang, Jiaxin Cao, A. Greenberg, Guohan Lu, Ratul Mahajan, D. Maltz, Lihua Yuan, Ming Zhang, Ben Y. Zhao, Haitao Zheng
Published in: Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication
Publication date: 2015-08-17
DOI: 10.1145/2785956.2787483 (https://doi.org/10.1145/2785956.2787483)
Citations: 287

Abstract

Debugging faults in complex networks often requires capturing and analyzing traffic at the packet level. In this task, datacenter networks (DCNs) present unique challenges with their scale, traffic volume, and diversity of faults. To troubleshoot faults in a timely manner, DCN administrators must a) identify affected packets within a large volume of traffic; b) track them across multiple network components; c) analyze traffic traces for fault patterns; and d) test or confirm potential causes. To our knowledge, no tool today can achieve both the specificity and scale required for this task. We present Everflow, a packet-level network telemetry system for large DCNs. Everflow traces specific packets by implementing a powerful packet filter on top of the "match and mirror" functionality of commodity switches. It shuffles captured packets to multiple analysis servers using load balancers built on switch ASICs, and it sends "guided probes" to test or confirm potential faults. We present experiments that demonstrate Everflow's scalability, and share experiences of troubleshooting network faults gathered from running it for over 6 months in Microsoft's DCNs.
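The abstract names two mechanisms, a "match and mirror" packet filter and load-balanced shuffling of captured packets to analysis servers, without giving their details. The Python sketch below is only a software illustration of the general idea, not Everflow's implementation: the `Packet` fields, the `is_tcp_control` rule, the analyzer names, and the choice of hashing the 5-tuple are all assumptions made for this example.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified view of the header fields a switch can match on."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    tcp_flags: frozenset = frozenset()


# A match rule is modeled as a predicate over packet headers (assumed format).
MatchRule = Callable[[Packet], bool]


def is_tcp_control(pkt: Packet) -> bool:
    """Illustrative rule: match TCP packets carrying SYN, FIN, or RST."""
    return pkt.protocol == "tcp" and bool(pkt.tcp_flags & {"SYN", "FIN", "RST"})


def pick_analyzer(pkt: Packet, analyzers: List[str]) -> str:
    """Hash the 5-tuple so packets with the same 5-tuple reach the same analyzer."""
    key = f"{pkt.src_ip}|{pkt.dst_ip}|{pkt.src_port}|{pkt.dst_port}|{pkt.protocol}"
    digest = hashlib.sha256(key.encode()).digest()
    return analyzers[int.from_bytes(digest[:8], "big") % len(analyzers)]


def match_and_mirror(
    pkt: Packet, rules: List[MatchRule], analyzers: List[str]
) -> Optional[str]:
    """Return the analyzer a mirrored copy would go to, or None if no rule matches."""
    if any(rule(pkt) for rule in rules):
        return pick_analyzer(pkt, analyzers)
    return None


if __name__ == "__main__":
    analyzers = ["analyzer-0", "analyzer-1", "analyzer-2"]  # hypothetical names
    syn = Packet("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", frozenset({"SYN"}))
    ack = Packet("10.0.0.1", "10.0.0.2", 1234, 80, "tcp", frozenset({"ACK"}))
    print(match_and_mirror(syn, [is_tcp_control], analyzers))  # mirrored
    print(match_and_mirror(ack, [is_tcp_control], analyzers))  # None: not matched
```

Hashing on the 5-tuple keeps all mirrored packets of one direction of a flow on a single analyzer, which is one plausible way to let per-flow analysis run without cross-server coordination; whether Everflow hashes on these exact fields is not stated in the abstract.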