Yibo Zhu, Nanxi Kang, Jiaxin Cao, A. Greenberg, Guohan Lu, Ratul Mahajan, D. Maltz, Lihua Yuan, Ming Zhang, Ben Y. Zhao, Haitao Zheng
{"title":"大型数据中心网络中的数据包级遥测","authors":"Yibo Zhu, Nanxi Kang, Jiaxin Cao, A. Greenberg, Guohan Lu, Ratul Mahajan, D. Maltz, Lihua Yuan, Ming Zhang, Ben Y. Zhao, Haitao Zheng","doi":"10.1145/2785956.2787483","DOIUrl":null,"url":null,"abstract":"Debugging faults in complex networks often requires capturing and analyzing traffic at the packet level. In this task, datacenter networks (DCNs) present unique challenges with their scale, traffic volume, and diversity of faults. To troubleshoot faults in a timely manner, DCN administrators must a) identify affected packets inside large volume of traffic; b) track them across multiple network components; c) analyze traffic traces for fault patterns; and d) test or confirm potential causes. To our knowledge, no tool today can achieve both the specificity and scale required for this task. We present Everflow, a packet-level network telemetry system for large DCNs. Everflow traces specific packets by implementing a powerful packet filter on top of \"match and mirror\" functionality of commodity switches. It shuffles captured packets to multiple analysis servers using load balancers built on switch ASICs, and it sends \"guided probes\" to test or confirm potential faults. We present experiments that demonstrate Everflow's scalability, and share experiences of troubleshooting network faults gathered from running it for over 6 months in Microsoft's DCNs.","PeriodicalId":268472,"journal":{"name":"Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"287","resultStr":"{\"title\":\"Packet-Level Telemetry in Large Datacenter Networks\",\"authors\":\"Yibo Zhu, Nanxi Kang, Jiaxin Cao, A. Greenberg, Guohan Lu, Ratul Mahajan, D. Maltz, Lihua Yuan, Ming Zhang, Ben Y. Zhao, Haitao Zheng\",\"doi\":\"10.1145/2785956.2787483\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Debugging faults in complex networks often requires capturing and analyzing traffic at the packet level. In this task, datacenter networks (DCNs) present unique challenges with their scale, traffic volume, and diversity of faults. To troubleshoot faults in a timely manner, DCN administrators must a) identify affected packets inside large volume of traffic; b) track them across multiple network components; c) analyze traffic traces for fault patterns; and d) test or confirm potential causes. To our knowledge, no tool today can achieve both the specificity and scale required for this task. We present Everflow, a packet-level network telemetry system for large DCNs. Everflow traces specific packets by implementing a powerful packet filter on top of \\\"match and mirror\\\" functionality of commodity switches. It shuffles captured packets to multiple analysis servers using load balancers built on switch ASICs, and it sends \\\"guided probes\\\" to test or confirm potential faults. We present experiments that demonstrate Everflow's scalability, and share experiences of troubleshooting network faults gathered from running it for over 6 months in Microsoft's DCNs.\",\"PeriodicalId\":268472,\"journal\":{\"name\":\"Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-08-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"287\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2785956.2787483\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 ACM Conference on Special Interest Group on Data Communication","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2785956.2787483","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Packet-Level Telemetry in Large Datacenter Networks
Debugging faults in complex networks often requires capturing and analyzing traffic at the packet level. In this task, datacenter networks (DCNs) present unique challenges with their scale, traffic volume, and diversity of faults. To troubleshoot faults in a timely manner, DCN administrators must a) identify affected packets inside large volume of traffic; b) track them across multiple network components; c) analyze traffic traces for fault patterns; and d) test or confirm potential causes. To our knowledge, no tool today can achieve both the specificity and scale required for this task. We present Everflow, a packet-level network telemetry system for large DCNs. Everflow traces specific packets by implementing a powerful packet filter on top of "match and mirror" functionality of commodity switches. It shuffles captured packets to multiple analysis servers using load balancers built on switch ASICs, and it sends "guided probes" to test or confirm potential faults. We present experiments that demonstrate Everflow's scalability, and share experiences of troubleshooting network faults gathered from running it for over 6 months in Microsoft's DCNs.