Authors: Abhishek Kumar, Jun Xu
Published in: Proceedings IEEE INFOCOM 2006. 25TH IEEE International Conference on Computer Communications (2006-04-23)
DOI: 10.1109/INFOCOM.2006.326 (https://doi.org/10.1109/INFOCOM.2006.326)
Sketch Guided Sampling - Using On-Line Estimates of Flow Size for Adaptive Data Collection
Monitoring the traffic in high-speed networks is a data-intensive problem. Uniform packet sampling is the most popular technique for reducing the amount of data the network monitoring hardware/software has to process. However, uniform sampling captures far less information than can potentially be obtained with the same overall sampling rate. This is because uniform sampling (unnecessarily) draws the vast majority of samples from large flows, and very few from small and medium flows. This information loss on small and medium flows significantly affects the accuracy of the estimation of various network statistics. In this work, we develop a new packet sampling methodology called "sketch-guided sampling" (SGS), which offers better statistics than those obtainable from uniform sampling, given the same number of raw samples gathered. Its main idea is to make the probability with which an incoming packet is sampled a decreasing function f of the size of the flow the packet belongs to. This way our scheme is able to significantly increase the packet sampling rate of the small and medium flows at a slight expense to the large flows, resulting in much more accurate estimations of various network statistics. However, the exact sizes of all flows are available only if we keep per-flow information for every flow, which is prohibitively expensive for high-speed links. Our SGS scheme solves this problem by using a small (lossy) synopsis data structure called a counting sketch to encode the approximate sizes of all flows. Our evaluation on real-world Internet traffic traces shows that our sampling theory, based on the approximate flow size estimates from the counting sketch, works almost as well as if we knew the exact sizes of the flows.
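The sampling idea described above can be illustrated with a minimal Python sketch. The abstract does not give the form of f or the layout of the counting sketch, so everything specific here is an assumption made for illustration: the `CountingSketch` class (a single array of counters indexed by a CRC32 hash of the flow label), the function `sampling_probability` (a hypothetical f that equals 1 up to a threshold and then decays as threshold/size), and the parameters `num_counters`, `threshold`, and `seed`. It is not the authors' implementation.

```python
import random
import zlib


class CountingSketch:
    """Assumed layout: one array of counters indexed by a hash of the flow label.

    The structure is lossy because distinct flows may share a counter.
    """

    def __init__(self, num_counters):
        self.counters = [0] * num_counters

    def _index(self, flow_id):
        return zlib.crc32(flow_id.encode("utf-8")) % len(self.counters)

    def increment(self, flow_id):
        self.counters[self._index(flow_id)] += 1

    def estimate(self, flow_id):
        # Collisions can only inflate a counter, so this never underestimates.
        return self.counters[self._index(flow_id)]


def sampling_probability(size, threshold=10):
    """Hypothetical decreasing f: 1 up to `threshold`, then threshold / size."""
    return 1.0 if size <= threshold else threshold / size


def sketch_guided_sample(packets, num_counters=1024, threshold=10, seed=0):
    """Sample each packet with probability f(estimated size of its flow so far)."""
    rng = random.Random(seed)
    sketch = CountingSketch(num_counters)
    samples = []
    for flow_id in packets:
        sketch.increment(flow_id)          # update the on-line size estimate
        size = sketch.estimate(flow_id)    # approximate flow size seen so far
        if rng.random() < sampling_probability(size, threshold):
            samples.append(flow_id)
    return samples


if __name__ == "__main__":
    # One large flow and one small flow, interleaved at random.
    packets = ["elephant"] * 10000 + ["mouse"] * 5
    random.Random(1).shuffle(packets)
    samples = sketch_guided_sample(packets)
    print("elephant samples:", samples.count("elephant"))
    print("mouse samples:", samples.count("mouse"))
```

Because packets are kept with unequal probabilities, any statistic computed from the samples would need to account for that, for example by weighting each sample by the inverse of its sampling probability. That step is omitted here.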