Sketch Guided Sampling - Using On-Line Estimates of Flow Size for Adaptive Data Collection

Abhishek Kumar, Jun Xu
Venue: Proceedings IEEE INFOCOM 2006. 25TH IEEE International Conference on Computer Communications
Published: 2006-04-23
DOI: 10.1109/INFOCOM.2006.326 (https://doi.org/10.1109/INFOCOM.2006.326)
Citations: 112

Abstract

Monitoring the traffic in high-speed networks is a data-intensive problem. Uniform packet sampling is the most popular technique for reducing the amount of data the network monitoring hardware/software has to process. However, uniform sampling captures far less information than can potentially be obtained with the same overall sampling rate. This is because uniform sampling (unnecessarily) draws the vast majority of samples from large flows, and very few from small and medium flows. This information loss on small and medium flows significantly affects the accuracy of the estimation of various network statistics. In this work, we develop a new packet sampling methodology called "sketch-guided sampling" (SGS), which offers better statistics than obtainable from uniform sampling, given the same number of raw samples gathered. Its main idea is to make the probability with which an incoming packet is sampled a decreasing sampling function f of the size of the flow the packet belongs to. This way our scheme is able to significantly increase the packet sampling rate of the small and medium flows at slight expense of the large flows, resulting in much more accurate estimations of various network statistics. However, the exact sizes of all flows are available only if we keep per-flow information for every flow, which is prohibitively expensive for high-speed links. Our SGS scheme solves this problem by using a small (lossy) synopsis data structure called counting sketch to encode the approximate sizes of all flows. Our evaluation on real-world Internet traffic traces shows that our sampling theory based on the approximate flow size estimates from the counting sketch works almost as well as if we know the exact sizes of the flows.
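The idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CountingSketch` class (a single array of counters indexed by a hash of the flow key), the use of `zlib.crc32` as the hash, the function `sampling_probability` with its form 1 / (1 + alpha * size), and the parameter `alpha` are all assumptions chosen for illustration. The abstract only states that f is a decreasing function of flow size and that sizes are approximated by a small lossy counting sketch.

```python
import random
import zlib


class CountingSketch:
    """Hypothetical lossy synopsis: one counter array indexed by a hash of the flow key.

    Different flows may collide on a counter, so sizes are approximate.
    """

    def __init__(self, num_counters=1024):
        self.counters = [0] * num_counters

    def _index(self, flow_id):
        # crc32 is only a convenient deterministic hash for this illustration.
        return zlib.crc32(flow_id.encode("utf-8")) % len(self.counters)

    def update(self, flow_id):
        """Count one packet for the flow and return its approximate size so far."""
        i = self._index(flow_id)
        self.counters[i] += 1
        return self.counters[i]


def sampling_probability(size, alpha=0.05):
    """Illustrative decreasing function f: larger estimated size, lower probability."""
    return 1.0 / (1.0 + alpha * size)


def sketch_guided_sample(packets, sketch, rng, alpha=0.05):
    """Keep each packet with probability f(estimated size of its flow)."""
    kept = []
    for flow_id in packets:
        estimate = sketch.update(flow_id)
        if rng.random() < sampling_probability(estimate, alpha):
            kept.append(flow_id)
    return kept


def make_demo_packets(rng):
    """Synthetic trace (made up for the demo): one large flow and 200 small flows."""
    packets = ["big"] * 10000
    packets += [f"small-{i}" for i in range(200) for _ in range(5)]
    rng.shuffle(packets)
    return packets


if __name__ == "__main__":
    rng = random.Random(0)
    packets = make_demo_packets(rng)
    kept = sketch_guided_sample(packets, CountingSketch(), rng)
    big_kept = sum(1 for p in kept if p == "big")
    print("kept from large flow:", big_kept, "of 10000")
    print("kept from small flows:", len(kept) - big_kept, "of 1000")
```

On this synthetic trace the per-packet sampling rate of the large flow drops well below that of the small flows, which is the behaviour the abstract describes. Under uniform sampling both groups would be sampled at the same rate, so most samples would come from the large flow.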