利用小内存资源估计基数排名前N的主机

22nd International Conference on Data Engineering Workshops (ICDEW'06) Pub Date : 2006-04-03 DOI:10.1109/ICDEW.2006.56

K. Ishibashi, Tatsuya Mori, R. Kawahara, Yutaka Hirokawa, A. Kobayashi, K. Yamamoto, H. Sakamoto

{"title":"利用小内存资源估计基数排名前N的主机","authors":"K. Ishibashi, Tatsuya Mori, R. Kawahara, Yutaka Hirokawa, A. Kobayashi, K. Yamamoto, H. Sakamoto","doi":"10.1109/ICDEW.2006.56","DOIUrl":null,"url":null,"abstract":"We propose a method to find N hosts that have the N highest cardinalities, where cardinality is the number of distinct items such as the number of flows, ports, or peer hosts. The method also estimates their cardinalities. While existing algorithms to find the top N frequent items can be directly applied to find N hosts that send the N largest numbers of packets through packet data stream, finding hosts that have the N highest cardinalities requires tables of previously seen items for each host to check whether an item of an arrival packet is new, which requires a lot of memory. Even if we use the existing cardinality estimation methods, we still need to have cardinality information about each host. In this paper, we use the property of cardinality estimation, in which the cardinality of intersections of multiple data sets can be estimated with cardinality information of each data set. Using the property, we propose an algorithm that does not need to maintain tables for each host, but only for partitioned addresses of a host and estimate the cardinality of a host as the intersection of cardinalities of partitioned addresses. We also propose a method to find top N hosts in cardinalities which is to be monitored to detect anomalous behavior in networks. We evaluate our algorithm through actual backbone traffic data. While the estimation accuracy of our scheme degrades for small cardinalities, as for the top 100 hosts, the accuracy of our algorithm with 4, 096 tables is almost the same as having tables of every hosts.","PeriodicalId":331953,"journal":{"name":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2006-04-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"Estimating Top N Hosts in Cardinality Using Small Memory Resources\",\"authors\":\"K. Ishibashi, Tatsuya Mori, R. Kawahara, Yutaka Hirokawa, A. Kobayashi, K. Yamamoto, H. Sakamoto\",\"doi\":\"10.1109/ICDEW.2006.56\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose a method to find N hosts that have the N highest cardinalities, where cardinality is the number of distinct items such as the number of flows, ports, or peer hosts. The method also estimates their cardinalities. While existing algorithms to find the top N frequent items can be directly applied to find N hosts that send the N largest numbers of packets through packet data stream, finding hosts that have the N highest cardinalities requires tables of previously seen items for each host to check whether an item of an arrival packet is new, which requires a lot of memory. Even if we use the existing cardinality estimation methods, we still need to have cardinality information about each host. In this paper, we use the property of cardinality estimation, in which the cardinality of intersections of multiple data sets can be estimated with cardinality information of each data set. Using the property, we propose an algorithm that does not need to maintain tables for each host, but only for partitioned addresses of a host and estimate the cardinality of a host as the intersection of cardinalities of partitioned addresses. We also propose a method to find top N hosts in cardinalities which is to be monitored to detect anomalous behavior in networks. We evaluate our algorithm through actual backbone traffic data. While the estimation accuracy of our scheme degrades for small cardinalities, as for the top 100 hosts, the accuracy of our algorithm with 4, 096 tables is almost the same as having tables of every hosts.\",\"PeriodicalId\":331953,\"journal\":{\"name\":\"22nd International Conference on Data Engineering Workshops (ICDEW'06)\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-04-03\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"22nd International Conference on Data Engineering Workshops (ICDEW'06)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/ICDEW.2006.56\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"22nd International Conference on Data Engineering Workshops (ICDEW'06)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDEW.2006.56","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 6

摘要

我们提出了一种方法来查找具有N个最高基数的N个主机，其中基数是不同项目的数量，例如流，端口或对等主机的数量。该方法还估计它们的基数。虽然查找前N个频繁项的现有算法可以直接应用于查找通过数据包数据流发送N个最大数据包的N个主机，但查找具有N个最高基数的主机需要为每个主机提供以前见过的项的表，以检查到达数据包的项是否为新项，这需要大量内存。即使我们使用现有的基数估计方法，我们仍然需要有关于每个主机的基数信息。本文利用了基数估计的性质，利用每个数据集的基数信息来估计多个数据集相交的基数。利用该属性，我们提出了一种算法，该算法不需要为每个主机维护表，而只需要为主机的分区地址维护表，并将主机的基数估计为分区地址基数的交集。我们还提出了一种在基数中找到前N个主机的方法，该方法将被监控以检测网络中的异常行为。我们通过实际的骨干流量数据来评估我们的算法。虽然对于较小的基数，我们的方案的估计精度会降低，但对于前100个主机，我们的算法使用4,096个表的准确性几乎与使用每个主机的表相同。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Estimating Top N Hosts in Cardinality Using Small Memory Resources

We propose a method to find N hosts that have the N highest cardinalities, where cardinality is the number of distinct items such as the number of flows, ports, or peer hosts. The method also estimates their cardinalities. While existing algorithms to find the top N frequent items can be directly applied to find N hosts that send the N largest numbers of packets through packet data stream, finding hosts that have the N highest cardinalities requires tables of previously seen items for each host to check whether an item of an arrival packet is new, which requires a lot of memory. Even if we use the existing cardinality estimation methods, we still need to have cardinality information about each host. In this paper, we use the property of cardinality estimation, in which the cardinality of intersections of multiple data sets can be estimated with cardinality information of each data set. Using the property, we propose an algorithm that does not need to maintain tables for each host, but only for partitioned addresses of a host and estimate the cardinality of a host as the intersection of cardinalities of partitioned addresses. We also propose a method to find top N hosts in cardinalities which is to be monitored to detect anomalous behavior in networks. We evaluate our algorithm through actual backbone traffic data. While the estimation accuracy of our scheme degrades for small cardinalities, as for the top 100 hosts, the accuracy of our algorithm with 4, 096 tables is almost the same as having tables of every hosts.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

22nd International Conference on Data Engineering Workshops (ICDEW'06)

自引率

0.00%

发文量