Range-efficient computation of $F_0$ over massive data streams
A. Pavan, S. Tirthapura
21st International Conference on Data Engineering (ICDE'05), 2005-04-05
DOI: https://doi.org/10.1109/ICDE.2005.118
Citations: 19
Abstract
Efficient one-pass computation of $F_0$, the number of distinct elements in a data stream, is a fundamental problem arising in various contexts in databases and networking. We consider the problem of efficiently estimating $F_0$ of a data stream where each element of the stream is an interval of integers. We present a randomized algorithm which gives an $(\epsilon, \delta)$-approximation of $F_0$, with the following time complexity ($n$ is the size of the universe of the items): (1) the amortized processing time per interval is $O(\log\frac{1}{\delta}\log\frac{n}{\epsilon})$; (2) the time to answer a query for $F_0$ is $O(\log\frac{1}{\delta})$. The workspace used is $O(\frac{1}{\epsilon^2}\log\frac{1}{\delta}\log n)$ bits. Our algorithm improves upon a previous algorithm by Bar-Yossef, Kumar, and Sivakumar (2002), which requires $O(\frac{1}{\epsilon^5}\log\frac{1}{\delta}\log^5 n)$ processing time per item. Our algorithm can be used to compute the max-dominance norm of a stream of multiple signals, and significantly improves upon the current best bounds due to Cormode and Muthukrishnan (2003). This also provides efficient and novel solutions for data aggregation problems in sensor networks studied by Nath and Gibbons (2004) and Considine et al. (2004).
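To make the estimated quantity concrete, here is a minimal Python sketch of an exact baseline: it computes $F_0$ for a stream of integer intervals by sorting and merging them. This is not the randomized algorithm of the abstract. It stores every interval, so it is neither one-pass nor small-space, and it only serves as a ground truth to compare an estimator against. The names `exact_f0_of_intervals` and `within_relative_error` are hypothetical, and the inclusive-endpoint convention is an assumption. The second helper checks the accuracy half of an $(\epsilon, \delta)$-approximation in the usual sense, where the estimate must be within relative error $\epsilon$ of $F_0$ with probability at least $1 - \delta$.

```python
from typing import Iterable, Optional, Tuple

# Assumption: each stream element is an inclusive integer interval [lo, hi].
Interval = Tuple[int, int]


def exact_f0_of_intervals(stream: Iterable[Interval]) -> int:
    """Exact count of distinct integers covered by a stream of intervals.

    Reference baseline only: it materializes and sorts the whole stream,
    unlike the one-pass, small-space estimator the abstract describes.
    """
    total = 0
    cur_lo: Optional[int] = None
    cur_hi: Optional[int] = None
    for lo, hi in sorted(stream):
        if lo > hi:
            raise ValueError(f"empty interval: [{lo}, {hi}]")
        if cur_hi is None:
            # First interval starts the current merged run.
            cur_lo, cur_hi = lo, hi
        elif lo <= cur_hi + 1:
            # Overlapping or adjacent: extend the current run.
            cur_hi = max(cur_hi, hi)
        else:
            # Gap: close the current run and start a new one.
            total += cur_hi - cur_lo + 1
            cur_lo, cur_hi = lo, hi
    if cur_hi is not None:
        total += cur_hi - cur_lo + 1
    return total


def within_relative_error(estimate: float, truth: int, eps: float) -> bool:
    """True if |estimate - truth| <= eps * truth."""
    return abs(estimate - truth) <= eps * truth


if __name__ == "__main__":
    stream = [(1, 5), (3, 8), (10, 12)]
    print(exact_f0_of_intervals(stream))
    # One interval can stand for a huge number of items, which is why
    # a per-item cost is unattractive for interval streams.
    print(exact_f0_of_intervals([(1, 10**9)]))
```

The second call shows why range efficiency matters: a single interval can cover a billion items, so an algorithm that pays its processing cost per item must expand each interval, while a range-efficient one pays per interval.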