Approximately Processing Multi-granularity Aggregate Queries over Data Streams

22nd International Conference on Data Engineering (ICDE'06). Pub Date: 2006-04-03. DOI: 10.1109/ICDE.2006.22

Shouke Qin, Weining Qian, Aoying Zhou


Citations: 22

Abstract

Aggregate monitoring over data streams is attracting increasing attention in the research community because of its broad potential applications. Existing methods suffer from two problems: 1) the aggregate functions that can be monitored are restricted to first-order statistics or to functions that are monotonic with respect to the window size; 2) only a limited number of granularities and time scales can be monitored over a stream, so some interesting patterns may be missed, and users may be misled by an incomplete profile of how the current data streams are changing. These two limitations impede the development of online mining techniques over data streams, and a breakthrough is needed. In this paper, we employ fractal analysis to enable the monitoring of both monotonic and non-monotonic aggregates on time-changing data streams. The monotonicity property of aggregate monitoring is revealed, and a monotonic search space is built to reduce the time overhead of accessing the synopsis from O(m) to O(log m), where m is the number of windows to be monitored. With the help of a novel inverted histogram, the statistical summary is compressed to fit in limited main memory, so that high aggregates on windows of any length can be detected accurately and efficiently online. Theoretical analysis shows that the space and time complexity bounds of this method are relatively low, while experimental results demonstrate the applicability and efficiency of the proposed algorithm in different application settings.
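To make the O(m) to O(log m) claim concrete, here is a minimal Python sketch of searching a monotonic space of window sizes. It is not the paper's algorithm: it uses neither fractal analysis nor the inverted histogram, and it keeps the full prefix-sum array rather than a compressed synopsis. It assumes non-negative stream values, so that the sum over the most recent w items never decreases as w grows. The function name `smallest_alarming_window` and its parameters are hypothetical.

```python
from itertools import accumulate


def smallest_alarming_window(values, window_sizes, threshold):
    """Return the smallest monitored window whose sum reaches `threshold`.

    Illustrative assumptions (not taken from the paper):
      * `values` are non-negative, so the sum over the most recent w items
        is non-decreasing in w;
      * `window_sizes` is sorted in ascending order.
    Returns None when no monitored window reaches the threshold.
    """
    # prefix[i] holds the sum of the first i values.
    prefix = [0, *accumulate(values)]
    n = len(values)

    def suffix_sum(w):
        # Sum over the most recent w items (clamped to the stream length).
        w = min(w, n)
        return prefix[n] - prefix[n - w]

    # Binary search over the m monitored windows: O(log m) probes
    # instead of checking every window in turn.
    lo, hi = 0, len(window_sizes)
    while lo < hi:
        mid = (lo + hi) // 2
        if suffix_sum(window_sizes[mid]) >= threshold:
            hi = mid
        else:
            lo = mid + 1
    return window_sizes[lo] if lo < len(window_sizes) else None
```

For example, calling `smallest_alarming_window([1, 0, 2, 5, 1, 3], [1, 2, 3, 4, 5, 6], 9)` probes only three of the six windows. The search is valid only because the aggregate is monotonic in the window size; a non-monotonic aggregate would need the additional machinery the abstract describes.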

