Duality-based subsequence matching in time-series databases

Proceedings 17th International Conference on Data Engineering. Pub Date: 2001-04-02. DOI: 10.1109/ICDE.2001.914837

Yang-Sae Moon, K. Whang, W. Loh


Citations: 127

Abstract

The authors propose a subsequence matching method, Dual Match, which exploits duality in constructing windows and significantly improves performance. Dual Match divides data sequences into disjoint windows and the query sequence into sliding windows. It is thus the dual of the approach by C. Faloutsos et al. (1994), referred to here as FRM, which divides data sequences into sliding windows and the query sequence into disjoint windows. We formally prove that our dual approach is correct, i.e., it incurs no false dismissal. We also prove that, given the minimum query length, there is a maximum bound on the window size that guarantees the correctness of Dual Match, and we discuss the effect of the window size on performance. FRM causes many false alarms because, to avoid the excessive storage space the index would otherwise require, it stores minimum bounding rectangles rather than the individual points representing windows. Dual Match solves this problem by storing the points directly, without incurring excessive storage overhead. Experimental results show that, in most cases, Dual Match provides a large improvement over FRM in both false alarms and performance, given the same amount of storage space. In particular, for low selectivities (less than $10^{-4}$), Dual Match improves performance by up to 430-fold. On the other hand, for high selectivities (more than $10^{-2}$), it shows a very minor degradation (less than 29%). For selectivities in between ($10^{-4}$ to $10^{-2}$), Dual Match performs slightly better than FRM. Dual Match is also 4.10 to 25.6 times faster than FRM in building indexes of approximately the same size. Overall, these results indicate that our approach provides a new paradigm in subsequence matching that improves performance significantly in large database applications.
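The window construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the distance is Euclidean, the function names (`max_window_size`, `build_index`, `dual_match`, `brute_force`) are hypothetical, and raw windows are compared by a linear scan in place of a lower-dimensional transform and a multidimensional index. The window-size bound used here, 2w - 1 <= query length, follows from the observation that any subsequence of length at least 2w - 1 fully contains at least one disjoint window of size w.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def max_window_size(min_query_len):
    """Largest window size w satisfying 2*w - 1 <= min_query_len."""
    return (min_query_len + 1) // 2


def build_index(data, w):
    """Split the data sequence into disjoint windows of size w.

    Assumption: a plain list of (start, window) pairs stands in for a real index.
    """
    return [(s, data[s:s + w]) for s in range(0, len(data) - w + 1, w)]


def dual_match(data, index, query, w, eps):
    """Return start offsets i with distance(query, data[i:i+len(query)]) <= eps."""
    n, m = len(data), len(query)
    if m < 2 * w - 1:
        raise ValueError("query too short for this window size")
    candidates = set()
    # Slide a window over the query and compare it with every disjoint data window.
    for j in range(m - w + 1):
        q_win = query[j:j + w]
        for start, d_win in index:
            i = start - j  # implied start of the matching subsequence in data
            if 0 <= i <= n - m and euclidean(q_win, d_win) <= eps:
                candidates.add(i)
    # Post-processing: discard false alarms by checking the full subsequence.
    return sorted(i for i in candidates if euclidean(query, data[i:i + m]) <= eps)


def brute_force(data, query, eps):
    """Reference answer obtained by scanning every offset."""
    m = len(query)
    return [i for i in range(len(data) - m + 1)
            if euclidean(query, data[i:i + m]) <= eps]


if __name__ == "__main__":
    data = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 4.0, 3.0,
            2.0, 1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
    query = [3.1, 4.0, 5.0, 4.0, 3.0, 2.0, 1.0]
    w = max_window_size(len(query))
    print(dual_match(data, build_index(data, w), query, w, eps=0.5))
```

Comparing a window pair against the full threshold `eps` is a deliberately loose filter: the distance between two aligned windows can never exceed the distance between the full sequences, so no true match is dismissed, and the final check removes the false alarms.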



