An index-based approach for similarity search supporting time warping in large sequence databases
Authors: Sang-Wook Kim, Sanghyun Park, W. Chu
Published in: Proceedings 17th International Conference on Data Engineering, 2001-04-02
DOI: 10.1109/ICDE.2001.914875 (https://doi.org/10.1109/ICDE.2001.914875)
Citations: 324
Abstract
This paper proposes a novel method for similarity search that supports time warping in large sequence databases. Time warping enables finding sequences with similar patterns even when they are of different lengths. Previous methods for similarity search under time warping cannot use multi-dimensional indexes without risking false dismissals, because the time warping distance does not satisfy the triangular inequality. Our primary goal is to improve search performance without permitting any false dismissal. To attain this goal, we devise a new distance function, D_tw-lb, that consistently underestimates the time warping distance and also satisfies the triangular inequality. D_tw-lb uses a 4-tuple feature vector that is extracted from each sequence and is invariant to time warping. For efficient processing of similarity search, we employ a multi-dimensional index that uses the 4-tuple feature vector as indexing attributes and D_tw-lb as a distance function. Extensive experimental results show that our method achieves speedups of up to 43 times on real-world S&P 500 stock data and up to 720 times on very large synthetic data.
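The filter-and-refine idea in the abstract can be illustrated with a small Python sketch. The abstract does not say which four features are used or which base distance the time warping distance is built on, so the sketch makes assumptions: the 4-tuple is taken to be the first, last, greatest, and smallest elements of a sequence, both distances use the L-infinity (maximum) norm, and a plain linear scan stands in for the multi-dimensional index. All function names (`features`, `d_tw_lb`, `d_tw`, `range_search`) are hypothetical and chosen for illustration only.

```python
from typing import List, Sequence, Tuple


def features(seq: Sequence[float]) -> Tuple[float, float, float, float]:
    """Assumed 4-tuple: (first, last, greatest, smallest).

    Repeating elements (time warping) leaves all four values unchanged.
    """
    return (seq[0], seq[-1], max(seq), min(seq))


def d_tw_lb(s: Sequence[float], q: Sequence[float]) -> float:
    """Lower-bound distance: L-infinity distance between feature vectors."""
    return max(abs(a - b) for a, b in zip(features(s), features(q)))


def d_tw(s: Sequence[float], q: Sequence[float]) -> float:
    """Time warping distance with L-infinity as the base distance (assumed).

    The cost of a warping path is its largest element-pair difference;
    dynamic programming finds the path that minimizes that cost.
    """
    n, m = len(s), len(q)
    inf = float("inf")
    dp = [[inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
            dp[i][j] = max(abs(s[i - 1] - q[j - 1]), best_prev)
    return dp[n][m]


def range_search(
    database: List[Sequence[float]], query: Sequence[float], eps: float
) -> List[Sequence[float]]:
    """Return all sequences within eps of the query under d_tw.

    A linear scan replaces the index here. Pruning with d_tw_lb cannot
    cause a false dismissal as long as d_tw_lb <= d_tw holds.
    """
    results = []
    for seq in database:
        if d_tw_lb(seq, query) > eps:
            continue  # filter step: cheap lower bound already exceeds eps
        if d_tw(seq, query) <= eps:  # refine step: exact distance
            results.append(seq)
    return results
```

Under these assumptions the lower bound holds because any warping path must pair the two first elements and the two last elements, and must pair the greatest (or smallest) element of one sequence with some element of the other that is no greater (or no smaller) than that sequence's own extreme. In an indexed setting, the feature vectors would be stored as 4-dimensional points and the filter step would become a range query on the index.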