Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Pub Date: 2015-08-10. DOI: 10.1145/2783258.2783286

Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn J. Keogh


Citations: 107

Abstract

Clustering time series is a useful operation in its own right, and an important subroutine in many higher-level data mining analyses, including data editing for classifiers, summarization, and outlier detection. It has been noted that the general superiority of Dynamic Time Warping (DTW) over Euclidean Distance for similarity search diminishes as we consider ever larger datasets; as we shall show, the same is not true for clustering. Thus, clustering time series under DTW remains a computationally challenging task. In this work, we address this lethargy in two ways. We propose a novel pruning strategy that exploits both upper and lower bounds to prune off a large fraction of the expensive distance calculations. This pruning strategy is admissible: it gives provably identical results to the brute force algorithm, yet is at least an order of magnitude faster. For datasets where even this level of speedup is inadequate, we show that we can use a simple heuristic to order the unavoidable calculations in a most-useful-first ordering, thus casting the clustering as an anytime algorithm. We demonstrate the utility of our ideas with both single and multidimensional case studies in the domains of astronomy, speech physiology, medicine and entomology.
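To make the idea of pruning with both an upper and a lower bound concrete, the following is a minimal sketch and not the authors' algorithm. It assumes equal-length series, a Sakoe-Chiba band of half-width `w`, Euclidean distance as the upper bound and LB_Keogh as the lower bound; these choices, the threshold query, and all function names are illustrative assumptions that the abstract does not specify. The query "is DTW(a, b) below a threshold?" is answered from the bounds whenever they are decisive, and full DTW runs only when they are not, so the answers match brute force.

```python
import math

def dtw(a, b, w):
    """DTW distance with a Sakoe-Chiba band of half-width w (equal-length series)."""
    n = len(a)
    inf = float("inf")
    prev = [inf] * (n + 1)
    prev[0] = 0.0
    for i in range(1, n + 1):
        cur = [inf] * (n + 1)
        for j in range(max(1, i - w), min(n, i + w) + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            cur[j] = cost + min(prev[j], prev[j - 1], cur[j - 1])
        prev = cur
    return math.sqrt(prev[n])

def euclidean(a, b):
    """Upper bound on DTW: the diagonal is itself a valid warping path."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def lb_keogh(a, b, w):
    """Lower bound on band-constrained DTW, from the envelope of b."""
    total = 0.0
    for i, x in enumerate(a):
        window = b[max(0, i - w): i + w + 1]
        hi, lo = max(window), min(window)
        if x > hi:
            total += (x - hi) ** 2
        elif x < lo:
            total += (lo - x) ** 2
    return math.sqrt(total)

def count_within(series, threshold, w):
    """For each series, count the others with DTW < threshold, pruning via bounds."""
    n = len(series)
    counts = [0] * n
    dtw_calls = 0
    for i in range(n):
        for j in range(i + 1, n):
            a, b = series[i], series[j]
            if euclidean(a, b) < threshold:
                close = True            # upper bound already below threshold
            elif lb_keogh(a, b, w) >= threshold:
                close = False           # lower bound already at or above threshold
            else:
                dtw_calls += 1          # bounds undecided: compute DTW
                close = dtw(a, b, w) < threshold
            if close:
                counts[i] += 1
                counts[j] += 1
    return counts, dtw_calls

def brute_force_counts(series, threshold, w):
    """Reference answer that computes DTW for every pair."""
    n = len(series)
    counts = [0] * n
    for i in range(n):
        for j in range(i + 1, n):
            if dtw(series[i], series[j], w) < threshold:
                counts[i] += 1
                counts[j] += 1
    return counts
```

Because DTW can never exceed the upper bound nor fall below the lower bound, skipping it in the two decisive cases cannot change any answer; that is the sense in which such pruning is admissible. How many calls are saved depends on the data and the threshold.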


Source journal

Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Self-citation rate

0.00%

Publication count