Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy

Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn J. Keogh
{"title":"Accelerating Dynamic Time Warping Clustering with a Novel Admissible Pruning Strategy","authors":"Nurjahan Begum, Liudmila Ulanova, Jun Wang, Eamonn J. Keogh","doi":"10.1145/2783258.2783286","DOIUrl":null,"url":null,"abstract":"Clustering time series is a useful operation in its own right, and an important subroutine in many higher-level data mining analyses, including data editing for classifiers, summarization, and outlier detection. While it has been noted that the general superiority of Dynamic Time Warping (DTW) over Euclidean Distance for similarity search diminishes as we consider ever larger datasets, as we shall show, the same is not true for clustering. Thus, clustering time series under DTW remains a computationally challenging task. In this work, we address this lethargy in two ways. We propose a novel pruning strategy that exploits both upper and lower bounds to prune off a large fraction of the expensive distance calculations. This pruning strategy is admissible; giving us provably identical results to the brute force algorithm, but is at least an order of magnitude faster. For datasets where even this level of speedup is inadequate, we show that we can use a simple heuristic to order the unavoidable calculations in a most-useful-first ordering, thus casting the clustering as an anytime algorithm. We demonstrate the utility of our ideas with both single and multidimensional case studies in the domains of astronomy, speech physiology, medicine and entomology.","PeriodicalId":243428,"journal":{"name":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","volume":"6 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"107","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2783258.2783286","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 107

Abstract

Clustering time series is a useful operation in its own right, and an important subroutine in many higher-level data mining analyses, including data editing for classifiers, summarization, and outlier detection. While it has been noted that the general superiority of Dynamic Time Warping (DTW) over Euclidean Distance for similarity search diminishes as we consider ever larger datasets, as we shall show, the same is not true for clustering. Thus, clustering time series under DTW remains a computationally challenging task. In this work, we address this lethargy in two ways. We propose a novel pruning strategy that exploits both upper and lower bounds to prune off a large fraction of the expensive distance calculations. This pruning strategy is admissible; giving us provably identical results to the brute force algorithm, but is at least an order of magnitude faster. For datasets where even this level of speedup is inadequate, we show that we can use a simple heuristic to order the unavoidable calculations in a most-useful-first ordering, thus casting the clustering as an anytime algorithm. We demonstrate the utility of our ideas with both single and multidimensional case studies in the domains of astronomy, speech physiology, medicine and entomology.
一种新的可容许剪枝策略加速动态时间翘曲聚类
聚类时间序列本身就是一种有用的操作,也是许多高级数据挖掘分析(包括用于分类器、摘要和离群值检测的数据编辑)中的重要子例程。虽然已经注意到,当我们考虑更大的数据集时,动态时间翘曲(DTW)相对于欧几里得距离(Euclidean Distance)的相似性搜索的一般优势会减弱,但正如我们将展示的那样,对于聚类来说,情况并非如此。因此,DTW下的时间序列聚类仍然是一项具有计算挑战性的任务。在这项工作中,我们以两种方式解决这种嗜睡。我们提出了一种新的修剪策略,利用上界和下界来修剪掉大部分昂贵的距离计算。这种修剪策略是可以接受的;给出了与蛮力算法相同的可证明结果,但至少快了一个数量级。对于即使这种加速级别也不够的数据集,我们展示了我们可以使用简单的启发式以最有用的优先顺序对不可避免的计算进行排序,从而将聚类作为随时算法。我们通过天文学、语言生理学、医学和昆虫学领域的单一和多维案例研究展示了我们的想法的实用性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信