Matrix Profile XXVII: A Novel Distance Measure for Comparing Long Time Series
Audrey Der, Chin-Chia Michael Yeh, R. Wu, Junpeng Wang, Yan Zheng, Zhongfang Zhuang, Liang Wang, Wei Zhang, Eamonn J. Keogh
2022 IEEE International Conference on Knowledge Graph (ICKG), published 2022-11-01
DOI: https://doi.org/10.1109/ICKG55886.2022.00013
Citations: 2
Abstract
The most useful data mining primitives are distance measures. With an effective distance measure, it is possible to perform classification, clustering, anomaly detection, segmentation, etc. For single-event time series, Euclidean Distance and Dynamic Time Warping distance are known to be extremely effective. However, for time series containing cyclical behaviors, the semantic meaningfulness of such comparisons is less clear. For example, on two separate days the telemetry from an athlete's workout routine might be very similar. However, on the second day she might have changed the order in which she did push-ups and squats, added a few repetitions of pull-ups, or completely omitted dumbbell curls. Any one of these minor changes would defeat existing time series distance measures. Some “bag-of-features” methods have been proposed to address this problem; however, we argue that in many cases similarity is intimately tied to the shapes of subsequences within these longer time series. In such cases, summative features will lack discrimination ability. In this work we introduce PRCIS, which stands for Pattern Representation Comparison in Series. PRCIS is a distance measure for long time series that exploits recent progress in our ability to summarize time series with “dictionaries”. We demonstrate the utility of our ideas on diverse tasks and datasets.
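The workout example can be made concrete with a small sketch. This is not PRCIS and not the authors' code: it is a toy illustration, assuming two synthetic "exercises" (a sine wave and a sawtooth, both hypothetical stand-ins for push-ups and squats) and a naive order-invariant comparison that averages each subsequence's distance to its nearest neighbor in the other series. The names `znorm_distance`, `windows`, and `mean_nn_distance` are invented for this illustration. Both distances are divided by the square root of the length so that they fall on the same 0 to 2 scale.

```python
import numpy as np

def znorm_distance(a, b):
    """Length-normalized z-normalized Euclidean distance between equal-length series."""
    a = (a - a.mean()) / a.std()
    b = (b - b.mean()) / b.std()
    return float(np.linalg.norm(a - b) / np.sqrt(len(a)))

def windows(x, m):
    """All z-normalized subsequences of length m, one per row."""
    w = np.lib.stride_tricks.sliding_window_view(x, m)
    mu = w.mean(axis=1, keepdims=True)
    sd = w.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # guard against constant windows
    return (w - mu) / sd

def mean_nn_distance(a, b, m):
    """Average distance from each subsequence of a to its nearest subsequence in b.

    A naive, brute-force, order-invariant comparison (an assumption made for
    this sketch, not the PRCIS measure).
    """
    wa, wb = windows(a, m), windows(b, m)
    d = np.linalg.norm(wa[:, None, :] - wb[None, :, :], axis=2)
    return float(d.min(axis=1).mean() / np.sqrt(m))

# Two hypothetical exercises, 200 samples each.
t = np.arange(200)
pushups = np.sin(2 * np.pi * t / 50)   # smooth cycles, period 50
squats = (t % 25) / 25.0               # sawtooth cycles, period 25

day1 = np.concatenate([pushups, squats])  # push-ups, then squats
day2 = np.concatenate([squats, pushups])  # same exercises, order swapped

whole = znorm_distance(day1, day2)             # compares the series point by point
order_free = mean_nn_distance(day1, day2, 25)  # compares local shapes only

print(f"whole-series distance:      {whole:.3f}")
print(f"mean nearest-neighbor dist: {order_free:.3f}")
```

Under these assumptions the whole-series distance is large even though both days contain identical exercises, while the subsequence-based comparison stays close to zero; only the few windows that straddle the transition between exercises fail to find an exact match.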