Matrix Profile XX: Finding and Visualizing Time Series Motifs of All Lengths using the Matrix Profile

2019 IEEE International Conference on Big Knowledge (ICBK). Pub Date: 2019-11-01. DOI: 10.1109/ICBK.2019.00031

Frank Madrid, Shima Imani, Ryan Mercer, Zachary Schall-Zimmerman, N. S. Senobari, Eamonn J. Keogh


Citations: 31

Abstract

Many time series analytic tasks can be reduced to discovering and then reasoning about conserved structures, or time series motifs. Recently, the Matrix Profile has emerged as the state of the art for finding time series motifs, allowing the community to find them efficiently in large datasets. The Matrix Profile reduces time series motif discovery to a process requiring a single parameter: the length of the motifs we expect (or wish) to find. In many cases this is a reasonable limitation, as the user may use out-of-band information or domain knowledge to set this parameter. However, in truly exploratory data mining, a poor choice of this parameter can result in failing to find unexpected and exploitable regularities in the data. In this work, we introduce the Pan Matrix Profile, a new data structure that contains the nearest neighbor information for all subsequences of all lengths. This data structure enables the first truly parameter-free motif discovery algorithm in the literature. The sheer volume of information produced by our representation may be overwhelming; thus, we also introduce a novel visualization tool called the motif-heatmap, which allows users to discover and reason about repeated structures at a glance. We demonstrate our ideas on a diverse set of domains including seismology, bioinformatics, transportation, and biology.
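To make the idea of "nearest neighbor information for all subsequences of all lengths" concrete, the following is a minimal brute-force sketch in Python. It is not the authors' algorithm, which the abstract does not describe. The function names (`znorm_windows`, `matrix_profile_naive`, `pan_matrix_profile`), the z-normalized Euclidean distance, the exclusion zone of half the subsequence length, and the division by 2*sqrt(m) to make rows of different lengths comparable are all assumptions chosen for illustration.

```python
import numpy as np


def znorm_windows(ts, m):
    """All length-m subsequences of ts, each z-normalized (shape: n-m+1 by m)."""
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)
    std[std == 0] = 1.0  # constant windows would otherwise divide by zero
    return (windows - mean) / std


def matrix_profile_naive(ts, m):
    """Distance from each length-m subsequence to its nearest non-trivial neighbor."""
    subs = znorm_windows(ts, m)
    k = subs.shape[0]
    excl = max(1, m // 2)  # assumed exclusion zone that skips trivial self-matches
    if k <= 2 * excl:
        raise ValueError(f"series too short for subsequence length {m}")
    # Pairwise squared Euclidean distances via |a|^2 + |b|^2 - 2 a.b
    sq = (subs ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (subs @ subs.T)
    dist = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives caused by rounding
    idx = np.arange(k)
    dist[np.abs(idx[:, None] - idx[None, :]) < excl] = np.inf
    return dist.min(axis=1)


def pan_matrix_profile(ts, lengths):
    """Stack one matrix profile per length into a 2D array (rows: lengths).

    Rows are padded with NaN on the right, because longer subsequences
    have fewer start positions. Each row is divided by 2*sqrt(m), the
    largest possible z-normalized distance, an assumed normalization so
    that values from different lengths share a [0, 1] scale.
    """
    ts = np.asarray(ts, dtype=float)
    lengths = list(lengths)
    width = len(ts) - min(lengths) + 1
    pmp = np.full((len(lengths), width), np.nan)
    for row, m in enumerate(lengths):
        mp = matrix_profile_naive(ts, m)
        pmp[row, : len(mp)] = mp / (2.0 * np.sqrt(m))
    return pmp
```

Calling `pan_matrix_profile(ts, range(16, 129, 16))` on a one-dimensional series returns an array with one row per length. Low values mark subsequences that have a close match elsewhere in the series, so plotting the array as an image gives a rough analogue of the heatmap view the abstract mentions. This sketch costs quadratic time and memory per length and is only practical for short series.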
