Matrix Profile XX: Finding and Visualizing Time Series Motifs of All Lengths using the Matrix Profile

Frank Madrid, Shima Imani, Ryan Mercer, Zachary Schall-Zimmerman, N. S. Senobari, Eamonn J. Keogh
2019 IEEE International Conference on Big Knowledge (ICBK), published 2019-11-01.
DOI: 10.1109/ICBK.2019.00031 (https://doi.org/10.1109/ICBK.2019.00031)
Citations: 31

Abstract

Many time series analytic tasks can be reduced to discovering and then reasoning about conserved structures, or time series motifs. Recently, the Matrix Profile has emerged as the state of the art for finding time series motifs, allowing the community to efficiently find motifs in large datasets. The Matrix Profile reduced time series motif discovery to a process requiring a single parameter: the length of the motifs we expect (or wish) to find. In many cases this is a reasonable limitation, as the user may use out-of-band information or domain knowledge to set this parameter. However, in truly exploratory data mining, a poor choice of this parameter can cause us to miss unexpected and exploitable regularities in the data. In this work, we introduce the Pan Matrix Profile, a new data structure that contains the nearest-neighbor information for all subsequences of all lengths. This data structure enables the first truly parameter-free motif discovery algorithm in the literature. Because the sheer volume of information produced by this representation may be overwhelming, we also introduce a novel visualization tool, the motif-heatmap, which allows users to discover and reason about repeated structures at a glance. We demonstrate our ideas on a diverse set of domains, including seismology, bioinformatics, transportation, and biology.
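The abstract describes the Matrix Profile (one nearest-neighbor distance per subsequence of a fixed length m) and the Pan Matrix Profile (the same information for every length) without giving code. As a rough illustration only, the Python sketch below computes a brute-force matrix profile for one length and then stacks one profile per length into a pan-matrix-profile-style array. It is not the authors' algorithm: the function names, the ceil(m/4) exclusion zone used to skip trivial self-matches, and the per-length scaling are assumptions. The scaling divides each profile by 2√m because, for z-normalized subsequences, the Euclidean distance equals √(2m(1 − ρ)) with ρ the Pearson correlation, so it never exceeds 2√m; this puts every length on a common [0, 1] scale.

```python
import numpy as np


def matrix_profile(T: np.ndarray, m: int) -> tuple[np.ndarray, np.ndarray]:
    """Brute-force matrix profile for subsequence length m (illustrative sketch).

    P[i] is the z-normalized Euclidean distance from T[i:i+m] to its nearest
    non-trivial neighbor and I[i] is that neighbor's start index.
    Uses O(k^2) memory for k = len(T) - m + 1, so it only suits short series.
    """
    W = np.lib.stride_tricks.sliding_window_view(np.asarray(T, dtype=float), m)
    mu = W.mean(axis=1, keepdims=True)
    sd = W.std(axis=1, keepdims=True)
    sd[sd == 0] = 1.0  # assumption: constant windows are only mean-centered
    Z = (W - mu) / sd
    sq = (Z ** 2).sum(axis=1)
    # Pairwise squared distances via ||a||^2 + ||b||^2 - 2 a.b, clipped at 0
    D = np.sqrt(np.maximum(sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T, 0.0))
    # Exclusion zone (assumed here: ceil(m/4)) suppresses trivial self-matches
    k = len(Z)
    excl = int(np.ceil(m / 4))
    idx = np.arange(k)
    D[np.abs(idx[:, None] - idx[None, :]) <= excl] = np.inf
    I = D.argmin(axis=1)
    return D[idx, I], I


def pan_matrix_profile(T: np.ndarray, m_min: int, m_max: int) -> np.ndarray:
    """Stack matrix profiles for every length m_min..m_max (inclusive).

    Row r holds the profile for length m_min + r, divided by 2*sqrt(m) (the
    largest possible z-normalized distance) so rows share a [0, 1] scale.
    Start positions that have no subsequence of that length stay NaN.
    """
    n = len(T)
    lengths = range(m_min, m_max + 1)
    PMP = np.full((len(lengths), n - m_min + 1), np.nan)
    for r, m in enumerate(lengths):
        P, _ = matrix_profile(T, m)
        PMP[r, : len(P)] = P / (2.0 * np.sqrt(m))
    return PMP


# Usage: plant the same non-periodic shape twice in low-amplitude noise
rng = np.random.default_rng(0)
T = rng.normal(scale=0.05, size=400)
pattern = np.sin(np.linspace(0, 3 * np.pi, 40))
T[60:100] += pattern
T[250:290] += pattern

P, I = matrix_profile(T, 40)
i = int(np.argmin(P))
print("length-40 motif pair:", i, int(I[i]))

PMP = pan_matrix_profile(T, 20, 60)
print("PMP shape:", PMP.shape)
print("motif start for m = 20, 30, 40, 50, 60:", np.nanargmin(PMP, axis=1)[::10])
```

The returned array could be rendered as an image (for example with matplotlib's imshow) to get a heatmap in the spirit of the motif-heatmap, though the paper's own visualization is not reproduced here. Looping over every length with a quadratic-memory distance matrix is the simplest possible approach and is only practical for short series.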