Eliminating Noise in the Matrix Profile

Dieter De Paepe, Olivier Janssens, S. Hoecke
International Conference on Pattern Recognition Applications and Methods (ICPRAM), published 2019-02-19
DOI: 10.5220/0007314100830093
Citations: 9

Abstract

As companies increasingly measure their products and services, the amount of time series data is rising, and techniques to extract usable information from it are needed. One recently developed data mining technique for time series is the Matrix Profile. It consists of the smallest z-normalized Euclidean distance from each subsequence of a time series to all subsequences of another series. It has been used for motif and discord discovery, for segmentation, and as a building block for other techniques. One side effect of the z-normalization is that small fluctuations on flat signals are amplified. This can lead to high and unintuitive distances between very similar subsequences from noisy data. We derived an analytic method to estimate and remove the effects of this noise, adding only a single, intuitive parameter to the calculation of the Matrix Profile. This paper explains our method and demonstrates it by performing discord discovery on the Numenta Anomaly Benchmark and by segmenting the PAMAP2 activity dataset. We find that our technique results in a more intuitive Matrix Profile and provides improved results in both use cases for series containing many flat, noisy subsequences. Since our technique is an extension of the Matrix Profile, it can be applied to any of the various tasks the Matrix Profile is used for, improving results where the data contains flat and noisy sequences.
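To make the definitions above concrete, here is a minimal brute-force sketch in Python: the z-normalized Euclidean distance, a matrix profile of one series against another (with an optional self-join), the top discord as the largest profile value, and a noise-corrected distance controlled by a single noise_std parameter. The abstract does not state the correction formula; the term (2 + 2m) * noise_std^2 / max(std_a, std_b)^2, subtracted from the squared distance and clipped at zero, is an assumption based on our reading of the full paper and should be checked against it. All function names and the m // 2 self-join exclusion zone are hypothetical choices, and the O(n^2 m) loop is for illustration only.

```python
import math
import random


def znorm(x):
    """Return the z-normalized copy of x and its (population) standard deviation."""
    m = len(x)
    mu = sum(x) / m
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / m)
    if sd == 0:
        # A perfectly constant subsequence cannot be z-normalized.
        raise ValueError("constant subsequence")
    return [(v - mu) / sd for v in x], sd


def znorm_dist(a, b):
    """z-normalized Euclidean distance between equal-length a and b, plus both stds."""
    za, std_a = znorm(a)
    zb, std_b = znorm(b)
    d = math.sqrt(sum((p - q) ** 2 for p, q in zip(za, zb)))
    return d, std_a, std_b


def noise_corrected_dist(a, b, noise_std):
    """ASSUMED correction: subtract (2 + 2m) * noise_std^2 / max(std_a, std_b)^2
    from the squared distance, clip at zero, and take the square root."""
    d, std_a, std_b = znorm_dist(a, b)
    m = len(a)
    correction = (2 + 2 * m) * noise_std ** 2 / max(std_a, std_b) ** 2
    return math.sqrt(max(0.0, d * d - correction))


def matrix_profile(a, m, b=None, noise_std=0.0):
    """Brute-force matrix profile of a against b (self-join if b is None).

    For each length-m subsequence of a, keep the smallest distance to any
    length-m subsequence of b. In a self-join, matches closer than m // 2
    positions (a hypothetical exclusion-zone choice) are skipped as trivial.
    """
    self_join = b is None
    if self_join:
        b = a
    subs_a = [a[i:i + m] for i in range(len(a) - m + 1)]
    subs_b = [b[j:j + m] for j in range(len(b) - m + 1)]
    excl = max(1, m // 2)
    profile = []
    for i, sub_a in enumerate(subs_a):
        best = math.inf
        for j, sub_b in enumerate(subs_b):
            if self_join and abs(i - j) < excl:
                continue
            if noise_std > 0:
                d = noise_corrected_dist(sub_a, sub_b, noise_std)
            else:
                d = znorm_dist(sub_a, sub_b)[0]
            best = min(best, d)
        profile.append(best)
    return profile


def top_discord(profile):
    """Index of the top discord: the subsequence with the largest profile value."""
    return max(range(len(profile)), key=profile.__getitem__)


if __name__ == "__main__":
    random.seed(0)
    # A flat signal carrying only small Gaussian noise (std 0.01).
    flat = [random.gauss(0.0, 0.01) for _ in range(100)]
    raw = matrix_profile(flat, 20)
    fixed = matrix_profile(flat, 20, noise_std=0.01)
    print(f"mean raw MP:       {sum(raw) / len(raw):.2f}")
    print(f"mean corrected MP: {sum(fixed) / len(fixed):.2f}")
```

On a flat, noisy signal every subsequence is essentially the same flat line, yet z-normalization stretches the noise to unit variance, so the uncorrected profile stays high (pure noise gives squared distances around 2m). Under the assumed correction, the profile should drop toward zero, which is the effect the abstract describes. Because each corrected distance is never larger than the uncorrected one, the corrected profile is bounded by the raw profile everywhere.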