Fast and Flexible Multivariate Time Series Subsequence Search

Kanishka Bhaduri, Qiang Zhu, N. Oza, A. Srivastava
Published in: 2010 IEEE International Conference on Data Mining, 2010-12-13
DOI: 10.1109/ICDM.2010.36 (https://doi.org/10.1109/ICDM.2010.36)
Citations: 13

Abstract

Multivariate Time-Series (MTS) are ubiquitous and are generated in areas as disparate as sensor recordings in aerospace systems, music and video streams, medical monitoring, and financial systems. Domain experts are often interested in searching for interesting multivariate patterns in these MTS databases, which can contain up to several gigabytes of data. Surprisingly, research on MTS search is very limited. Most existing work only supports queries with the same length of data, or queries on a fixed set of variables. In this paper, we propose an efficient and flexible subsequence search framework for massive MTS databases that, for the first time, enables querying on any subset of variables with arbitrary time delays between them. We propose two provably correct algorithms to solve this problem: (1) an R*-tree Based Search (RBS), which uses Minimum Bounding Rectangles (MBR) to organize the subsequences, and (2) a List Based Search (LBS) algorithm, which uses sorted lists for indexing. We demonstrate the performance of these algorithms using two large MTS databases from the aviation domain, each containing several million observations. Both tests show that our algorithms have very high prune rates (>95%), so that actual disk access is needed for fewer than 5% of the observations. To the best of our knowledge, this is the first flexible MTS search algorithm capable of subsequence search on any subset of variables. Moreover, MTS subsequence search has never been attempted on datasets of the size used in this paper.
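To make the query model concrete, the following is a minimal brute-force sketch of a subsequence query over a subset of variables with per-variable time delays. It is not the RBS or LBS algorithm from the paper: it scans every candidate start position and does no index-based pruning. The function name `brute_force_search`, the query layout (variable name mapped to a delay and a pattern), the use of Euclidean distance, and the single shared threshold are all assumptions made for illustration.

```python
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length sequences."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_search(series, query, threshold):
    """Return every start index t at which all queried variables match.

    series:    dict mapping variable name -> list of floats
    query:     dict mapping variable name -> (delay, pattern); the pattern
               for that variable is compared against the window that
               starts at t + delay (hypothetical layout, not the paper's)
    threshold: maximum allowed Euclidean distance per variable
    """
    # Only the queried variables constrain the valid range of t.
    n = min(len(series[name]) for name in query)
    span = max(delay + len(pattern) for delay, pattern in query.values())
    matches = []
    for t in range(n - span + 1):
        if all(
            euclidean(series[name][t + delay:t + delay + len(pattern)], pattern)
            <= threshold
            for name, (delay, pattern) in query.items()
        ):
            matches.append(t)
    return matches


# Toy data; the variable names are illustrative only.
series = {
    "altitude": [0, 1, 2, 3, 2, 1, 0, 1, 2, 3],
    "speed": [5, 5, 5, 6, 7, 8, 5, 5, 5, 6],
}

# Query on one variable only.
single = brute_force_search(series, {"altitude": (0, [1, 2, 3])}, 0.5)

# Query on two variables, with the speed pattern delayed by 2 time steps.
joint = brute_force_search(
    series, {"altitude": (0, [1, 2, 3]), "speed": (2, [6, 7])}, 0.5
)
print(single, joint)
```

A scan like this touches every observation for every query. The index structures named in the abstract (MBRs in an R*-tree, or sorted lists) exist to rule out most candidate positions without reading the underlying data.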