Online windowed subsequence matching over probabilistic sequences

Zheng Li, Tingjian Ge
Published in: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 2012-05-20
DOI: 10.1145/2213836.2213868 (https://doi.org/10.1145/2213836.2213868)
Citations: 10

Abstract

Windowed subsequence matching over deterministic strings has been studied in previous work in the contexts of knowledge discovery, data mining, and molecular biology. However, we observe that in these applications, as well as in data stream monitoring, complex event processing, and time series data processing, in which streams can be mapped to strings, the strings are often noisy and probabilistic. We study this problem in the online setting, where efficiency is paramount. We first formulate the query semantics and propose an exact algorithm. Then we propose a randomized approximation algorithm that is faster and, at the same time, provably accurate. Moreover, we devise a filtering algorithm that further improves efficiency, using an optimization technique that adapts to the contents of the sequence stream. Finally, we propose algorithms for patterns with negations. To verify the algorithms, we conduct a systematic empirical study using three real datasets and some synthetic datasets.
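
To make the problem setting concrete, the following is a minimal sketch and not the algorithms proposed in the paper. It assumes a simplified model in which each stream position carries an independent distribution over symbols, and it asks for the probability that a pattern occurs as a subsequence within some window of w consecutive positions. The function names (window_match, exact_probability, sampled_probability) and the independence assumption are illustrative choices; the paper's query semantics may differ. The exact version enumerates all possible worlds, which is only feasible for tiny inputs, and the sampled version is a plain Monte Carlo estimate.

```python
import random
from itertools import product


def window_match(seq, pattern, w):
    """Return True if `pattern` is a subsequence of some run of `w`
    consecutive symbols of the deterministic sequence `seq`."""
    if not pattern:
        return True
    n = len(seq)
    for start in range(n):
        # A match window may as well begin at the first pattern symbol.
        if seq[start] != pattern[0]:
            continue
        j = 0
        # Greedy left-to-right matching inside the window [start, start + w).
        for i in range(start, min(n, start + w)):
            if seq[i] == pattern[j]:
                j += 1
                if j == len(pattern):
                    return True
    return False


def exact_probability(prob_seq, pattern, w):
    """Sum the probability of every possible world that contains a match.
    `prob_seq` is a list of dicts mapping symbol -> probability
    (assumed independent across positions). Exponential in len(prob_seq)."""
    total = 0.0
    supports = [list(dist.items()) for dist in prob_seq]
    for world in product(*supports):
        p = 1.0
        for _, q in world:
            p *= q
        if window_match([s for s, _ in world], pattern, w):
            total += p
    return total


def sampled_probability(prob_seq, pattern, w, trials, rng):
    """Monte Carlo estimate: sample worlds and count the fraction that match."""
    hits = 0
    for _ in range(trials):
        world = [
            rng.choices(list(dist), weights=list(dist.values()))[0]
            for dist in prob_seq
        ]
        if window_match(world, pattern, w):
            hits += 1
    return hits / trials


# A three-position probabilistic sequence (hypothetical data).
stream = [{"a": 0.5, "b": 0.5}, {"b": 1.0}, {"a": 0.5, "c": 0.5}]
exact = exact_probability(stream, ["a", "b"], w=2)
estimate = sampled_probability(stream, ["a", "b"], w=2, trials=2000,
                               rng=random.Random(0))
```

In this toy stream the pattern a, b can only match at positions 0 and 1, so the exact probability is 0.5, and the sampled estimate should land close to it. The enumeration grows exponentially with the sequence length, which is the kind of cost that motivates faster approximate and filtering approaches in an online setting.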