Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints

Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, M. Weidlich
DOI: 10.4230/LIPIcs.ICDT.2022.18
Venue: International Conference on Database Theory (ICDT 2022)
Citations: 10

Abstract

We introduce subsequence-queries with wildcards and gap-size constraints (swg-queries, for short) as a tool for querying event traces. An swg-query $q$ is given by a string $s$ over an alphabet of variables and types, a global window size $w$, and a tuple $c = ((c^-_1, c^+_1), (c^-_2, c^+_2), \ldots, (c^-_{|s|-1}, c^+_{|s|-1}))$ of local gap-size constraints over $\mathbb{N} \times (\mathbb{N} \cup \{\infty\})$. The query $q$ matches in a trace $t$ (i.e., a sequence of types) if the variables can be uniformly substituted by types such that the resulting string occurs in $t$ as a subsequence that spans an area of length at most $w$, and the $i$th gap of the subsequence (i.e., the distance between the $i$th and $(i+1)$th position of the subsequence) has length at least $c^-_i$ and at most $c^+_i$. We formalise and investigate the task of discovering an swg-query that best describes the traces from a given sample $S$ of traces, and we present an algorithm solving this task. As a central component, our algorithm repeatedly solves the matching problem (i.e., deciding whether a given query $q$ matches in a given trace $t$), which is NP-complete (in combined complexity). Hence, the matching problem is of special interest in the context of query discovery, and we therefore subject it to a detailed (parameterised) complexity analysis to identify tractable subclasses, which lead to tractable subclasses of the discovery problem as well. We complement this by a reduction proving …

Proof sketch. A natural brute-force approach is as follows. Upon input of an swg-query $q = (s, w, c)$ and a trace $t$, we enumerate all mappings $\pi : \mathrm{repvars}(q) \to \mathrm{types}(t)$, and for each such mapping, we construct a regular expression $R_\pi$ that describes all traces $t'$ for which there exists a substitution $\mu : \mathrm{vars}(q) \cup \Gamma \to \Gamma$ such that $\mu$ is an extension of $\pi$ and $\mu(s) \preceq_e t'$ for some embedding $e$ that satisfies $w$ and $c$. Then, for each of these mappings $\pi$, we only have to check whether the regular expression $R_\pi$ matches in $t$.
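The first brute-force approach can be illustrated with a small Python sketch. It rests on assumptions that are not taken from the source: variables are tokens starting with "$" and types are single characters, so that a trace is a Python string; the $i$th gap is read as the number of trace positions strictly between the $i$th and $(i+1)$th matched positions; and the window $w$ is enforced by only accepting occurrences that fit into a length-$w$ slice of $t$, rather than by encoding $w$ into $R_\pi$ itself. The helper names is_var, build_regex and matches_via_regex are hypothetical.

```python
import itertools
import math
import re
from collections import Counter


def is_var(tok: str) -> bool:
    # Hypothetical encoding: variables start with "$"; every other token is a
    # single-character type, so a trace can be stored as a Python string.
    return tok.startswith("$")


def build_regex(s, gaps, pi):
    """Build R_pi for the query string s, with the repeated variables fixed by pi."""
    parts = []
    for i, tok in enumerate(s):
        if is_var(tok):
            # Repeated variables take their type from pi; a variable that occurs
            # only once imposes no equality constraint and acts as a wildcard.
            parts.append(re.escape(pi[tok]) if tok in pi else ".")
        else:
            parts.append(re.escape(tok))
        if i < len(s) - 1:
            # Assumption: the i-th gap counts the trace positions strictly
            # between two consecutive matched positions and lies in [lo, hi].
            lo, hi = gaps[i]
            parts.append(f".{{{lo},}}" if hi == math.inf else f".{{{lo},{hi}}}")
    return re.compile("".join(parts), re.DOTALL)


def matches_via_regex(s, w, gaps, t):
    """Brute force over all mappings pi: repvars(q) -> types(t)."""
    counts = Counter(tok for tok in s if is_var(tok))
    repvars = sorted(v for v, k in counts.items() if k > 1)
    types_t = sorted(set(t))
    for image in itertools.product(types_t, repeat=len(repvars)):
        pattern = build_regex(s, gaps, dict(zip(repvars, image)))
        # Window w: accept only occurrences that start at position j and end
        # before position j + w, i.e. that span at most w positions of t.
        if any(pattern.match(t, j, j + w) for j in range(len(t))):
            return True
    return False
```

For example, matches_via_regex(["$x", "a", "$x"], 4, [(0, 1), (0, 1)], "cabc") holds via $x \mapsto$ c, while shrinking the window to 3 makes it fail. The loop runs over exactly $|\mathrm{types}(t)|^{|\mathrm{repvars}(q)|}$ mappings, which is why only repeated variables are enumerated.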
Another approach is to enumerate all embeddings $e : [|s|] \to [|t|]$ that satisfy $w$ and $c$, and to check for each such embedding $e$ whether $\mu(s) \preceq_e t$ for some substitution $\mu$. This check can be done in time $O(|s|)$, since $\mu$ must satisfy $\mu(s) = t[e(1)]\, t[e(2)] \cdots t[e(|s|)]$. From these two algorithms and the obvious dependencies between the parameters, we can directly conclude the statements of the theorem.
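The embedding-based approach admits a similarly small Python sketch. The following assumptions are not from the source: variables are tokens starting with "$", a trace is any sequence of types (for example a list of strings), positions are 0-based, the $i$th gap is the number of positions strictly between $e(i)$ and $e(i+1)$, and an embedding satisfies $w$ when $e(|s|) - e(1) + 1 \le w$. The names is_var, embeddings, substitution_exists and matches_via_embeddings are hypothetical.

```python
import math


def is_var(tok: str) -> bool:
    # Hypothetical encoding: variables start with "$"; all other tokens are types.
    return tok.startswith("$")


def embeddings(n, m, w, gaps):
    """Yield all strictly increasing e: [n] -> [m] (0-based) satisfying gaps and w."""
    def extend(prefix):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        if not prefix:
            candidates = range(m)
        else:
            # Gap lo <= e(i+1) - e(i) - 1 <= hi bounds the next position.
            lo, hi = gaps[len(prefix) - 1]
            last = prefix[-1]
            upper = m - 1 if hi == math.inf else min(m - 1, last + hi + 1)
            candidates = range(last + lo + 1, upper + 1)
        for p in candidates:
            # Window: the span from the first matched position may not exceed w.
            if prefix and p - prefix[0] + 1 > w:
                break
            yield from extend(prefix + [p])

    yield from extend([])


def substitution_exists(s, t, e):
    """Check whether some mu satisfies mu(s) = t[e(1)] ... t[e(|s|)]; linear in |s|
    (expected time, using a dict)."""
    mu = {}
    for tok, pos in zip(s, e):
        typ = t[pos]
        if is_var(tok):
            # Every occurrence of a variable must receive the same type.
            if mu.setdefault(tok, typ) != typ:
                return False
        elif tok != typ:
            return False
    return True


def matches_via_embeddings(s, w, gaps, t):
    return any(substitution_exists(s, t, e)
               for e in embeddings(len(s), len(t), w, gaps))
```

Since types need not be single characters here, a trace such as ["pay", "login", "pay"] can be queried directly. Note that the number of enumerated embeddings can grow like $\binom{|t|}{|s|}$, in contrast to the mapping-based approach, which is exponential only in $|\mathrm{repvars}(q)|$.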