Discovering Event Queries from Traces: Laying Foundations for Subsequence-Queries with Wildcards and Gap-Size Constraints

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-01-01 DOI:10.4230/LIPIcs.ICDT.2022.18

Sarah Kleest-Meißner, Rebecca Sattler, Markus L. Schmid, Nicole Schweikardt, M. Weidlich

Pages: 18:1-18:21

Citations: 10

Abstract

We introduce subsequence-queries with wildcards and gap-size constraints (swg-queries, for short) as a tool for querying event traces. An swg-query $q$ is given by a string $s$ over an alphabet of variables and types, a global window size $w$, and a tuple $c = ((c^-_1, c^+_1), (c^-_2, c^+_2), \ldots, (c^-_{|s|-1}, c^+_{|s|-1}))$ of local gap-size constraints over $\mathbb{N} \times (\mathbb{N} \cup \{\infty\})$. The query $q$ matches in a trace $t$ (i.e., a sequence of types) if the variables can uniformly be substituted by types such that the resulting string occurs in $t$ as a subsequence that spans an area of length at most $w$, and the $i$-th gap of the subsequence (i.e., the distance between the $i$-th and $(i+1)$-th position of the subsequence) has length at least $c^-_i$ and at most $c^+_i$. We formalise and investigate the task of discovering an swg-query that describes best the traces from a given sample $S$ of traces, and we present an algorithm solving this task. As a central component, our algorithm repeatedly solves the matching problem (i.e., deciding whether a given query $q$ matches in a given trace $t$), which is an NP-complete problem (in combined complexity). Hence, the matching problem is of special interest in the context of query discovery, and we therefore subject it to a detailed (parameterised) complexity analysis to identify tractable subclasses, which lead to tractable subclasses of the discovery problem as well. We complement this by a reduction proving ...

Proof sketch. A natural brute-force approach is as follows: Upon input of an swg-query $q = (s, w, c)$ and a trace $t$, we enumerate all mappings $\pi : \mathrm{repvars}(q) \to \mathrm{types}(t)$, and for each such mapping, we construct a regular expression $R_\pi$ that describes all traces $t'$ for which there exists a substitution $\mu : \mathrm{vars}(q) \cup \Gamma \to \Gamma$ such that $\mu$ is an extension of $\pi$ and $\mu(s) \preccurlyeq_e t'$ for some embedding $e$ that satisfies $w$ and $c$. Then, we only have to check for each of these mappings $\pi$, if the regular expression $R_\pi$ matches in $t$.

Another approach is to enumerate all embeddings $e : [|s|] \to [|t|]$ that satisfy $w$ and $c$ and check for each such embedding $e$ whether $\mu(s) \preccurlyeq_e t$ for some substitution $\mu$ (which can be done in time $O(|s|)$, since $\mu$ must satisfy $\mu(s) = t[e(1)]\, t[e(2)] \cdots t[e(|s|)]$). From these two algorithms and the obvious dependencies between the parameters, we can directly conclude the statements of the theorem.
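The second brute-force approach (enumerating embeddings) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the 0-based positions, and the explicit `variables` set are illustrative choices. The sketch also assumes that the length of the $i$-th gap is the number of trace positions strictly between two consecutive matched positions, and that the spanned area is counted from the first to the last matched position inclusive; the abstract does not fix these conventions, so they may differ from the paper's formal definitions.

```python
from math import inf


def embeddings(n, m, w, c):
    """Yield all strictly increasing 0-based position tuples of length n in a
    trace of length m that respect window size w and gap constraints c.

    Assumed conventions (not taken from the source):
      gap between positions p < p' is p' - p - 1,
      spanned area is last - first + 1.
    """
    def extend(prefix):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        i = len(prefix)
        if i == 0:
            candidates = range(m) if w >= 1 else range(0)
        else:
            lo_gap, hi_gap = c[i - 1]
            lo = prefix[-1] + 1 + lo_gap
            # Last admissible position: end of trace, gap upper bound, window.
            hi = min(m - 1, prefix[-1] + 1 + hi_gap, prefix[0] + w - 1)
            candidates = range(lo, int(hi) + 1)
        for p in candidates:
            prefix.append(p)
            yield from extend(prefix)
            prefix.pop()

    yield from extend([])


def consistent(s, variables, t, e):
    """Check in O(|s|) whether one substitution maps s onto the trace symbols
    selected by the embedding e."""
    mu = {}
    for sym, pos in zip(s, e):
        if sym in variables:
            # A variable must be replaced by the same type at every occurrence.
            if mu.setdefault(sym, t[pos]) != t[pos]:
                return False
        elif sym != t[pos]:
            # A type in the query must equal the trace symbol.
            return False
    return True


def matches(s, variables, w, c, t):
    """Brute-force matching: try every admissible embedding."""
    return any(consistent(s, variables, t, e)
               for e in embeddings(len(s), len(t), w, c))


# Query string "x a x" with variable x, over the trace b a c b.
query, trace = ["x", "a", "x"], ["b", "a", "c", "b"]
print(matches(query, {"x"}, 4, [(0, inf), (1, inf)], trace))
print(matches(query, {"x"}, 3, [(0, inf), (1, inf)], trace))
```

In the example, the first call can use positions 0, 1, 3 of the trace with x substituted by b, while the second call uses a window too small to contain any admissible embedding. The running time is exponential in $|s|$ in the worst case, which is consistent with the NP-completeness of the matching problem stated in the abstract.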

