{"title":"Process Trace Querying using Knowledge Graphs and Notation3","authors":"William Van Woensel","doi":"arxiv-2409.04452","DOIUrl":"https://doi.org/arxiv-2409.04452","url":null,"abstract":"In process mining, a log exploration step allows making sense of the event traces; e.g., identifying event patterns and illogical traces, and gaining insight into their variability. To support expressive log exploration, the event log can be converted into a Knowledge Graph (KG), which can then be queried using general-purpose languages. We explore the creation of a semantic KG using the Resource Description Framework (RDF) as a data model, combined with the general-purpose Notation3 (N3) rule language for querying. We show how typical trace querying constraints, inspired by the state of the art, can be implemented in N3. We convert case- and object-centric event logs into a trace-based semantic KG; OCEL2 logs are hereby \"flattened\" into traces based on object paths through the KG. This solution offers (a) expressivity, as queries can instantiate constraints in multiple ways and arbitrarily constrain attributes and relations (e.g., actors, resources); (b) flexibility, as OCEL2 event logs can be serialized as traces in arbitrary ways based on the KG; and (c) extensibility, as others can extend our library by leveraging the same implementation patterns.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223544","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-variable Quantification of BDDs in External Memory using Nested Sweeping (Extended Paper)","authors":"Steffan Christ Sølvsten, Jaco van de Pol","doi":"arxiv-2408.14216","DOIUrl":"https://doi.org/arxiv-2408.14216","url":null,"abstract":"Previous research on the Adiar BDD package has been successful at designing algorithms capable of handling large Binary Decision Diagrams (BDDs) stored in external memory. To do so, it uses consecutive sweeps through the BDDs to resolve computations. Yet, this approach has kept algorithms for multi-variable quantification, the relational product, and variable reordering out of its scope. In this work, we address this by introducing the nested sweeping framework. Here, multiple concurrent sweeps pass information between each other to compute the result. We have implemented the framework in Adiar and used it to create a new external memory multi-variable quantification algorithm. Compared to conventional depth-first implementations, Adiar with nested sweeping is able to solve more instances of our benchmarks and/or solve them faster.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"$\\boldsymbol{Steiner}$-Hardness: A Query Hardness Measure for Graph-Based ANN Indexes","authors":"Zeyu Wang, Qitong Wang, Xiaoxing Cheng, Peng Wang, Themis Palpanas, Wei Wang","doi":"arxiv-2408.13899","DOIUrl":"https://doi.org/arxiv-2408.13899","url":null,"abstract":"Graph-based indexes have been widely employed to accelerate approximate similarity search of high-dimensional vectors. However, the performance of graph indexes in answering different queries varies vastly, leading to an unstable quality of service for downstream applications. This necessitates an effective measure to test query hardness on graph indexes. Nonetheless, popular distance-based hardness measures like LID lose their effectiveness because they ignore the graph structure. In this paper, we propose $Steiner$-hardness, a novel connection-based graph-native query hardness measure. Specifically, we first propose a theoretical framework to analyze the minimum query effort on graph indexes and then define $Steiner$-hardness as the minimum effort on a representative graph. Moreover, we prove that our $Steiner$-hardness is highly relevant to the classical Directed $Steiner$ Tree (DST) problems. In this case, we design a novel algorithm to reduce our problem to DST problems and then leverage their solvers to help calculate $Steiner$-hardness efficiently. Compared with LID and other similar measures, $Steiner$-hardness shows a significantly better correlation with the actual query effort on various datasets. Additionally, an unbiased evaluation designed based on $Steiner$-hardness reveals new ranking results, indicating a meaningful direction for enhancing the robustness of graph indexes. This paper is accepted by PVLDB 2025.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223540","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Towards a Converged Relational-Graph Optimization Framework","authors":"Yunkai Lou, Longbin Lai, Bingqing Lyu, Yufan Yang, Xiaoli Zhou, Wenyuan Yu, Ying Zhang, Jingren Zhou","doi":"arxiv-2408.13480","DOIUrl":"https://doi.org/arxiv-2408.13480","url":null,"abstract":"The recent ISO SQL:2023 standard adopts SQL/PGQ (Property Graph Queries), facilitating graph-like querying within relational databases. This advancement, however, underscores a significant gap in how to effectively optimize SQL/PGQ queries within relational database systems. To address this gap, we extend the foundational SPJ (Select-Project-Join) queries to SPJM queries, which include an additional matching operator for representing graph pattern matching in SQL/PGQ. Although SPJM queries can be converted to SPJ queries and optimized using existing relational query optimizers, our analysis shows that such a graph-agnostic method fails to benefit from graph-specific optimization techniques found in the literature. To address this issue, we develop a converged relational-graph optimization framework called RelGo for optimizing SPJM queries, leveraging joint efforts from both relational and graph query optimizations. Using DuckDB as the underlying relational execution engine, our experiments show that RelGo can generate efficient execution plans for SPJM queries. On well-established benchmarks, these plans exhibit an average speedup of 21.90$\\times$ compared to those produced by the graph-agnostic optimizer.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Targeted Least Cardinality Candidate Key for Relational Databases","authors":"Vasileios Nakos, Hung Q. Ngo, Charalampos E. Tsourakakis","doi":"arxiv-2408.13540","DOIUrl":"https://doi.org/arxiv-2408.13540","url":null,"abstract":"Functional dependencies (FDs) are a central theme in databases, playing a major role in the design of database schemas and the optimization of queries. In this work, we introduce the {\\it targeted least cardinality candidate key problem} (TCAND). This problem is defined over a set of functional dependencies $F$ and a target variable set $T \\subseteq V$, and it aims to find the smallest set $X \\subseteq V$ such that the FD $X \\to T$ can be derived from $F$. The TCAND problem generalizes the well-known NP-hard problem of finding the least cardinality candidate key~\\cite{lucchesi1978candidate}, which has been previously demonstrated to be at least as difficult as the set cover problem. We present an integer programming (IP) formulation for the TCAND problem, analogous to a layered set cover problem. We analyze its linear programming (LP) relaxation from two perspectives: we propose two approximation algorithms and investigate the integrality gap. Our findings indicate that the approximation upper bounds for our algorithms are not significantly improvable through LP rounding, a notable distinction from the standard set cover problem. Additionally, we discover that a generalization of the TCAND problem is equivalent to a variant of the set cover problem, named red-blue set cover~\\cite{carr1999red}, which cannot be approximated within a sub-polynomial factor in polynomial time under plausible conjectures~\\cite{chlamtavc2023approximating}. Despite the extensive history surrounding the issue of identifying the least cardinality candidate key, our research contributes new theoretical insights, novel algorithms, and demonstrates that the general TCAND problem poses complexities beyond those encountered in the set cover problem.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223541","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GNN: Graph Neural Network and Large Language Model Based for Data Discovery","authors":"Thomas Hoang","doi":"arxiv-2408.13609","DOIUrl":"https://doi.org/arxiv-2408.13609","url":null,"abstract":"Our algorithm GNN: Graph Neural Network and Large Language Model Based for Data Discovery inherits the benefits of \\cite{hoang2024plod} (PLOD: Predictive Learning Optimal Data Discovery) and \\cite{Hoang2024BODBO} (BOD: Blindly Optimal Data Discovery) in overcoming the challenges of having to predefine a utility function and of requiring human input for attribute ranking, which helps avoid a time-consuming loop process. Beyond these previous works, our algorithm GNN leverages the advantages of graph neural networks and large language models to understand text-type values that cannot be understood by PLOD and BOD, thus making the task of predicting outcomes more reliable. GNN can be seen as an extension of PLOD in that it understands text-type values and the user's preferences based on not only numerical values but also text values, furthering the promise of data science and analytics.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223539","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Cost-Aware Uncertainty Reduction in Schema Matching with GPT-4: The Prompt-Matcher Framework","authors":"Longyu Feng, Huahang Li, Chen Jason Zhang","doi":"arxiv-2408.14507","DOIUrl":"https://doi.org/arxiv-2408.14507","url":null,"abstract":"Schema matching is the process of identifying correspondences between the elements of two given schemata, essential for database management systems, data integration, and data warehousing. The inherent uncertainty of current schema matching algorithms leads to the generation of a set of candidate matches. Storing these results necessitates the use of databases and systems capable of handling probabilistic queries. This complicates the querying process and increases the associated storage costs. Motivated by GPT-4's outstanding performance, we explore its potential to reduce uncertainty. Our proposal is to supplant the role of crowdworkers with GPT-4 for querying the set of candidate matches. To get more precise correspondence verification responses from GPT-4, we have crafted Semantic-match and Abbreviation-match prompts for GPT-4, achieving state-of-the-art recall on two benchmark datasets: DeepMDatasets, 100% (+0.0), and Fabricated-Datasets, 91.8% (+2.2). To optimise budget utilisation, we have devised a cost-aware solution. Within the constraints of the budget, our solution delivers favourable outcomes with minimal time expenditure. We introduce a novel framework, Prompt-Matcher, to reduce the uncertainty in integrating multiple automatic schema matching algorithms and selecting complex parameterizations. It assists users in diminishing the uncertainty associated with candidate schema match results and in optimally ranking the most promising matches. We formally define the Correspondence Selection Problem (CSP), aiming to optimise the revenue within the confines of the GPT-4 budget. We demonstrate that CSP is NP-Hard and propose an approximation algorithm with minimal time expenditure. Ultimately, we demonstrate the efficacy of Prompt-Matcher through rigorous experiments.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223538","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BIPeC: A Combined Change-Point Analyzer to Identify Performance Regressions in Large-scale Database Systems","authors":"Zhan Lyu, Thomas Bach, Yong Li, Nguyen Minh Le, Lars Hoemke","doi":"arxiv-2408.12414","DOIUrl":"https://doi.org/arxiv-2408.12414","url":null,"abstract":"Performance testing in large-scale database systems like SAP HANA is a crucial yet labor-intensive task, involving extensive manual analysis of thousands of measurements, such as CPU time and elapsed time. Manual maintenance of these metrics is time-consuming and susceptible to human error, making early detection of performance regressions challenging. We address these issues by proposing an automated approach to detect performance regressions in such measurements. Our approach integrates Bayesian inference with the Pruned Exact Linear Time (PELT) algorithm, enhancing the detection of change points and performance regressions with high precision and efficiency compared to previous approaches. Our method minimizes false negatives and ensures SAP HANA's reliability and performance quality. The proposed solution can accelerate testing and contribute to more sustainable performance management practices in large-scale data management environments.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223547","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SQL-GEN: Bridging the Dialect Gap for Text-to-SQL Via Synthetic Data And Model Merging","authors":"Mohammadreza Pourreza, Ruoxi Sun, Hailong Li, Lesly Miculicich, Tomas Pfister, Sercan O. Arik","doi":"arxiv-2408.12733","DOIUrl":"https://doi.org/arxiv-2408.12733","url":null,"abstract":"Text-to-SQL systems, which convert natural language queries into SQL commands, have seen significant progress primarily for the SQLite dialect. However, adapting these systems to other SQL dialects like BigQuery and PostgreSQL remains a challenge due to the diversity in SQL syntax and functions. We introduce SQL-GEN, a framework for generating high-quality dialect-specific synthetic data guided by dialect-specific tutorials, and demonstrate its effectiveness in creating training datasets for multiple dialects. Our approach significantly improves performance, by up to 20%, over previous methods and reduces the gap with large-scale human-annotated datasets. Moreover, combining our synthetic data with human-annotated data provides additional performance boosts of 3.3% to 5.6%. We also introduce a novel Mixture of Experts (MoE) initialization method that integrates dialect-specific models into a unified system by merging self-attention layers and initializing the gates with dialect-specific keywords, further enhancing performance across different SQL dialects.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223543","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Unlocking Sustainability Compliance: Characterizing the EU Taxonomy for Business Process Management","authors":"Finn Klessascheck, Stephan A. Fahrenkrog-Petersen, Jan Mendling, Luise Pufahl","doi":"arxiv-2408.11386","DOIUrl":"https://doi.org/arxiv-2408.11386","url":null,"abstract":"To promote sustainable business practices, and to achieve climate neutrality by 2050, the EU has developed the taxonomy of sustainable activities, which describes when exactly business practices can be considered sustainable. While the taxonomy has only been recently established, progressively more companies will have to report how much of their revenue was created via sustainably executed business processes. To help companies prepare to assess whether their business processes comply with the constraints outlined in the taxonomy, we investigate to what extent these criteria can be used for conformance checking, that is, assessing in a data-driven manner whether business process executions adhere to regulatory constraints. For this, we develop a few-shot learning pipeline to characterize the constraints of the taxonomy, with the help of an LLM, as to the process dimensions they relate to. We find that many constraints of the taxonomy are usable for conformance checking, particularly in the sectors of energy, manufacturing, and transport. This will aid companies in preparing to monitor regulatory compliance with the taxonomy automatically, by characterizing what kind of information they need to extract, and by providing a better understanding of sectors where such an assessment is feasible and where it is not.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142223548","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}