{"title":"Saving Money for Analytical Workloads in the Cloud","authors":"Tapan Srivastava, Raul Castro Fernandez","doi":"arxiv-2408.00253","DOIUrl":"https://doi.org/arxiv-2408.00253","url":null,"abstract":"As users migrate their analytical workloads to cloud databases, it is becoming just as important to reduce monetary costs as it is to optimize query runtime. In the cloud, a query is billed based on either its compute time or the amount of data it processes. We observe that analytical queries are either compute- or IO-bound and each query type executes cheaper in a different pricing model. We exploit this opportunity and propose methods to build cheaper execution plans across pricing models that complete within user-defined runtime constraints. We implement these methods and produce execution plans spanning multiple pricing models that reduce the monetary cost for workloads by as much as 56%. We reduce individual query costs by as much as 90%. The prices chosen by cloud vendors for cloud services also impact savings opportunities. To study this effect, we simulate our proposed methods with different cloud prices and observe that multi-cloud savings are robust to changes in cloud vendor prices. These results indicate the massive opportunity to save money by executing workloads across multiple pricing models.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883757","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Hybrid Querying Over Relational Databases and Large Language Models","authors":"Fuheng Zhao, Divyakant Agrawal, Amr El Abbadi","doi":"arxiv-2408.00884","DOIUrl":"https://doi.org/arxiv-2408.00884","url":null,"abstract":"Database queries traditionally operate under the closed-world assumption, providing no answers to questions that require information beyond the data stored in the database. Hybrid querying using SQL offers an alternative by integrating relational databases with large language models (LLMs) to answer beyond-database questions. In this paper, we present the first cross-domain benchmark, SWAN, containing 120 beyond-database questions over four real-world databases. To leverage state-of-the-art language models in addressing these complex questions in SWAN, we present HQDL, a preliminary solution for hybrid querying, and also discuss potential future directions. Our evaluation demonstrates that HQDL, using GPT-4 Turbo with few-shot prompts, achieves 40.0% in execution accuracy and 48.2% in data factuality. These results highlight both the potential and challenges for hybrid querying. We believe that our work will inspire further research in creating more efficient and accurate data systems that seamlessly integrate relational databases and large language models to address beyond-database questions.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938150","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Detection of Anomalies in Temporal Knowledge Graphs with Interpretability","authors":"Jiasheng Zhang, Jie Shao, Rex Ying","doi":"arxiv-2408.00872","DOIUrl":"https://doi.org/arxiv-2408.00872","url":null,"abstract":"Temporal knowledge graphs (TKGs) are valuable resources for capturing evolving relationships among entities, yet they are often plagued by noise, necessitating robust anomaly detection mechanisms. Existing dynamic graph anomaly detection approaches struggle to capture the rich semantics introduced by node and edge categories within TKGs, while TKG embedding methods lack interpretability, undermining the credibility of anomaly detection. Moreover, these methods falter in adapting to pattern changes and semantic drifts resulting from knowledge updates. To tackle these challenges, we introduce AnoT, an efficient TKG summarization method tailored for interpretable online anomaly detection in TKGs. AnoT begins by summarizing a TKG into a novel rule graph, enabling flexible inference of complex patterns in TKGs. When new knowledge emerges, AnoT maps it onto a node in the rule graph and traverses the rule graph recursively to derive the anomaly score of the knowledge. The traversal yields reachable nodes that furnish interpretable evidence for the validity or the anomalousness of the new knowledge. Overall, AnoT embodies a detector-updater-monitor architecture, encompassing a detector for offline TKG summarization and online scoring, an updater for real-time rule graph updates based on emerging knowledge, and a monitor for estimating the approximation error of the rule graph. Experimental results on four real-world datasets demonstrate that AnoT surpasses existing methods significantly in terms of accuracy and interoperability. All of the raw datasets and the implementation of AnoT are provided in https://github.com/zjs123/ANoT.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141938247","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"New Compressed Indices for Multijoins on Graph Databases","authors":"Diego Arroyuelo, Fabrizio Barisione, Antonio Fariña, Adrián Gómez-Brandón, Gonzalo Navarro","doi":"arxiv-2408.00558","DOIUrl":"https://doi.org/arxiv-2408.00558","url":null,"abstract":"A recent surprising result in the implementation of worst-case-optimal (wco) multijoins in graph databases (specifically, basic graph patterns) is that they can be supported on graph representations that take even less space than a plain representation, and orders of magnitude less space than classical indices, while offering comparable performance. In this paper we uncover a wide set of new wco space-time tradeoffs: we (1) introduce new compact indices that handle multijoins in wco time, and (2) combine them with new query resolution strategies that offer better times in practice. As a result, we improve the average query times of current compact representations by a factor of up to 13 to produce the first 1000 results, and using twice their space, reduce their total average query time by a factor of 2. Our experiments suggest that there is more room for improvement in terms of generating better query plans for multijoins.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141883825","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"eSPARQL: Representing and Reconciling Agnostic and Atheistic Beliefs in RDF-star Knowledge Graphs","authors":"Xiny Pan, Daniel Hernández, Philipp Seifer, Ralf Lämmel, Steffen Staab","doi":"arxiv-2407.21483","DOIUrl":"https://doi.org/arxiv-2407.21483","url":null,"abstract":"Over the past few years, we have seen the emergence of large knowledge graphs combining information from multiple sources. Sometimes, this information is provided in the form of assertions about other assertions, defining contexts where assertions are valid. A recent extension to RDF which admits statements over statements, called RDF-star, is in revision to become a W3C standard. However, there is no proposal for a semantics of these RDF-star statements nor a built-in facility to operate over them. In this paper, we propose a query language for epistemic RDF-star metadata based on a four-valued logic, called eSPARQL. Our proposed query language extends SPARQL-star, the query language for RDF-star, with a new type of FROM clause to facilitate operating with multiple and sometimes conflicting beliefs. We show that the proposed query language can express four use case queries, including the following features: (i) querying the belief of an individual, (ii) the aggregation of beliefs, (iii) querying who is conflicting with somebody, and (iv) beliefs about beliefs (i.e., nesting of beliefs).","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868193","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Complete Approximations of Incomplete Queries","authors":"Julien Corman, Werner Nutt, Ognjen Savković","doi":"arxiv-2407.20932","DOIUrl":"https://doi.org/arxiv-2407.20932","url":null,"abstract":"This paper studies the completeness of conjunctive queries over a partially complete database and the approximation of incomplete queries. Given a query and a set of completeness rules (a special kind of tuple generating dependencies) that specify which parts of the database are complete, we investigate whether the query can be fully answered, as if all data were available. If not, we explore reformulating the query into either Maximal Complete Specializations (MCSs) or the (unique up to equivalence) Minimal Complete Generalization (MCG) that can be fully answered, that is, the best complete approximations of the query from below or above in the sense of query containment. We show that the MCG can be characterized as the least fixed-point of a monotonic operator in a preorder. Then, we show that an MCS can be computed by recursive backward application of completeness rules. We study the complexity of both problems and discuss implementation techniques that rely on ASP and Prolog engines, respectively.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868195","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Boundedness for Unions of Conjunctive Regular Path Queries over Simple Regular Expressions","authors":"Diego Figueira, S. Krishna, Om Swostik Mishra, Anantha Padmanabha","doi":"arxiv-2407.20782","DOIUrl":"https://doi.org/arxiv-2407.20782","url":null,"abstract":"The problem of checking whether a recursive query can be rewritten as a query without recursion is a fundamental reasoning task, known as the boundedness problem. Here we study the boundedness problem for Unions of Conjunctive Regular Path Queries (UCRPQs), a navigational query language extensively used in ontology and graph database querying. The boundedness problem for UCRPQs is ExpSpace-complete. Here we focus our analysis on UCRPQs using simple regular expressions, which are of high practical relevance and enjoy a lower reasoning complexity. We show that the complexity for the boundedness problem for this UCRPQs fragment is $\\Pi^P_2$-complete, and that an equivalent bounded query can be produced in polynomial time whenever possible. When the query turns out to be unbounded, we also study the task of finding an equivalent maximally bounded query, which we show to be feasible in $\\Pi^P_2$. As a side result of independent interest stemming from our developments, we study a notion of succinct finite automata and prove that its membership problem is in NP.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Urban Traffic Accident Risk Prediction Revisited: Regionality, Proximity, Similarity and Sparsity","authors":"Minxiao Chen, Haitao Yuan, Nan Jiang, Zhifeng Bao, Shangguang Wang","doi":"arxiv-2407.19668","DOIUrl":"https://doi.org/arxiv-2407.19668","url":null,"abstract":"Traffic accidents pose a significant risk to human health and property safety. Therefore, to prevent traffic accidents, predicting their risks has garnered growing interest. We argue that a desired prediction solution should demonstrate resilience to the complexity of traffic accidents. In particular, it should adequately consider the regional background, accurately capture both spatial proximity and semantic similarity, and effectively address the sparsity of traffic accidents. However, these factors are often overlooked or difficult to incorporate. In this paper, we propose a novel multi-granularity hierarchical spatio-temporal network. Initially, we innovate by incorporating remote sensing data, facilitating the creation of a hierarchical multi-granularity structure and the comprehension of regional background. We construct multiple high-level risk prediction tasks to enhance the model's ability to cope with sparsity. Subsequently, to capture both spatial proximity and semantic similarity, region feature and multi-view graph undergo encoding processes to distill effective representations. Additionally, we propose a message passing and adaptive temporal attention module that bridges different granularities and dynamically captures time correlations inherent in traffic accident patterns. Finally, a multivariate hierarchical loss function is devised considering the complexity of the prediction purpose. Extensive experiments on two real datasets verify the superiority of our model against the state-of-the-art methods.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868198","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Rosetta Statements: Lowering the Barrier for Semantic Parsing and Increasing the Cognitive Interoperability of Knowledge Graphs","authors":"Lars Vogt, Marcel Konrad, Kheir Eddine Farfar, Manuel Prinz, Allard Oelen","doi":"arxiv-2407.20007","DOIUrl":"https://doi.org/arxiv-2407.20007","url":null,"abstract":"Machines need data and metadata to be machine-actionable and FAIR (findable, accessible, interoperable, reusable) to manage increasing data volumes. Knowledge graphs and ontologies are key to this, but their use is hampered by high access barriers due to required prior knowledge in semantics and data modelling. The Rosetta Statement approach proposes modeling English natural language statements instead of a mind-independent reality. We propose a metamodel for creating semantic schema patterns for simple statement types. The approach supports versioning of statements and provides a detailed editing history. Each Rosetta Statement pattern has a dynamic label for displaying statements as natural language sentences. Implemented in the Open Research Knowledge Graph (ORKG) as a use case, this approach allows domain experts to define data schema patterns without needing semantic knowledge. Future plans include combining Rosetta Statements with semantic units to organize ORKG into meaningful subgraphs, improving usability. A search interface for querying statements without needing SPARQL or Cypher knowledge is also planned, along with tools for data entry and display using Large Language Models and NLP. The Rosetta Statement metamodel supports a two-step knowledge graph construction procedure. Domain experts can model semantic content without support from ontology engineers, lowering entry barriers and increasing cognitive interoperability. The second step involves developing semantic graph patterns for reasoning, requiring collaboration with ontology engineers.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868203","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Limitations of Validity Intervals in Data Freshness Management","authors":"Kyoung-Don Kang","doi":"arxiv-2407.20431","DOIUrl":"https://doi.org/arxiv-2407.20431","url":null,"abstract":"In data-intensive real-time applications, such as smart transportation and manufacturing, ensuring data freshness is essential, as using obsolete data can lead to negative outcomes. Validity intervals serve as the standard means to specify freshness requirements in real-time databases. In this paper, we bring attention to significant drawbacks of validity intervals that have largely been unnoticed and introduce a new definition of data freshness, while discussing future research directions to address these limitations.","PeriodicalId":501123,"journal":{"name":"arXiv - CS - Databases","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2024-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141868224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}