ACM Transactions on Database Systems: Latest Publications

Automated Category Tree Construction: Hardness Bounds and Algorithms
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-05-09 DOI: 10.1145/3664283
Shay Gershtein, Uri Avron, Ido Guy, Tova Milo, Slava Novgorodov
{"title":"Automated Category Tree Construction: Hardness Bounds and Algorithms","authors":"Shay Gershtein, Uri Avron, Ido Guy, Tova Milo, Slava Novgorodov","doi":"10.1145/3664283","DOIUrl":"https://doi.org/10.1145/3664283","url":null,"abstract":"<p>Category trees, or taxonomies, are rooted trees where each node, called a category, corresponds to a set of related items. The construction of taxonomies has been studied in various domains, including e-commerce, document management, and question answering. Multiple algorithms for automating construction have been proposed, employing a variety of clustering approaches and crowdsourcing. However, no formal model to capture such categorization problems has been devised, and their complexity has not been studied. To address this, we propose in this work a combinatorial model that captures many practical settings and show that the aforementioned empirical approach has been warranted, as we prove strong inapproximability bounds for various problem variants and special cases when the goal is to produce a categorization of the maximum utility. </p><p>In our model, the input is a set of <i>n</i> weighted item sets that the tree would ideally contain as categories. Each category, rather than perfectly match the corresponding input set, is allowed to exceed a given threshold for a given similarity function. The goal is to produce a tree that maximizes the total weight of the sets for which it contains a matching category. A key parameter is an upper bound on the number of categories an item may belong to, which produces the hardness of the problem, as initially each item may be contained in an arbitrary number of input sets. </p><p>For this model, we prove inapproximability bounds, of order (tilde{Theta }(sqrt {n}) ) or (tilde{Theta }(n) ), for various problem variants and special cases, loosely justifying the aforementioned heuristic approach. Our work includes reductions based on parameterized randomized constructions that highlight how various problem parameters and properties of the input may affect the hardness. Moreover, for the special case where the category must be identical to the corresponding input set, we devise an algorithm whose approximation guarantee depends solely on a more granular parameter, allowing improved worst-case guarantees, as well as the application of practical exact solvers. We further provide efficient algorithms with much improved approximation guarantees for practical special cases where the cardinalities of the input sets or the number of input sets each items belongs to are not too large. Finally, we also generalize our results to DAG-based and non-hierarchical categorization.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-05-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140942057","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
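The model is easy to state concretely. The sketch below is not an algorithm from the paper: it is a minimal greedy baseline for the stated objective, assuming Jaccard as the similarity function and ignoring the tree structure over the chosen categories; all names and parameters are illustrative.

```python
def jaccard(a, b):
    """Similarity between a candidate category and an input set."""
    return len(a & b) / len(a | b) if a | b else 1.0

def greedy_category_selection(weighted_sets, threshold=0.8, max_memberships=2):
    """Greedy baseline: admit input sets as categories in decreasing weight
    order, respecting the bound on how many chosen categories an item may
    belong to; an input set counts as matched if some chosen category is
    similar enough. Returns the categories and the total matched weight."""
    memberships = {}                 # item -> number of chosen categories it joined
    chosen, matched = [], set()
    order = sorted(range(len(weighted_sets)), key=lambda i: -weighted_sets[i][1])
    for i in order:
        cat = frozenset(weighted_sets[i][0])
        if all(memberships.get(x, 0) < max_memberships for x in cat):
            chosen.append(cat)
            for x in cat:
                memberships[x] = memberships.get(x, 0) + 1
            for j, (s, _) in enumerate(weighted_sets):
                if jaccard(cat, frozenset(s)) >= threshold:
                    matched.add(j)
    return chosen, sum(weighted_sets[j][1] for j in matched)

catalog = [({"tv", "hdmi cable"}, 3.0),
           ({"tv", "remote"}, 2.0),
           ({"hdmi cable", "adapter"}, 1.5)]
print(greedy_category_selection(catalog, threshold=0.5, max_memberships=2))
```

The inapproximability results above say that no polynomial-time method, this one included, can guarantee much in the worst case; the paper's positive results hold only for the special cases it identifies.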
Database Repairing with Soft Functional Dependencies
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-03-04 DOI: 10.1145/3651156
Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, Muhammad Tibi
{"title":"Database Repairing with Soft Functional Dependencies","authors":"Nofar Carmeli, Martin Grohe, Benny Kimelfeld, Ester Livshits, Muhammad Tibi","doi":"10.1145/3651156","DOIUrl":"https://doi.org/10.1145/3651156","url":null,"abstract":"<p>A common interpretation of soft constraints penalizes the database for every violation of every constraint, where the penalty is the cost (weight) of the constraint. A computational challenge is that of finding an optimal subset: a collection of database tuples that minimizes the total penalty when each tuple has a cost of being excluded. When the constraints are strict (i.e., have an infinite cost), this subset is a “cardinality repair” of an inconsistent database; in soft interpretations, this subset corresponds to a “most probable world” of a probabilistic database, a “most likely intention” of a probabilistic unclean database, and so on. Within the class of functional dependencies, the complexity of finding a cardinality repair is thoroughly understood. Yet, very little is known about the complexity of finding an optimal subset for the more general soft semantics. The work described in this manuscript makes significant progress in that direction. In addition to general insights about the hardness and approximability of the problem, we present algorithms for two special cases (and some generalizations thereof): a single functional dependency, and a bipartite matching. The latter is the problem of finding an optimal “almost matching” of a bipartite graph where a penalty is paid for every lost edge and every violation of monogamy. For these special cases, we also investigate the complexity of additional computational tasks that arise when the soft constraints are used as a means to represent a probabilistic database via a factor graph, as in the case of a probabilistic unclean database.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-03-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140037089","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
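The penalty semantics can be pinned down in a few lines. The following exhaustive search is only an illustration of the objective (exclusion costs plus one FD weight per violating pair kept in), not one of the paper's algorithms; it is exponential and suits toy sizes only, and all names are illustrative.

```python
from itertools import combinations

def fd_violations(kept, fd):
    """Count pairs in `kept` violating the FD lhs -> rhs (tuples are dicts)."""
    lhs, rhs = fd
    return sum(1 for t, u in combinations(kept, 2)
               if all(t[a] == u[a] for a in lhs)
               and any(t[a] != u[a] for a in rhs))

def optimal_subset(tuples, exclusion_cost, soft_fds):
    """Exhaustive search for the penalty-minimizing subset: pay the
    exclusion cost of every tuple left out, plus the weight of every
    soft FD once per violating pair kept in."""
    n, best, best_cost = len(tuples), None, float("inf")
    for mask in range(1 << n):
        kept = [tuples[i] for i in range(n) if mask >> i & 1]
        cost = sum(exclusion_cost[i] for i in range(n) if not mask >> i & 1)
        cost += sum(w * fd_violations(kept, fd) for fd, w in soft_fds)
        if cost < best_cost:
            best, best_cost = kept, cost
    return best, best_cost

rows = [{"name": "a", "city": "x"},
        {"name": "a", "city": "y"},
        {"name": "b", "city": "x"}]
fds = [((("name",), ("city",)), 2.0)]   # soft FD name -> city, weight 2
print(optimal_subset(rows, [1.0, 1.0, 1.0], fds))
# dropping one of the conflicting rows (cost 1.0) beats keeping the violation (2.0)
```

With infinite FD weights the same search computes a cardinality repair, which is the strict special case the abstract mentions.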
Sharing Queries with Nonequivalent User-Defined Aggregate Functions
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-02-24 DOI: 10.1145/3649133
Chao Zhang, Farouk Toumani
{"title":"Sharing Queries with Nonequivalent User-Defined Aggregate Functions","authors":"Chao Zhang, Farouk Toumani","doi":"10.1145/3649133","DOIUrl":"https://doi.org/10.1145/3649133","url":null,"abstract":"<p>This paper presents <sans-serif>SUDAF</sans-serif>, a declarative framework that allows users to write UDAF (User-Defined Aggregate Function) as mathematical expressions and use them in SQL statements. <sans-serif>SUDAF</sans-serif> rewrites partial aggregates of UDAFs using built-in aggregate functions and supports efficient dynamic caching and reusing of partial aggregates. Our experiments show that rewriting UDAFs using built-in functions can significantly speed up queries with UDAFs, and the proposed sharing approach can yield up to two orders of magnitude improvement in query execution time. The paper studies also an extension of <sans-serif>SUDAF</sans-serif> to support sharing partial results between arbitrary queries with UDAFs. We show a connection with the problem of query rewriting using views and introduce a new class of rewritings, called <sans-serif>SUDAF</sans-serif> rewritings, which enables to use views that have aggregate functions different from the ones used in the input query. We investigate the underlying rewriting-checking and rewriting-existing problem. Our main technical result is a reduction of these problems to respectively rewriting-checking and rewriting-existing of the so-called <i>aggregate candidates</i>, a class of rewritings that has been deeply investigated in the literature.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-02-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139968559","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
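The rewriting idea can be illustrated on a single UDAF. The sketch below expresses population variance through built-in partial aggregates (COUNT, SUM, SUM of squares); because partials over disjoint partitions merge additively, they can be cached and reused by later queries. This shows only the underlying principle and assumes nothing about SUDAF's actual interfaces; all names are illustrative.

```python
from dataclasses import dataclass

@dataclass
class Partial:
    """Partial aggregates for a variance UDAF, expressed with built-ins only."""
    n: int = 0
    s: float = 0.0
    s2: float = 0.0

    def add(self, x):
        self.n, self.s, self.s2 = self.n + 1, self.s + x, self.s2 + x * x

    def merge(self, other):
        # partials over disjoint partitions combine by simple addition
        return Partial(self.n + other.n, self.s + other.s, self.s2 + other.s2)

def variance(p):
    """Terminal step: combine the built-in partials into the UDAF value."""
    return p.s2 / p.n - (p.s / p.n) ** 2

cache = {}   # partition key -> Partial, reusable across queries

def partial_for(key, values):
    if key not in cache:             # miss: compute once; later queries reuse it
        p = Partial()
        for x in values:
            p.add(x)
        cache[key] = p
    return cache[key]

jan = partial_for("2024-01", [3.0, 5.0, 7.0])
feb = partial_for("2024-02", [4.0, 6.0])
print(variance(jan.merge(feb)))      # 2.0: variance over both months, no rescan
```

The "nonequivalent" sharing of the title goes further than this: a cached partial computed for one aggregate can serve a query using a different aggregate, which is what the SUDAF rewritings formalize.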
A Family of Centrality Measures for Graph Data Based on Subgraphs
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-02-23 DOI: 10.1145/3649134
Sebastián Bugedo, Cristian Riveros, Jorge Salas
{"title":"A family of centrality measures for graph data based on subgraphs","authors":"Sebastián Bugedo, Cristian Riveros, Jorge Salas","doi":"10.1145/3649134","DOIUrl":"https://doi.org/10.1145/3649134","url":null,"abstract":"<p>We present the theoretical foundations and first experimental study of a new approach in centrality measures for graph data. The main principle is straightforward: the more relevant subgraphs around a vertex, the more central it is in the network. We formalize the notion of “relevant subgraphs” by choosing a family of subgraphs that, given a graph <i>G</i> and a vertex <i>v</i>, assigns a subset of connected subgraphs of <i>G</i> that contains <i>v</i>. Any of such families defines a measure of centrality by counting the number of subgraphs assigned to the vertex, i.e., a vertex will be more important for the network if it belongs to more subgraphs in the family. We show several examples of this approach. In particular, we propose the All-Subgraphs (All-Trees) centrality, a centrality measure that considers every subgraph (tree). We study fundamental properties over families of subgraphs that guarantee desirable properties over the centrality measure. Interestingly, All-Subgraphs and All-Trees satisfy all these properties, showing their robustness as centrality notions. To conclude the theoretical analysis, we study the computational complexity of counting certain families of subgraphs and show a linear time algorithm to compute the All-Subgraphs and All-Trees centrality for graphs with bounded treewidth. Finally, we implemented these algorithms and computed these measures over more than one hundred real-world networks. With this data, we present an empirical comparison between well-known centrality measures and those proposed in this work.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-02-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139950700","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
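The definition can be checked by brute force on toy graphs. The sketch below counts connected induced subgraphs containing a given vertex, a simplified stand-in for the All-Subgraphs measure (which in general also ranges over edge subsets); it is exponential in the number of vertices, whereas the paper gives a linear-time algorithm for bounded treewidth.

```python
from itertools import combinations

def subgraph_centrality(vertices, edges, v):
    """Count connected induced subgraphs that contain v, exhaustively."""
    adj = {u: set() for u in vertices}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    def connected(vs):
        # depth-first search restricted to the candidate vertex set
        seen, stack = set(), [next(iter(vs))]
        while stack:
            u = stack.pop()
            if u not in seen:
                seen.add(u)
                stack.extend((adj[u] & vs) - seen)
        return seen == vs

    others = [u for u in vertices if u != v]
    return sum(1
               for r in range(len(others) + 1)
               for extra in combinations(others, r)
               if connected(set(extra) | {v}))

V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(subgraph_centrality(V, E, "c"),   # 8: "c" sits in the most subgraphs
      subgraph_centrality(V, E, "d"))   # 5: the pendant vertex is least central
```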
GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-02-20 DOI: 10.1145/3643846
David Tench, Evan West, Victor Zhang, Michael A. Bender, Abiyaz Chowdhury, Daniel Delayo, J. Ahmed Dellas, Martín Farach-Colton, Tyler Seip, Kenny Zhang
{"title":"GraphZeppelin: How to Find Connected Components (Even When Graphs Are Dense, Dynamic, and Massive)","authors":"David Tench, Evan West, Victor Zhang, Michael A. Bender, Abiyaz Chowdhury, Daniel Delayo, J. Ahmed Dellas, Martín Farach-Colton, Tyler Seip, Kenny Zhang","doi":"10.1145/3643846","DOIUrl":"https://doi.org/10.1145/3643846","url":null,"abstract":"<p>Finding the connected components of a graph is a fundamental problem with uses throughout computer science and engineering. The task of computing connected components becomes more difficult when graphs are very large, or when they are dynamic, meaning the edge set changes over time subject to a stream of edge insertions and deletions. A natural approach to computing the connected components problem on a large, dynamic graph stream is to buy enough RAM to store the entire graph. However, the requirement that the graph fit in RAM is an inherent limitation of this approach and is prohibitive for very large graphs. Thus, there is an unmet need for systems that can process dense dynamic graphs, especially when those graphs are larger than available RAM. </p><p>We present a new high-performance streaming graph-processing system for computing the connected components of a graph. This system, which we call <span>GraphZeppelin</span>, uses new linear sketching data structures (<span>CubeSketch</span>) to solve the streaming connected components problem and as a result requires space asymptotically smaller than the space required for an lossless representation of the graph. <span>GraphZeppelin</span> is optimized for massive dense graphs: <span>GraphZeppelin</span> can process millions of edge updates (both insertions and deletions) per second, even when the underlying graph is far too large to fit in available RAM. As a result <span>GraphZeppelin</span> vastly increases the scale of graphs that can be processed.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-02-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139923689","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
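The defining property of linear sketches, that deletions are processed exactly like insertions, can be seen in a toy cell. The sketch below is my simplified reading of the seed idea behind such structures, not CubeSketch itself: each vertex keeps an XOR of incident-edge fingerprints plus a signed count, and when exactly one incident edge survives, its fingerprint can be read off. The real structure keeps many such cells at geometric sampling rates to recover a surviving edge with high probability.

```python
import hashlib

def edge_id(u, v):
    """Stable 64-bit fingerprint of an undirected edge."""
    key = f"{min(u, v)}|{max(u, v)}".encode()
    return int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")

class VertexSketch:
    """One cell: XOR of incident-edge fingerprints plus a signed count."""

    def __init__(self):
        self.acc, self.count = 0, 0

    def update(self, u, v, delta):       # delta = +1 insert, -1 delete
        self.acc ^= edge_id(u, v)        # XOR is self-inverse: same op both ways
        self.count += delta

    def sample(self):
        """If exactly one incident edge survives, return its fingerprint."""
        return self.acc if self.count == 1 else None

a, b = VertexSketch(), VertexSketch()
for s in (a, b):
    s.update("a", "b", +1)               # edge (a, b) touches both endpoints
a.update("a", "c", +1)
a.update("a", "c", -1)                   # the stream later deletes (a, c)
print(a.sample() == edge_id("a", "b"))   # True: the deletion cancelled exactly
print((a.acc ^ b.acc) == 0)              # supernode {a, b}: the internal edge
                                         # cancels, enabling Boruvka-style merging
```

The last line hints at why this supports connected components: XOR-merging the sketches of a vertex set yields a sketch of the edges leaving that set, which a Boruvka-style algorithm samples round by round.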
Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-02-09 DOI: 10.1145/3644385
Adriane Chapman, Luca Lauro, Paolo Missier, Riccardo Torlone
{"title":"Supporting Better Insights of Data Science Pipelines with Fine-grained Provenance","authors":"Adriane Chapman, Luca Lauro, Paolo Missier, Riccardo Torlone","doi":"10.1145/3644385","DOIUrl":"https://doi.org/10.1145/3644385","url":null,"abstract":"<p>Successful data-driven science requires complex data engineering pipelines to clean, transform, and alter data in preparation for machine learning, and robust results can only be achieved when each step in the pipeline can be justified, and its effect on the data explained. In this framework, we aim to provide data scientists with facilities to gain an in-depth understanding of how each step in the pipeline affects the data, from the raw input to training sets ready to be used for learning. Starting from an extensible set of data preparation operators commonly used within a data science setting, in this work we present a provenance management infrastructure for generating, storing, and querying very granular accounts of data transformations, at the level of individual elements within datasets whenever possible. Then, from the formal definition of a core set of data science preprocessing operators, we derive a <i>provenance semantics</i> embodied by a collection of templates expressed in PROV, a standard model for data provenance. Using those templates as a reference, our provenance generation algorithm generalises to any operator with observable input/output pairs. We provide a prototype implementation of an application-level provenance capture library to produce, in a semi-automatic way, complete provenance documents that account for the entire pipeline. We report on the ability of that reference implementation to capture provenance in real ML benchmark pipelines and over TCP-DI synthetic data. We finally show how the collected provenance can be used to answer a suite of provenance benchmark queries that underpin some common pipeline inspection questions, as expressed on the Data Science Stack Exchange.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-02-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139757527","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
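Element-level provenance capture can be miniaturized as follows: a wrapper runs a per-row operator and records, for each input element, the output element it produced. The paper's library instead instantiates PROV templates per operator; this sketch only mimics the granularity of what gets recorded, and the helper and field names are hypothetical.

```python
def capture_provenance(op_name, rows, op):
    """Run a per-element operator, recording for every input element the
    output element it produced (None if the element was dropped)."""
    outputs, prov = [], []
    for i, row in enumerate(rows):
        out = op(row)
        prov.append({"op": op_name, "input": i,
                     "output": len(outputs) if out is not None else None})
        if out is not None:
            outputs.append(out)
    return outputs, prov

rows = [{"age": 17}, {"age": 42}, {"age": None}]
# an imputation followed by a filter, each step individually accounted for
imputed, p1 = capture_provenance("impute_age", rows,
                                 lambda r: {**r, "age": r["age"] or 30})
adults, p2 = capture_provenance("filter_adults", imputed,
                                lambda r: r if r["age"] >= 18 else None)
print(adults)     # [{'age': 42}, {'age': 30}]
print(p1 + p2)    # an element-level, queryable account of the whole pipeline
```

Chaining the records across steps is what lets inspection queries trace a training-set element back to the raw input that produced it.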
The Ring: Worst-Case Optimal Joins in Graph Databases using (Almost) No Extra Space
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-02-08 DOI: 10.1145/3644824
Diego Arroyuelo, Adrián Gómez-Brandón, Aidan Hogan, Gonzalo Navarro, Juan Reutter, Javiel Rojas-Ledesma, Adrián Soto
{"title":"The Ring: Worst-Case Optimal Joins in Graph Databases using (Almost) No Extra Space","authors":"Diego Arroyuelo, Adrián Gómez-Brandón, Aidan Hogan, Gonzalo Navarro, Juan Reutter, Javiel Rojas-Ledesma, Adrián Soto","doi":"10.1145/3644824","DOIUrl":"https://doi.org/10.1145/3644824","url":null,"abstract":"<p>We present an indexing scheme for triple-based graphs that supports join queries in worst-case optimal (wco) time within compact space. This scheme, called a <i>ring</i>, regards each triple as a cyclic string of length 3. Each rotation of the triples is lexicographically sorted and the values of the last attribute are stored as a column, so we obtain the order of the next column by stably re-sorting the triples by its attribute. We show that, by representing the columns with a compact data structure called a wavelet tree, this ordering enables forward and backward navigation between columns without needing pointers. These wavelet trees further support wco join algorithms and cardinality estimations for query planning. While traditional data structures such as B-Trees, tries, etc., require 6 index orders to support all possible wco joins over triples, we can use one ring to index them all. This ring replaces the graph and uses only sublinear extra space, thus supporting wco joins in almost no space beyond storing the graph itself. Experiments querying a large graph (Wikidata) in memory show that the ring offers nearly the best overall query times while using only a small fraction of the space required by several state-of-the-art approaches. </p><p>We then turn our attention to some theoretical results for indexing tables of arity <i>d</i> higher than 3 in such a way that supports wco joins. While a single ring of length <i>d</i> no longer suffices to cover all <i>d</i>! orders, we need much fewer rings to index them all: <i>O</i>(2<sup><i>d</i></sup>) rings with a small constant. For example, we need 5 rings instead of 120 orders for <i>d</i> = 5. We show that our rings become a particular case of what we dub <i>order graphs, whose nodes are attribute orders and where stably sorting by some attribute leads us from an order to another, thereby inducing an edge labeled by the attribute. The index is then the set of columns associated with the edges, and a set of rings is just one possible graph shape. We show that other shapes, like for example a single ring instead of several ones of length <i>d</i>, can lead us to even smaller indexes, and that other more general shapes are also possible. For example, we handle <i>d</i> = 5 attributes within space equivalent to 4 rings.</i></p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-02-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139757456","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
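The wco behaviour the ring supports can be illustrated independently of its compact machinery. Below is a minimal attribute-at-a-time (generic) join for the triangle query over a triple set, binding one variable at a time by intersecting the candidates each atom permits; plain hash indexes stand in for the wavelet-tree columns, so this shows the join discipline, not the ring itself.

```python
from collections import defaultdict

def triangle_join(triples):
    """Generic join for Q(x, y, z) <- E(x, y), E(y, z), E(z, x):
    bind one variable at a time, intersecting the candidate sets that
    each atom permits, which is the discipline wco algorithms follow."""
    out_nbrs, in_nbrs = defaultdict(set), defaultdict(set)
    for s, o in triples:
        out_nbrs[s].add(o)
        in_nbrs[o].add(s)
    results = []
    for x in list(out_nbrs):            # bind x
        for y in out_nbrs[x]:           # bind y: consistent with E(x, y)
            # bind z by intersection: E(y, z) wants z in out_nbrs[y],
            # E(z, x) wants z in in_nbrs[x]
            for z in out_nbrs[y] & in_nbrs[x]:
                results.append((x, y, z))
    return results

E = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
print(triangle_join(E))   # the three rotations of the triangle a-b-c
```

The intersection at the innermost level is the crucial step: never enumerating candidates that some atom already rules out is what keeps the running time within the worst-case output bound.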
Identifying the Root Causes of DBMS Suboptimality
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-01-10 DOI: 10.1145/3636425
Sabah Currim, Richard T. Snodgrass, Young-Kyoon Suh
{"title":"Identifying the Root Causes of DBMS Suboptimality","authors":"Sabah Currim, Richard T. Snodgrass, Young-Kyoon Suh","doi":"10.1145/3636425","DOIUrl":"https://doi.org/10.1145/3636425","url":null,"abstract":"<p>The query optimization phase within a database management system (DBMS) ostensibly finds the fastest query execution plan from a potentially large set of enumerated plans, all of which correctly compute the same result of the specified query. Sometimes the cost-based optimizer selects a slower plan, for a variety of reasons. Previous work has focused on increasing the performance of specific components, often a single operator, within an individual DBMS. However, that does not address the fundamental question: from where does this suboptimality arise, across DBMSes generally? In particular, the contribution of each of many possible factors to DBMS suboptimality is currently unknown. To identify the root causes of DBMS suboptimality, we first introduce the notion of <i>empirical suboptimality</i> of a query plan chosen by the DBMS, indicated by the existence of a query plan that performs more efficiently than the chosen plan, for the same query. A crucial aspect is that this can be measured externally to the DBMS, and thus does not require access to its source code. We then propose a novel <i>predictive model</i> to explain the relationship between various factors in query optimization and empirical suboptimality. Our model associates suboptimality with the factors of complexity of the schema, of the underlying data on which the query is evaluated, of the query itself, and of the DBMS optimizer. The model also characterizes concomitant interactions among these factors. This model induces a number of specific hypotheses that were tested on multiple DBMSes. We performed a series of experiments that examined the plans for thousands of queries run on four popular DBMSes. We tested the model on over a million of these query executions, using correlational analysis, regression analysis, and causal analysis, specifically Structural Equation Modeling (SEM). We observed that the dependent construct of empirical suboptimality prevalence correlates positively with nine specific constructs characterizing four identified factors that explain in concert much of the variance of suboptimality of two extensive benchmarks, across these disparate DBMSes. This predictive model shows that it is the common aspects of these DBMSes that predict suboptimality, <i>not</i> the particulars embedded in the inordinate complexity of each of these DBMSes. This paper thus provides a new methodology to study mature query optimizers, identifies underlying DBMS-independent causes for the observed suboptimality, and quantifies the relative contribution of each of these causes to the observed suboptimality. This work thus provides a roadmap for fundamental improvements of cost-based query optimizers.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139409660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
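Because empirical suboptimality is defined externally to the DBMS, it can be probed with ordinary client code. The sketch below, assuming a reachable PostgreSQL instance, times a query under the default plan and under alternative plans forced by real planner settings, flagging suboptimality when an alternative wins. The connection string and query are placeholders, and a serious measurement would repeat runs and control for caching; the paper's methodology does this at benchmark scale across four DBMSes.

```python
import time
import psycopg2   # assumes a reachable PostgreSQL instance

# Real planner settings: turning one off forces an alternative plan
# shape for the same query, which is all this probe needs.
KNOBS = ["enable_nestloop", "enable_hashjoin", "enable_mergejoin"]

def run_once(conn, query, disabled=None):
    with conn.cursor() as cur:
        if disabled:
            cur.execute(f"SET {disabled} = off")
        start = time.perf_counter()
        cur.execute(query)
        cur.fetchall()
        elapsed = time.perf_counter() - start
        cur.execute("RESET ALL")          # restore default planner settings
        return elapsed

def empirically_suboptimal(conn, query):
    """The chosen plan is empirically suboptimal if some alternative plan
    for the same query runs faster; observable entirely from the client."""
    default = run_once(conn, query)
    best_alternative = min(run_once(conn, query, k) for k in KNOBS)
    return default, best_alternative, best_alternative < default

# hypothetical DSN and query; substitute your own
conn = psycopg2.connect("dbname=tpch")
print(empirically_suboptimal(
    conn, "SELECT count(*) FROM lineitem l JOIN orders o "
          "ON l.l_orderkey = o.o_orderkey"))
```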
Linking Entities across Relations and Graphs
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2024-01-03 DOI: 10.1145/3639363
Wenfei Fan, Ping Lu, Kehan Pang, Ruochun Jin
{"title":"Linking Entities across Relations and Graphs","authors":"Wenfei Fan, Ping Lu, Kehan Pang, Ruochun Jin","doi":"10.1145/3639363","DOIUrl":"https://doi.org/10.1145/3639363","url":null,"abstract":"<p>This paper proposes a notion of parametric simulation to link entities across a relational database (mathcal {D} ) and a graph <i>G</i>. Taking functions and thresholds for measuring vertex closeness, path associations and important properties as parameters, parametric simulation identifies tuples <i>t</i> in (mathcal {D} ) and vertices <i>v</i> in <i>G</i> that refer to the same real-world entity, based on both topological and semantic matching. We develop machine learning methods to learn the parameter functions and thresholds. We show that parametric simulation is in quadratic-time, by providing such an algorithm. Moreover, we develop an incremental algorithm for parametric simulation; we show that the incremental algorithm is bounded relative to its batch counterpart, <i>i.e.,</i> it incurs the minimum cost for incrementalizing the batch algorithm. Putting these together, we develop HER, a parallel system to check whether (<i>t</i>, <i>v</i>) makes a match, find all vertex matches of <i>t</i> in <i>G</i>, and compute all matches across (mathcal {D} ) and <i>G</i>, all in quadratic-time; moreover, HER supports incremental computation of these in response to updates to (mathcal {D} ) and <i>G</i>. Using real-life and synthetic data, we empirically verify that HER is accurate with F-measure of 0.94 on average, and is able to scale with database (mathcal {D} ) and graph <i>G</i> for both batch and incremental computations.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2024-01-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139102300","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
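A drastically simplified version of the matching loop conveys the flavour of simulation-based linking. The sketch below seeds candidate tuple-vertex pairs with an attribute-closeness threshold and prunes to a fixpoint any pair whose related tuples find no matching neighbour; the paper's parametric simulation additionally scores path associations and important properties and learns its functions and thresholds, all of which are fixed by hand here.

```python
def parametric_simulation(tuples, graph, closeness, alpha):
    """Fixpoint in the spirit of simulation: seed every tuple-vertex pair
    whose attribute closeness clears alpha, then repeatedly drop pairs in
    which some related tuple has no matching neighbour in the graph."""
    # tuples: id -> (attrs, related tuple ids); graph: id -> (attrs, neighbours)
    match = {(t, v) for t in tuples for v in graph
             if closeness(tuples[t][0], graph[v][0]) >= alpha}
    changed = True
    while changed:
        changed = False
        for t, v in list(match):
            if not all(any((t2, v2) in match for v2 in graph[v][1])
                       for t2 in tuples[t][1]):
                match.discard((t, v))
                changed = True
    return match

def closeness(a, b):
    """Fraction of shared attributes holding equal values."""
    keys = set(a) & set(b)
    return sum(a[k] == b[k] for k in keys) / max(len(keys), 1)

D = {"t1": ({"name": "Ada Lovelace"}, ["t2"]),   # tuple t1 relates to t2
     "t2": ({"name": "London"}, [])}
G = {"v1": ({"name": "Ada Lovelace"}, ["v2"]),   # vertex v1 links to v2
     "v2": ({"name": "London"}, [])}
print(parametric_simulation(D, G, closeness, alpha=1.0))
# {('t1', 'v1'), ('t2', 'v2')}: pairs passing both semantic and topological checks
```

Each pass over the candidate set shrinks it, which is the intuition behind the quadratic-time bound the paper proves for its full algorithm.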
Fast Parallel Hypertree Decompositions in Logarithmic Recursion Depth
IF 1.8, CAS Tier 2, Computer Science
ACM Transactions on Database Systems Pub Date: 2023-12-30 DOI: 10.1145/3638758
Georg Gottlob, Matthias Lanzinger, Cem Okulmus, Reinhard Pichler
{"title":"Fast Parallel Hypertree Decompositions in Logarithmic Recursion Depth","authors":"Georg Gottlob, Matthias Lanzinger, Cem Okulmus, Reinhard Pichler","doi":"10.1145/3638758","DOIUrl":"https://doi.org/10.1145/3638758","url":null,"abstract":"<p>Various classic reasoning problems with natural hypergraph representations are known to be tractable if a hypertree decomposition (HD) of low width exists. The resulting algorithms are attractive for practical use in fields like databases and constraint satisfaction. However, algorithmic use of HDs relies on the difficult task of first computing a decomposition of the hypergraph underlying a given problem instance, which is then used to guide the algorithm for this particular instance. The performance of purely sequential methods for computing HDs is inherently limited, yet the problem is, theoretically, amenable to parallelisation. In this paper we propose the first algorithm for computing hypertree decompositions that is well-suited for parallelisation. The newly proposed algorithm log-<i>k</i>-decomp requires only a logarithmic number of recursion levels and additionally allows for highly parallelised pruning of the search space by restriction to so-called balanced separators. We provide a detailed experimental evaluation over the HyperBench benchmark and demonstrate that log-<i>k</i>-decomp outperforms the current state of the art significantly.</p>","PeriodicalId":50915,"journal":{"name":"ACM Transactions on Database Systems","volume":null,"pages":null},"PeriodicalIF":1.8,"publicationDate":"2023-12-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"139065497","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
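The balanced-separator recursion behind the logarithmic depth can be sketched in a few lines. The toy below, which is my illustration rather than log-k-decomp itself, brute-forces up to k hyperedges whose removal splits the vertices into components of at most half the size, then recurses into each component; halving gives the logarithmic recursion depth, and the branches are independent, hence parallelisable. It omits the conditions that make the output a valid hypertree decomposition.

```python
from itertools import combinations

def components(vertices, edges):
    """Connected components of `vertices` under the given hyperedges."""
    vs, comps = set(vertices), []
    while vs:
        comp, stack = set(), [next(iter(vs))]
        while stack:
            u = stack.pop()
            if u in comp or u not in vs:
                continue
            comp.add(u)
            stack.extend(w for e in edges if u in e for w in e)
        vs -= comp
        comps.append(comp)
    return comps

def decompose(vertices, edges, k):
    """Recursive skeleton of separator-based decomposition."""
    vertices = set(vertices)
    if len(vertices) <= k:
        return [vertices]
    for r in range(1, k + 1):
        for sep in combinations(edges, r):        # candidate separator
            cover = set().union(*(set(e) for e in sep))
            rest = vertices - cover
            comps = components(rest, [set(e) - cover for e in edges])
            if all(len(c) <= len(vertices) // 2 for c in comps):
                bags = [cover & vertices]
                for c in comps:                   # independent subproblems
                    bags += decompose(c, [set(e) & c for e in edges
                                          if set(e) & c], k)
                return bags
    return None   # no balanced separator of width <= k at this level

H = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
print(decompose("abcdef", H, k=1))   # [{'b','c'}, {'a'}, {'d','e'}, {'f'}]
```

The search over candidate separators at each level is what log-k-decomp prunes in parallel; the brute-force loop here is the part its techniques replace.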