Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory: Latest Articles, Page 2

Constant-delay enumeration for SLP-compressed documents

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-09-25 DOI: 10.48550/arXiv.2209.12301

Martin Muñoz, Cristian Riveros

Abstract: We study the problem of enumerating results from a query over a compressed document. The compression model we use is straight-line programs (SLPs), which are defined by a context-free grammar that produces a single string. For our queries, we use a model called Annotated Automata, an extension of regular automata that allows annotations on letters. This model extends the notion of Regular Spanners as it allows arbitrarily long outputs. Our main result is an algorithm that evaluates such a query by enumerating all results with output-linear delay after a preprocessing phase that takes time linear in the size of the SLP and cubic in the size of the automaton. This is an improvement over Schmid and Schweikardt's result, which, with the same preprocessing time, enumerates with a delay that is logarithmic in the size of the uncompressed document. We achieve this through a persistent data structure named Enumerable Compact Sets with Shifts, which guarantees output-linear delay under certain restrictions. These results imply constant-delay enumeration algorithms in the context of regular spanners. Further, we use an extension of annotated automata which utilizes succinctly encoded annotations to save an exponential factor from previous results that dealt with constant-delay enumeration over vset automata. Lastly, we extend our results in the same fashion as Schmid and Schweikardt did to allow complex document editing while maintaining the constant-delay guarantee.

Pages: 7:1-7:17
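To make the compression model in the abstract above concrete, here is a minimal sketch of a straight-line program. The grammar, the rule names, and the helper functions `length`, `char_at`, and `expand` are all hypothetical illustrations, not taken from the paper, and the sketch shows only random access into the compressed string, not the enumeration algorithm itself.

```python
from functools import lru_cache

# Hypothetical SLP: every rule is either a single terminal character or a
# pair of nonterminals, and the grammar derives exactly one string.
rules = {
    "A": "a",
    "B": "b",
    "X1": ("A", "B"),    # ab
    "X2": ("X1", "X1"),  # abab
    "X3": ("X2", "X2"),  # abababab
    "S": ("X3", "A"),    # abababab followed by a
}


@lru_cache(maxsize=None)
def length(symbol: str) -> int:
    """Length of the string derived from `symbol`, computed without expanding it."""
    rule = rules[symbol]
    if isinstance(rule, str):
        return 1
    left, right = rule
    return length(left) + length(right)


def char_at(symbol: str, index: int) -> str:
    """Character at position `index` of the derived string, found by descending the grammar."""
    if not 0 <= index < length(symbol):
        raise IndexError(index)
    rule = rules[symbol]
    while not isinstance(rule, str):
        left, right = rule
        left_len = length(left)
        if index < left_len:
            symbol = left
        else:
            symbol, index = right, index - left_len
        rule = rules[symbol]
    return rule


def expand(symbol: str) -> str:
    """Fully decompress `symbol`; only sensible for small examples."""
    rule = rules[symbol]
    if isinstance(rule, str):
        return rule
    return expand(rule[0]) + expand(rule[1])


if __name__ == "__main__":
    print(length("S"), expand("S"), char_at("S", 3))
```

Because each doubling rule such as `X2` or `X3` reuses an earlier nonterminal twice, the derived string can be exponentially longer than the grammar, which is why algorithms that work on the SLP directly can be much faster than ones that decompress first.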

Citations: 3

Conjunctive Queries with Free Access Patterns Under Updates

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-06-17 DOI: 10.4230/LIPIcs.ICDT.2023.17

A. Kara, M. Nikolic, Dan Olteanu, Haozhe Zhang

Citations: 5

Absolute Expressiveness of Subgraph-Based Centrality Measures

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-06-13 DOI: 10.4230/LIPIcs.ICDT.2023.9

Andreas Pieris, J. Salas

Abstract: In graph-based applications, a common task is to pinpoint the most important or "central" vertex in a (directed or undirected) graph, or rank the vertices of a graph according to their importance. To this end, a plethora of so-called centrality measures have been proposed in the literature. Such measures assess which vertices in a graph are the most important ones by analyzing the structure of the underlying graph. A family of centrality measures that are suited for graph databases has been recently proposed by relying on the following simple principle: the importance of a vertex in a graph is relative to the number of "relevant" connected subgraphs surrounding it; we refer to the members of this family as subgraph-based centrality measures. Although it has been shown that such measures enjoy several favourable properties, their absolute expressiveness remains largely unexplored. The goal of this work is to precisely characterize the absolute expressiveness of the family of subgraph-based centrality measures by considering both directed and undirected graphs. To this end, we characterize when an arbitrary centrality measure is a subgraph-based one, or a subgraph-based measure relative to the induced ranking. These characterizations provide us with technical tools that allow us to determine whether well-established centrality measures are subgraph-based. Such a classification, apart from being interesting in its own right, gives useful insights on the structural similarities and differences among existing centrality measures.

Pages: 9:1-9:18

Citations: 0

Probabilistic Query Evaluation with Bag Semantics

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-01-27 DOI: 10.4230/LIPIcs.ICDT.2023.20

Martin Grohe, P. Lindner, Christoph Standke

Abstract: We study the complexity of evaluating queries on probabilistic databases under bag semantics. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi & Suciu, JACM 2012). A reasonably general model of bag probabilistic databases may have unbounded multiplicities. In this case, the probabilistic database is no longer finite, and a careful treatment of representation mechanisms is required. Moreover, the answer to a Boolean query is a probability distribution over (possibly all) non-negative integers, rather than a probability distribution over { true, false }. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries.

Pages: 20:1-20:19
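The abstract above notes that expected answer multiplicities are easy to compute. The sketch below illustrates why on a toy case, assuming the Boolean query Q() :- R(x), S(x) and finitely supported multiplicity distributions. The relations `R` and `S`, their distributions, and all function names are hypothetical and not taken from the paper, which also covers unbounded multiplicities. Under bag semantics the answer multiplicity is the sum over x of R(x) times S(x), so linearity of expectation and independence of distinct facts give the sum of products of means.

```python
from itertools import product

# Hypothetical toy instance: each fact has an independent multiplicity
# distribution written as {multiplicity: probability}.
R = {"a": {0: 0.5, 1: 0.25, 2: 0.25}, "b": {0: 0.5, 3: 0.5}}
S = {"a": {1: 0.5, 2: 0.5}, "b": {0: 0.75, 1: 0.25}}


def mean(dist):
    """Expected multiplicity of a single fact."""
    return sum(m * p for m, p in dist.items())


def expected_multiplicity(R, S):
    """E[sum_x R(x) * S(x)] using linearity of expectation and independence."""
    return sum(mean(R[x]) * mean(S[x]) for x in R.keys() & S.keys())


def brute_force_expectation(R, S):
    """Same expectation by enumerating every possible world (exponential)."""
    facts = [("R", x, d) for x, d in R.items()] + [("S", x, d) for x, d in S.items()]
    total = 0.0
    for world in product(*(d.items() for _, _, d in facts)):
        prob = 1.0
        mult = {}
        for (rel, x, _), (m, p) in zip(facts, world):
            prob *= p
            mult[(rel, x)] = m
        answer = sum(mult[("R", x)] * mult[("S", x)] for x in R.keys() & S.keys())
        total += prob * answer
    return total


if __name__ == "__main__":
    print(expected_multiplicity(R, S), brute_force_expectation(R, S))
```

The closed-form version touches each fact once, while the brute-force check enumerates all combinations of multiplicities and is included only to confirm the two agree on this small instance.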

Citations: 3

Improved Approximation and Scalability for Fair Max-Min Diversification

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-01-18 DOI: 10.4230/LIPIcs.ICDT.2022.7

Raghavendra Addanki, A. Mcgregor, A. Meliou, Zafeiria Moumoulidou

Abstract: Given an $n$-point metric space $(\mathcal{X},d)$ where each point belongs to one of $m=O(1)$ different categories or groups and a set of integers $k_1, \ldots, k_m$, the fair Max-Min diversification problem is to select $k_i$ points belonging to category $i \in [m]$, such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results:

1. We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor $2$-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a $6$-approximation that is guaranteed to satisfy the fairness constraints up to a factor $1-\epsilon$ for any constant $\epsilon$. We also present a linear time algorithm returning an $m+1$ approximation with exact fairness. The best previous result was a $3m-1$ approximation.
2. We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. For constant dimensions, categories and any constant $\epsilon>0$, we present a $1+\epsilon$ approximation algorithm that runs in $O(nk) + 2^{O(k)}$ time where $k=k_1+\ldots+k_m$. We can improve the running time to $O(nk) + poly(k)$ at the expense of only picking $(1-\epsilon) k_i$ points from category $i \in [m]$.

Finally, we present algorithms suitable for processing massive data sets, including single-pass data stream algorithms and composable coresets for distributed processing.

Pages: 7:1-7:21
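As a rough illustration of the problem statement above (not of the paper's approximation algorithms), the following sketch solves a tiny fair Max-Min diversification instance by exhaustive search. The point set, the category names, the quotas, and the function names are hypothetical, and the Euclidean distance is assumed as the metric.

```python
from itertools import combinations, product
from math import dist

# Hypothetical instance: name -> (2-D coordinates, category).
points = {
    "p1": ((0.0, 0.0), "red"),
    "p2": ((1.0, 0.0), "red"),
    "p3": ((4.0, 0.0), "red"),
    "p4": ((0.0, 3.0), "blue"),
    "p5": ((4.0, 3.0), "blue"),
}
quota = {"red": 2, "blue": 1}  # k_i points required from each category


def diversity(points, selection):
    """Minimum pairwise distance among the selected points (needs >= 2 points)."""
    return min(
        dist(points[a][0], points[b][0]) for a, b in combinations(selection, 2)
    )


def fair_max_min_brute_force(points, quota):
    """Try every selection that meets the quotas exactly; exponential time."""
    by_category = {
        c: [name for name, (_, cat) in points.items() if cat == c] for c in quota
    }
    best, best_value = None, float("-inf")
    per_category = (combinations(by_category[c], k) for c, k in quota.items())
    for choice in product(*per_category):
        selection = [name for group in choice for name in group]
        value = diversity(points, selection)
        if value > best_value:
            best, best_value = selection, value
    return best, best_value


if __name__ == "__main__":
    print(fair_max_min_brute_force(points, quota))
```

Exhaustive search is only feasible for very small inputs, which is the motivation for the approximation and streaming algorithms the abstract describes.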

Citations: 5

Certifiable Robustness for Nearest Neighbor Classifiers

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-01-13 DOI: 10.4230/LIPIcs.ICDT.2022.6

Austen Z. Fan, Paraschos Koutris

Abstract: ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.

Pages: 6:1-6:20
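The definition of certifiable robustness in the abstract above can be sketched by brute force on a toy dataset. Everything below is a hypothetical illustration: it assumes a single key constraint (tuples sharing a key conflict), one numeric feature, and repairs that keep exactly one tuple per key. It enumerates all repairs, so it takes exponential time and is not the polynomial-time procedure studied in the paper.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical inconsistent training set of (key, feature, label) tuples.
# Assumed constraint: key -> feature, label, so tuples sharing a key conflict.
train = [
    ("t1", 1.0, "pos"),
    ("t1", 1.5, "pos"),  # conflicts with the previous tuple
    ("t2", 5.0, "neg"),
    ("t3", 3.0, "pos"),
    ("t3", 3.0, "neg"),  # same feature, uncertain label
]


def repairs(tuples):
    """Yield every repair: one tuple kept from each group of conflicting tuples."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[0]].append(t)
    yield from product(*groups.values())


def knn_predict(repair, x, k):
    """Majority label among the k training tuples closest to x.

    Ties are broken by input order, which is an arbitrary choice of this sketch.
    """
    nearest = sorted(repair, key=lambda t: abs(t[1] - x))[:k]
    return Counter(label for _, _, label in nearest).most_common(1)[0][0]


def certifiable_label(tuples, x, k=1):
    """Return the label predicted in every repair, or None if repairs disagree."""
    labels = {knn_predict(r, x, k) for r in repairs(tuples)}
    return labels.pop() if len(labels) == 1 else None


if __name__ == "__main__":
    for x in (1.2, 3.2, 5.0):
        print(x, certifiable_label(train, x))
```

In this instance the test point 3.2 is not certifiably robust, because its nearest neighbour is the tuple with key `t3`, whose label differs between repairs.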

Citations: 2

Characterising Fixed Parameter Tractability for Query Evaluation over Guarded TGDs

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-01-01 DOI: 10.4230/LIPIcs.ICDT.2022.12

C. Feier

Abstract: We consider the parameterized complexity of evaluating Ontology Mediated Queries (OMQ) based on Guarded TGDs (GTGD) and Unions of Conjunctive Queries, in the case where relational symbols have unrestricted arity and where the parameter is the size of the OMQ. We establish exact criteria for fixed-parameter tractable (fpt) evaluation of recursively enumerable (r.e.) classes of such OMQs (under the widely held Exponential Time Hypothesis). One of the main technical tools introduced in the paper is an fpt-reduction from deciding parameterized uniform CSPs to parameterized OMQ evaluation. The reduction preserves measures known to be essential for classifying r.e. classes of parameterized uniform CSPs: submodular width (according to the well known result of Marx for unrestricted-arity schemas) and treewidth (according to the well known result of Grohe for bounded-arity schemas). As such, it can be employed to obtain hardness results for evaluation of r.e. classes of parameterized OMQs based on GTGD both in the unrestricted and in the bounded arity case. Previously, for bounded arity schemas, this has been tackled using a technique requiring full introspection into the construction employed by Grohe.

2012 ACM Subject Classification: Theory of computation → Database theory

Pages: 12:1-12:20

Citations: 1

Inference of Shape Graphs for Graph Databases

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-01-01 DOI: 10.4230/LIPIcs.ICDT.2022.14

B. Groz, Aurélien Lemay, S. Staworko, Piotr Wieczorek

Citations: 6

Linear Programs with Conjunctive Queries

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-01-01 DOI: 10.4230/LIPIcs.ICDT.2022.5

Florent Capelli, Nicolas Crosetti, Joachim Niehren, J. Ramon

Citations: 2

On the Hardness of Category Tree Construction

Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory Pub Date : 2022-01-01 DOI: 10.4230/LIPIcs.ICDT.2022.4

Shay Gershtein, Uri Avron, Ido Guy, T. Milo, Slava Novgorodov

Abstract: Category trees, or taxonomies, are rooted trees where each node, called a category, corresponds to a set of related items. The construction of taxonomies has been studied in various domains, including e-commerce, document management, and question answering. Multiple algorithms for automating construction have been proposed, employing a variety of clustering approaches and crowdsourcing. However, no formal model to capture such categorization problems has been devised, and their complexity has not been studied. To address this, we propose in this work a combinatorial model that captures many practical settings and show that the aforementioned empirical approach has been warranted, as we prove strong inapproximability bounds for various problem variants and special cases when the goal is to produce a categorization of the maximum utility. In our model, the input is a set of $n$ weighted item sets that the tree would ideally contain as categories. Each category, rather than perfectly matching the corresponding input set, is allowed to exceed a given threshold for a given similarity function. The goal is to produce a tree that maximizes the total weight of the sets for which it contains a matching category. A key parameter is an upper bound on the number of categories an item may belong to, which produces the hardness of the problem, as initially each item may be contained in an arbitrary number of input sets. For this model, we prove inapproximability bounds, of order $\tilde{\Theta}(\sqrt{n})$ or $\tilde{\Theta}(n)$, for various problem variants and special cases, loosely justifying the aforementioned heuristic approach. Our work includes reductions based on parameterized randomized constructions that highlight how various problem parameters and properties of the input may affect the hardness. Moreover, for the special case where the category must be identical to the corresponding input set, we devise an algorithm whose approximation guarantee depends solely on a more granular parameter, allowing improved worst-case guarantees. Finally, we also generalize our results to DAG-based and non-hierarchical categorization.

Pages: 4:1-4:17

Citations: 1