Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory: latest articles

Constant-delay enumeration for SLP-compressed documents
Martin Muñoz, Cristian Riveros
{"title":"Constant-delay enumeration for SLP-compressed documents","authors":"Martin Muñoz, Cristian Riveros","doi":"10.48550/arXiv.2209.12301","DOIUrl":"https://doi.org/10.48550/arXiv.2209.12301","url":null,"abstract":"We study the problem of enumerating results from a query over a compressed document. The model we use for compression are straight-line programs (SLPs), which are defined by a context-free grammar that produces a single string. For our queries, we use a model called Annotated Automata, an extension of regular automata that allows annotations on letters. This model extends the notion of Regular Spanners as it allows arbitrarily long outputs. Our main result is an algorithm that evaluates such a query by enumerating all results with output-linear delay after a preprocessing phase which takes linear time on the size of the SLP, and cubic time over the size of the automaton. This is an improvement over Schmid and Schweikardt's result, which, with the same preprocessing time, enumerates with a delay that is logarithmic on the size of the uncompressed document. We achieve this through a persistent data structure named Enumerable Compact Sets with Shifts which guarantees output-linear delay under certain restrictions. These results imply constant-delay enumeration algorithms in the context of regular spanners. Further, we use an extension of annotated automata which utilizes succinctly encoded annotations to save an exponential factor from previous results that dealt with constant-delay enumeration over vset automata. Lastly, we extend our results in the same fashion Schmid and Schweikardt did to allow complex document editing while maintaining the constant delay guarantee.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. 
International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-09-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"84960906","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Conjunctive Queries with Free Access Patterns Under Updates
A. Kara, M. Nikolic, Dan Olteanu, Haozhe Zhang
{"title":"Conjunctive Queries with Free Access Patterns Under Updates","authors":"A. Kara, M. Nikolic, Dan Olteanu, Haozhe Zhang","doi":"10.4230/LIPIcs.ICDT.2023.17","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2023.17","url":null,"abstract":"We study the problem of answering conjunctive queries with free access patterns under updates. A free access pattern is a partition of the free variables of the query into input and output. The query returns tuples over the output variables given a tuple of values over the input variables. We introduce a fully dynamic evaluation approach for such queries. We also give a syntactic characterisation of those queries that admit constant time per single-tuple update and whose output tuples can be enumerated with constant delay given an input tuple. Finally, we chart the complexity trade-off between the preprocessing time, update time and enumeration delay for such queries. For a class of queries, our approach achieves optimal, albeit non-constant, update time and delay. Their optimality is predicated on the Online Matrix-Vector Multiplication conjecture. Our results recover prior work on the dynamic evaluation of conjunctive queries without access patterns.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-06-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76280925","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Absolute Expressiveness of Subgraph-Based Centrality Measures
Andreas Pieris, J. Salas
{"title":"Absolute Expressiveness of Subgraph-Based Centrality Measures","authors":"Andreas Pieris, J. Salas","doi":"10.4230/LIPIcs.ICDT.2023.9","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2023.9","url":null,"abstract":"In graph-based applications, a common task is to pinpoint the most important or ``central'' vertex in a (directed or undirected) graph, or rank the vertices of a graph according to their importance. To this end, a plethora of so-called centrality measures have been proposed in the literature. Such measures assess which vertices in a graph are the most important ones by analyzing the structure of the underlying graph. A family of centrality measures that are suited for graph databases has been recently proposed by relying on the following simple principle: the importance of a vertex in a graph is relative to the number of ``relevant'' connected subgraphs surrounding it; we refer to the members of this family as subgraph-based centrality measures. Although it has been shown that such measures enjoy several favourable properties, their absolute expressiveness remains largely unexplored. The goal of this work is to precisely characterize the absolute expressiveness of the family of subgraph-based centrality measures by considering both directed and undirected graphs. To this end, we characterize when an arbitrary centrality measure is a subgraph-based one, or a subgraph-based measure relative to the induced ranking. These characterizations provide us with technical tools that allow us to determine whether well-established centrality measures are subgraph-based. Such a classification, apart from being interesting in its own right, gives useful insights on the structural similarities and differences among existing centrality measures.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. 
International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76970000","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Probabilistic Query Evaluation with Bag Semantics
Martin Grohe, P. Lindner, Christoph Standke
{"title":"Probabilistic Query Evaluation with Bag Semantics","authors":"Martin Grohe, P. Lindner, Christoph Standke","doi":"10.4230/LIPIcs.ICDT.2023.20","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2023.20","url":null,"abstract":"We study the complexity of evaluating queries on probabilistic databases under bag semantics. We focus on self-join free conjunctive queries, and probabilistic databases where occurrences of different facts are independent, which is the natural generalization of tuple-independent probabilistic databases to the bag semantics setting. For set semantics, the data complexity of this problem is well understood, even for the more general class of unions of conjunctive queries: it is either in polynomial time, or #P-hard, depending on the query (Dalvi&Suciu, JACM 2012). A reasonably general model of bag probabilistic databases may have unbounded multiplicities. In this case, the probabilistic database is no longer finite, and a careful treatment of representation mechanisms is required. Moreover, the answer to a Boolean query is a probability distribution over (possibly all) non-negative integers, rather than a probability distribution over { true, false }. Therefore, we discuss two flavors of probabilistic query evaluation: computing expectations of answer tuple multiplicities, and computing the probability that a tuple is contained in the answer at most k times for some parameter k. Subject to mild technical assumptions on the representation systems, it turns out that expectations are easy to compute, even for unions of conjunctive queries. For query answer probabilities, we obtain a dichotomy between solvability in polynomial time and #P-hardness for self-join free conjunctive queries.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. 
International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-01-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73106414","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Improved Approximation and Scalability for Fair Max-Min Diversification
Raghavendra Addanki, A. Mcgregor, A. Meliou, Zafeiria Moumoulidou
{"title":"Improved Approximation and Scalability for Fair Max-Min Diversification","authors":"Raghavendra Addanki, A. Mcgregor, A. Meliou, Zafeiria Moumoulidou","doi":"10.4230/LIPIcs.ICDT.2022.7","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2022.7","url":null,"abstract":"Given an $n$-point metric space $(\\mathcal{X},d)$ where each point belongs to one of $m=O(1)$ different categories or groups and a set of integers $k_1, \\ldots, k_m$, the fair Max-Min diversification problem is to select $k_i$ points belonging to category $i \\in [m]$, such that the minimum pairwise distance between selected points is maximized. The problem was introduced by Moumoulidou et al. [ICDT 2021] and is motivated by the need to down-sample large data sets in various applications so that the derived sample achieves a balance over diversity, i.e., the minimum distance between a pair of selected points, and fairness, i.e., ensuring enough points of each category are included. We prove the following results: 1. We first consider general metric spaces. We present a randomized polynomial time algorithm that returns a factor $2$-approximation to the diversity but only satisfies the fairness constraints in expectation. Building upon this result, we present a $6$-approximation that is guaranteed to satisfy the fairness constraints up to a factor $1-\\epsilon$ for any constant $\\epsilon$. We also present a linear time algorithm returning an $m+1$ approximation with exact fairness. The best previous result was a $3m-1$ approximation. 2. We then focus on Euclidean metrics. We first show that the problem can be solved exactly in one dimension. For constant dimensions, categories and any constant $\\epsilon>0$, we present a $1+\\epsilon$ approximation algorithm that runs in $O(nk) + 2^{O(k)}$ time where $k=k_1+\\ldots+k_m$. We can improve the running time to $O(nk)+ poly(k)$ at the expense of only picking $(1-\\epsilon) k_i$ points from category $i \\in [m]$. 
Finally, we present algorithms suitable to processing massive data sets including single-pass data stream algorithms and composable coresets for the distributed processing.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-01-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86322004","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 5
Certifiable Robustness for Nearest Neighbor Classifiers
Austen Z. Fan, Paraschos Koutris
{"title":"Certifiable Robustness for Nearest Neighbor Classifiers","authors":"Austen Z. Fan, Paraschos Koutris","doi":"10.4230/LIPIcs.ICDT.2022.6","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2022.6","url":null,"abstract":"ML models are typically trained using large datasets of high quality. However, training datasets often contain inconsistent or incomplete data. To tackle this issue, one solution is to develop algorithms that can check whether a prediction of a model is certifiably robust. Given a learning algorithm that produces a classifier and given an example at test time, a classification outcome is certifiably robust if it is predicted by every model trained across all possible worlds (repairs) of the uncertain (inconsistent) dataset. This notion of robustness falls naturally under the framework of certain answers. In this paper, we study the complexity of certifying robustness for a simple but widely deployed classification algorithm, $k$-Nearest Neighbors ($k$-NN). Our main focus is on inconsistent datasets when the integrity constraints are functional dependencies (FDs). For this setting, we establish a dichotomy in the complexity of certifying robustness w.r.t. the set of FDs: the problem either admits a polynomial time algorithm, or it is coNP-hard. Additionally, we exhibit a similar dichotomy for the counting version of the problem, where the goal is to count the number of possible worlds that predict a certain label. As a byproduct of our study, we also establish the complexity of a problem related to finding an optimal subset repair that may be of independent interest.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. 
International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"91270733","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Characterising Fixed Parameter Tractability for Query Evaluation over Guarded TGDs
C. Feier
{"title":"Characterising Fixed Parameter Tractability for Query Evaluation over Guarded TGDs","authors":"C. Feier","doi":"10.4230/LIPIcs.ICDT.2022.12","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2022.12","url":null,"abstract":"We consider the parameterized complexity of evaluating Ontology Mediated Queries (OMQ) based on Guarded TGDs (GTGD) and Unions of Conjunctive Queries, in the case where relational symbols have unrestricted arity and where the parameter is the size of the OMQ. We establish exact criteria for fixed-parameter tractable (fpt) evaluation of recursively enumerable (r.e.) classes of such OMQs (under the widely held Exponential Time Hypothesis). One of the main technical tools introduced in the paper is an fpt-reduction from deciding parameterized uniform CSPs to parameterized OMQ evaluation. The reduction preserves measures known to be essential for classifying r.e. classes of parameterized uniform CSPs: submodular width (according to the well known result of Marx for unrestricted-arity schemas) and treewidth (according to the well known result of Grohe for bounded-arity schemas). As such, it can be employed to obtain hardness results for evaluation of r.e. classes of parameterized OMQs based on GTGD both in the unrestricted and in the bounded arity case. Previously, for bounded arity schemas, this has been tackled using a technique requiring full introspection into the construction employed by Grohe. 2012 ACM Subject Classification Theory of computation → Database theory","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. 
International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"78343398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Inference of Shape Graphs for Graph Databases
B. Groz, Aurélien Lemay, S. Staworko, Piotr Wieczorek
{"title":"Inference of Shape Graphs for Graph Databases","authors":"B. Groz, Aurélien Lemay, S. Staworko, Piotr Wieczorek","doi":"10.4230/LIPIcs.ICDT.2022.14","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2022.14","url":null,"abstract":"We investigate the problem of constructing a shape graph that describes the structure of a given graph database. We employ the framework of grammatical inference, where the objective is to find an inference algorithm that is both sound, i.e., always producing a schema that validates the input graph, and complete, i.e., able to produce any schema, within a given class of schemas, provided that a sufficiently informative input graph is presented. We identify a number of fundamental limitations that preclude feasible inference. We present inference algorithms based on natural approaches that allow to infer schemas that we argue to be of practical importance.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88980408","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Linear Programs with Conjunctive Queries
Florent Capelli, Nicolas Crosetti, Joachim Niehren, J. Ramon
{"title":"Linear Programs with Conjunctive Queries","authors":"Florent Capelli, Nicolas Crosetti, Joachim Niehren, J. Ramon","doi":"10.4230/LIPIcs.ICDT.2022.5","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2022.5","url":null,"abstract":"In this paper, we study the problem of optimizing a linear program whose variables are the answers to a conjunctive query. For this we propose the language LP(CQ) for specifying linear programs whose constraints and objective functions depend on the answer sets of conjunctive queries. We contribute an efficient algorithm for solving programs in a fragment of LP(CQ). The naive approach constructs a linear program having as many variables as there are elements in the answer set of the queries. Our approach constructs a linear program having the same optimal value but fewer variables. This is done by exploiting the structure of the conjunctive queries using generalized hypertree decompositions of small width to factorize elements of the answer set together. We illustrate the various applications of LP(CQ) programs on three examples: optimizing deliveries of resources, minimizing noise for differential privacy, and computing the s-measure of patterns in graphs as needed for data mining. We","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"88348284","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
On the Hardness of Category Tree Construction
Shay Gershtein, Uri Avron, Ido Guy, T. Milo, Slava Novgorodov
{"title":"On the Hardness of Category Tree Construction","authors":"Shay Gershtein, Uri Avron, Ido Guy, T. Milo, Slava Novgorodov","doi":"10.4230/LIPIcs.ICDT.2022.4","DOIUrl":"https://doi.org/10.4230/LIPIcs.ICDT.2022.4","url":null,"abstract":"Category trees, or taxonomies, are rooted trees where each node, called a category, corresponds to a set of related items. The construction of taxonomies has been studied in various domains, including e-commerce, document management, and question answering. Multiple algorithms for automating construction have been proposed, employing a variety of clustering approaches and crowdsourcing. However, no formal model to capture such categorization problems has been devised, and their complexity has not been studied. To address this, we propose in this work a combinatorial model that captures many practical settings and show that the aforementioned empirical approach has been warranted, as we prove strong inapproximability bounds for various problem variants and special cases when the goal is to produce a categorization of the maximum utility. In our model, the input is a set of n weighted item sets that the tree would ideally contain as categories. Each category, rather than perfectly match the corresponding input set, is allowed to exceed a given threshold for a given similarity function. The goal is to produce a tree that maximizes the total weight of the sets for which it contains a matching category. A key parameter is an upper bound on the number of categories an item may belong to, which produces the hardness of the problem, as initially each item may be contained in an arbitrary number of input sets. For this model, we prove inapproximability bounds, of order $\\tilde{\\Theta}(\\sqrt{n})$ or $\\tilde{\\Theta}(n)$, for various problem variants and special cases, loosely justifying the aforementioned heuristic approach. 
Our work includes reductions based on parameterized randomized constructions that highlight how various problem parameters and properties of the input may affect the hardness. Moreover, for the special case where the category must be identical to the corresponding input set, we devise an algorithm whose approximation guarantee depends solely on a more granular parameter, allowing improved worst-case guarantees. Finally, we also generalize our results to DAG-based and non-hierarchical categorization.","PeriodicalId":90482,"journal":{"name":"Database theory-- ICDT : International Conference ... proceedings. International Conference on Database Theory","volume":null,"pages":null},"PeriodicalIF":0.0,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"72814429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1