Aggregate Query Answering on Possibilistic Data with Cardinality Constraints

2012 IEEE 28th International Conference on Data Engineering. Publication date: 2012-04-01. DOI: 10.1109/ICDE.2012.15

Graham Cormode, D. Srivastava, E. Shen, Ting Yu

Citations: 9

Abstract

Uncertainties in data can arise for a number of reasons: when data is incomplete, contains conflicting information, or has been deliberately perturbed or coarsened to remove sensitive details. An important case that arises in many real applications is when the data describes a set of possibilities subject to cardinality constraints. These constraints represent correlations between tuples. For example, they can encode that at most two of the possible records are correct, or that there is an (unknown) one-to-one mapping between a set of tuples and a set of attribute values. Although much effort has gone into handling uncertain data, current systems cannot handle such correlations beyond simple mutual-exclusion and co-existence constraints. Crucially, they offer little support for efficiently answering aggregate queries on such data. In this paper, we aim to address some of these deficiencies by introducing LICM (Linear Integer Constraint Model), which can succinctly represent many types of tuple correlations, in particular a class of cardinality constraints. We motivate and explain the model with examples from data cleaning and from masking sensitive data. These examples show that LICM enables modeling and querying such data, which was not previously possible. We develop an efficient strategy for answering conjunctive and aggregate queries on possibilistic data by describing how to implement relational operators over data in the model. LICM compactly integrates the encoding of correlations, query answering, and lineage recording. In combination with off-the-shelf linear integer programming solvers, our approach provides exact bounds for aggregate queries. Our prototype implementation demonstrates that query answering with LICM can be effective and scalable.
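The following minimal Python sketch makes these semantics concrete under stated assumptions. Each candidate tuple gets a 0/1 existence variable, and cardinality constraints are linear (in)equalities over those variables. An aggregate such as SUM is then bounded by minimising and maximising a linear objective over the assignments (possible worlds) that satisfy the constraints. The helper names (`at_most`, `at_least`, `exactly`, `aggregate_bounds`) and all data values are hypothetical and do not come from the paper. The abstract pairs LICM with off-the-shelf integer linear programming solvers; this sketch instead swaps in brute-force enumeration of every possible world. That approach is exponential in the number of variables and only suitable for tiny illustrative instances.

```python
from itertools import product
from typing import Callable, List, Optional, Sequence, Tuple

# A constraint is a predicate over a full 0/1 assignment to all variables.
# (Hypothetical helpers for illustration; not the LICM paper's API.)
Constraint = Callable[[Sequence[int]], bool]


def at_most(indices: Sequence[int], k: int) -> Constraint:
    """Cardinality constraint: sum(x[i] for i in indices) <= k."""
    return lambda x: sum(x[i] for i in indices) <= k


def at_least(indices: Sequence[int], k: int) -> Constraint:
    """Cardinality constraint: sum(x[i] for i in indices) >= k."""
    return lambda x: sum(x[i] for i in indices) >= k


def exactly(indices: Sequence[int], k: int) -> Constraint:
    """Cardinality constraint: sum(x[i] for i in indices) == k."""
    return lambda x: sum(x[i] for i in indices) == k


def aggregate_bounds(weights: Sequence[int],
                     constraints: List[Constraint]) -> Tuple[int, int]:
    """Exact [min, max] of sum(w_i * x_i) over every 0/1 assignment
    (possible world) that satisfies all constraints.

    Brute force over 2^n worlds: a stand-in for the two integer linear
    programs (minimise / maximise the same objective) a solver would run.
    """
    lo: Optional[int] = None
    hi: Optional[int] = None
    for x in product((0, 1), repeat=len(weights)):
        if all(c(x) for c in constraints):
            total = sum(w * xi for w, xi in zip(weights, x))
            lo = total if lo is None else min(lo, total)
            hi = total if hi is None else max(hi, total)
    if lo is None or hi is None:
        raise ValueError("no possible world satisfies the constraints")
    return lo, hi


# Example 1 (hypothetical data cleaning case): four conflicting candidate
# records, of which at least one and at most two are correct.
salaries = [50, 70, 30, 90]
records = range(len(salaries))
print(aggregate_bounds(salaries, [at_least(records, 1), at_most(records, 2)]))
# -> (30, 160)

# Example 2 (hypothetical masking case): an unknown one-to-one mapping
# between three patients and three diagnoses. Variable i*n + j means
# "patient i has diagnosis j"; diagnosis 0 stands for "flu".
ages = [30, 45, 60]
n = len(ages)
one_to_one = ([exactly([i * n + j for j in range(n)], 1) for i in range(n)]
              + [exactly([i * n + j for i in range(n)], 1) for j in range(n)])
# SUM(age) of the patient(s) diagnosed with flu.
age_of_flu_patient = [ages[i] if j == 0 else 0
                      for i in range(n) for j in range(n)]
print(aggregate_bounds(age_of_flu_patient, one_to_one))
# -> (30, 60)
```

In the first example the bounds are the smallest single record (30) and the two largest records combined (70 + 90 = 160). In the second example the row and column `exactly(..., 1)` constraints admit only the six permutations, so exactly one patient has flu and the SUM ranges over the three ages. Replacing the enumeration loop with a binary integer program over the same variables, constraints, and objective is what would let this idea scale.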
