Cleansing uncertain databases leveraging aggregate constraints

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010). Pub Date: 2010-03-01. DOI: 10.1109/ICDEW.2010.5452733

Haiquan Chen, Wei-Shinn Ku, Haixun Wang


Citations: 4

Abstract



Emerging uncertain database applications often involve cleansing (conditioning) uncertain databases, using additional information as new evidence to reduce the uncertainty. Unfortunately, past research on conditioning probabilistic databases has focused only on functional dependencies. In real-world applications, most additional information about uncertain data sets can be acquired in the form of aggregate constraints (e.g., aggregate results published online for various statistical purposes). If these aggregate constraints are taken into account, the uncertainty in the data sets can therefore be greatly reduced. However, finding a practical method that exploits aggregate constraints to decrease uncertainty is very challenging. In this paper, we present three approaches to cleanse (condition) uncertain databases by employing aggregate constraints. Because the problem is NP-hard, we focus on two approximation strategies: we model the problem as a nonlinear optimization problem and then use Simulated Annealing (SA) and an Evolutionary Algorithm (EA) to sample from the entire solution space of possible worlds. To favor possible worlds that have higher probabilities and also satisfy all the constraints, we define Satisfaction Degree Functions (SDF) and construct the objective function accordingly. Then, based on the sampling result, we remove duplicates, re-normalize the probabilities of all the qualified possible worlds, and derive the posterior probabilistic database. Our experiments verify the efficiency and effectiveness of our algorithms and show that our approximate approaches scale well to large databases.
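
The pipeline described in the abstract (sample possible worlds with an SA-style search, remove duplicates, keep the worlds that satisfy the constraint, re-normalize) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration, not the paper's definitions: the independent-tuple data model, the single SUM constraint, the form of `satisfaction`, the product-style `objective`, and all function names and parameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Tuple

# Assumed toy model: each uncertain tuple is a list of mutually exclusive
# (value, probability) alternatives, and tuples are independent.
XTuple = List[Tuple[int, float]]
World = Tuple[int, ...]  # index of the chosen alternative for each tuple


def world_sum(db: List[XTuple], world: World) -> int:
    """Aggregate (SUM) of the values selected by this possible world."""
    return sum(db[i][a][0] for i, a in enumerate(world))


def world_prob(db: List[XTuple], world: World) -> float:
    """Prior probability of a world under tuple independence."""
    return math.prod(db[i][a][1] for i, a in enumerate(world))


def satisfaction(db: List[XTuple], world: World, target_sum: int) -> float:
    """Hypothetical satisfaction degree in (0, 1]; 1.0 iff SUM == target_sum."""
    return 1.0 / (1.0 + abs(world_sum(db, world) - target_sum))


def objective(db: List[XTuple], world: World, target_sum: int) -> float:
    """Assumed objective: favors probable worlds that nearly satisfy the constraint."""
    return world_prob(db, world) * satisfaction(db, world, target_sum)


def anneal_sample(db, target_sum, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Simulated-annealing walk over possible worlds; returns the visited set."""
    rng = random.Random(seed)
    world = tuple(rng.randrange(len(t)) for t in db)
    score = objective(db, world, target_sum)
    temp = t0
    visited = {world}  # a set removes duplicate worlds automatically
    for _ in range(steps):
        # Neighbor move: re-draw the alternative of one randomly chosen tuple.
        i = rng.randrange(len(db))
        cand = world[:i] + (rng.randrange(len(db[i])),) + world[i + 1:]
        cand_score = objective(db, cand, target_sum)
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp), which shrinks as temp cools.
        if cand_score >= score or rng.random() < math.exp((cand_score - score) / temp):
            world, score = cand, cand_score
        visited.add(world)
        temp *= cooling
    return visited


def condition(db, target_sum, **sa_params) -> Dict[World, float]:
    """Keep sampled worlds that satisfy the constraint and re-normalize."""
    sampled = anneal_sample(db, target_sum, **sa_params)
    qualified = {w: world_prob(db, w) for w in sampled
                 if world_sum(db, w) == target_sum}
    z = sum(qualified.values())
    return {w: p / z for w, p in qualified.items()} if z > 0 else {}


if __name__ == "__main__":
    # Made-up example data: three uncertain tuples, published SUM = 4.
    toy_db = [[(1, 0.5), (2, 0.5)], [(1, 0.2), (3, 0.8)], [(0, 0.6), (2, 0.4)]]
    for w, p in sorted(condition(toy_db, target_sum=4).items()):
        print(w, round(p, 4))
```

In this toy instance only two of the eight possible worlds have SUM = 4, with prior probabilities 0.24 and 0.04, so exact conditioning assigns them 6/7 and 1/7. The sketch reproduces those values only if the walk visits both worlds; on large databases the sampled posterior is an approximation that depends on which qualified worlds the search happens to reach.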

