Deriving probabilistic databases with inference ensembles

2011 IEEE 27th International Conference on Data Engineering. Pub Date: 2011-04-11. DOI: 10.1109/ICDE.2011.5767854

Julia Stoyanovich, S. Davidson, T. Milo, V. Tannen

Citations: 18

Abstract

Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices, or MRSL, from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves performance of multi-attribute inference for collections of tuples, while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach.
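To make the multi-attribute inference step concrete, here is a minimal Gibbs-style sampling sketch in Python. It is not the paper's MRSL-based algorithm: as a stand-in for the learned meta-rules, it assumes each conditional distribution is estimated by Laplace-smoothed counting over the complete tuples. The helper names (`conditional`, `gibbs_impute`), the toy attributes, and all parameter values are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def conditional(complete, attr, evidence, domain, alpha=1.0):
    """Estimate P(attr | evidence) from complete tuples with Laplace smoothing.

    Assumption: simple counting stands in for the paper's learned ensembles.
    """
    counts = Counter(
        t[attr] for t in complete
        if all(t[k] == v for k, v in evidence.items())
    )
    total = sum(counts.values()) + alpha * len(domain)
    return {v: (counts[v] + alpha) / total for v in domain}


def gibbs_impute(complete, known, missing, domains,
                 n_samples=2000, burn_in=200, seed=0):
    """Approximate the joint distribution over the missing attributes."""
    rng = random.Random(seed)
    state = dict(known)
    # Start from an arbitrary assignment of the missing attributes.
    for a in missing:
        state[a] = rng.choice(domains[a])
    joint = Counter()
    for step in range(burn_in + n_samples):
        # Resample each missing attribute given all other current values.
        for a in missing:
            evidence = {k: v for k, v in state.items() if k != a}
            dist = conditional(complete, a, evidence, domains[a])
            values = list(dist)
            weights = [dist[v] for v in values]
            state[a] = rng.choices(values, weights=weights)[0]
        # Record the state only after the burn-in sweeps.
        if step >= burn_in:
            joint[tuple(state[a] for a in missing)] += 1
    return {k: c / n_samples for k, c in joint.items()}


# Toy data (hypothetical): two strongly correlated groups of complete tuples.
complete = (
    [{"age": "old", "edu": "college", "income": "high"}] * 20
    + [{"age": "young", "edu": "hs", "income": "low"}] * 20
)
domains = {"edu": ["college", "hs"], "income": ["high", "low"]}

# A tuple with two missing attributes: edu and income.
result = gibbs_impute(complete, {"age": "old"}, ["edu", "income"], domains)
most_likely = max(result, key=result.get)
```

Here `result` maps each (edu, income) combination that was sampled to its estimated probability, and `most_likely` is the combination sampled most often. Because each conditional is smoothed independently, the conditionals need not be consistent with a single joint distribution, so this should be read as an approximation scheme rather than an exact sampler.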



