Duplicate detection in probabilistic data

2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010). Pub Date: 2009-12-01. DOI: 10.1109/ICDEW.2010.5452759

Fabian Panse, M. V. Keulen, A. D. Keijzer, N. Ritter

Citations: 25

Abstract

Collected data often contains uncertainties. Probabilistic databases have been proposed to manage uncertain data. To combine data from multiple autonomous probabilistic databases, an integration of probabilistic data has to be performed. Until now, however, data integration approaches have focused on the integration of certain (that is, non-probabilistic) source data, whether relational or XML. There is no work on the integration of uncertain source data so far. In this paper, we present a first step towards a concise consolidation of probabilistic data. We focus on duplicate detection as a representative and essential step in an integration process. We present techniques for identifying multiple probabilistic representations of the same real-world entities.



