An efficient approach for data-duplication detection based on RDBMS

2011 Eighth International Joint Conference on Computer Science and Software Engineering (JCSSE) Pub Date : 2011-05-11 DOI:10.1109/JCSSE.2011.5930142

Kiettisak Chanhom, J. Natwichai

Citations: 1

Abstract

Data duplication is one of the most important issues in information system management. Instead of a single real-world object being stored as one entity in an information system, duplication can occur: more than one entity is stored to represent the same object. This problem can degrade the quality of service of information systems. In this paper, we propose an efficient approach to detecting duplication that is built on the RDBMS. Our approach assumes that the data to be processed are already stored in the RDBMS in the first place. Thus, the proposed approach does not require the data to be exported from, or imported back into, the storage. The approach also benefits from the query optimizer of the RDBMS. Experimental results on the TPC-H dataset are presented to validate the proposed work.
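The abstract does not give the authors' queries, so the following is only a minimal sketch of the general idea of detecting duplicates inside the database rather than exporting the data first. The `customer` table, its columns, the sample rows, and the `find_duplicate_groups` helper are all hypothetical, and exact matching on normalized keys is a simplifying assumption, not the paper's method. SQLite stands in for the RDBMS purely so the example is self-contained.

```python
import sqlite3


def find_duplicate_groups(conn):
    """Return (normalized name, normalized phone, row count) for each
    group of rows that share the same normalized key.

    The grouping runs entirely inside the database engine, so no rows
    are exported for comparison and the engine's planner chooses how
    to execute the aggregation.
    """
    query = """
        SELECT LOWER(TRIM(name))       AS norm_name,
               REPLACE(phone, '-', '') AS norm_phone,
               COUNT(*)                AS n
        FROM customer
        GROUP BY LOWER(TRIM(name)), REPLACE(phone, '-', '')
        HAVING COUNT(*) > 1
        ORDER BY norm_name
    """
    return conn.execute(query).fetchall()


# Hypothetical data: the first two rows describe the same real-world
# person but differ in letter case, trailing whitespace, and phone format.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO customer (name, phone) VALUES (?, ?)",
    [
        ("Alice Smith", "555-0100"),
        ("alice smith ", "5550100"),
        ("Bob Jones", "555-0199"),
    ],
)

duplicates = find_duplicate_groups(conn)
for norm_name, norm_phone, n in duplicates:
    print(f"{n} rows share key ({norm_name}, {norm_phone})")
```

Because the comparison is expressed as a single `GROUP BY ... HAVING` query, the data never leaves the storage layer. A real detector would likely need approximate matching rather than exact equality on normalized keys.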
