Improving Similarity Join Algorithms Using Fuzzy Clustering Technique

2009 IEEE International Conference on Data Mining Workshops
Pub Date: 2009-12-06
DOI: 10.1109/ICDMW.2009.50

L. Tan, F. Fotouhi, W. Grosky, Horia F. Pop, N. Mouaddib

Citations: 1

Abstract

In this paper, we propose a pre-processing technique, based on fuzzy clustering, that improves existing string similarity join algorithms. Our approach first uses fuzzy clustering techniques to identify groups of related attributes and then applies existing string similarity join algorithms to the attributes within these groups. The approach can be applied to the integration of knowledge bases and databases. It can also handle inconsistent values and naming conventions, incorrect or missing data values, and incomplete information from multiple sources with semi-compatible or homogeneous attributes. An experimental study shows that our pre-processing approach improves the precision and recall of existing string similarity join algorithms by about 10 percent.
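The abstract does not say which fuzzy clustering algorithm, attribute features, or similarity join algorithms are used. The following Python sketch illustrates the general two-stage idea only. It makes several assumptions, and every name in it is hypothetical rather than the authors' method. Standard fuzzy c-means stands in for the clustering step, and each attribute is described by its character-bigram frequency profile. A naive q-gram Jaccard join then runs only between attribute pairs that both have membership of at least `threshold` in some shared cluster. The column names and the thresholds `threshold` and `tau` are illustrative choices.

```python
import math
import random
from collections import Counter

# Hypothetical sketch: fuzzy c-means stands in for the (unspecified) fuzzy
# clustering step; a naive q-gram Jaccard join stands in for the existing
# string similarity join algorithms.


def qgrams(s, q=2):
    """Set of character q-grams of a lowercased string padded with '#'."""
    s = f"#{s.lower()}#"
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def profile(values, vocab):
    """Describe one attribute (column) by its normalized q-gram frequencies."""
    counts = Counter(g for v in values for g in qgrams(v))
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocab]


def fuzzy_c_means(points, c, m=2.0, iters=100, eps=1e-6, seed=0):
    """Standard fuzzy c-means; u[i][k] is the membership of point i in cluster k."""
    rng = random.Random(seed)
    n, d = len(points), len(points[0])
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        s = sum(row)
        u.append([x / s for x in row])  # each membership row sums to 1
    for _ in range(iters):
        # Centers: means of the points weighted by u^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            sw = sum(w)
            centers.append([sum(w[i] * points[i][j] for i in range(n)) / sw
                            for j in range(d)])
        # Memberships: u_ik = 1 / sum_l (d_ik / d_il)^(2 / (m - 1)).
        new_u = []
        for i in range(n):
            dist = [max(math.dist(points[i], ctr), 1e-12) for ctr in centers]
            new_u.append([1.0 / sum((dist[k] / dist[l]) ** (2 / (m - 1))
                                    for l in range(c)) for k in range(c)])
        delta = max(abs(new_u[i][k] - u[i][k]) for i in range(n) for k in range(c))
        u = new_u
        if delta < eps:
            break
    return u


def related_pairs(names, u, threshold=0.3):
    """Attribute pairs that both have membership >= threshold in some cluster."""
    pairs = set()
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            if any(u[a][k] >= threshold and u[b][k] >= threshold
                   for k in range(len(u[a]))):
                pairs.add((names[a], names[b]))
    return pairs


def jaccard_join(left, right, tau=0.5):
    """Naive similarity join: all (x, y) whose q-gram Jaccard similarity >= tau."""
    out = []
    for x in left:
        gx = qgrams(x)
        for y in right:
            gy = qgrams(y)
            if len(gx & gy) / len(gx | gy) >= tau:
                out.append((x, y))
    return out


def cluster_then_join(columns, c=2, threshold=0.3, tau=0.5):
    """Cluster attributes first, then join only attributes in a shared cluster."""
    names = list(columns)
    vocab = sorted({g for vals in columns.values() for v in vals for g in qgrams(v)})
    u = fuzzy_c_means([profile(columns[n], vocab) for n in names], c)
    return {pair: jaccard_join(columns[pair[0]], columns[pair[1]], tau)
            for pair in sorted(related_pairs(names, u, threshold))}


# Two hypothetical sources with semi-compatible attribute names.
columns = {
    "A.name": ["John Smith", "Mary Jones", "Ann Lee"],
    "A.phone": ["555-1234", "555-9876"],
    "B.client": ["Jon Smith", "Mary Jones", "Anne Lee"],
    "B.tel": ["555 1234", "555-9867"],
}
for pair, matches in cluster_then_join(columns).items():
    print(pair, matches)
```

In this sketch, the clustering step prunes the join. Name-like and phone-like attributes land in different clusters, so they are never compared value by value. Because memberships are fuzzy rather than hard, one attribute can still take part in more than one group when its values are ambiguous.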
