Clustering Relational Database Entities Using K-means

2010 Second International Conference on Advances in Databases, Knowledge, and Data Applications. Published: 2010-04-11. DOI: 10.1109/DBKDA.2010.32

F. Bourennani, M. Guennoun, Ying Zhu

Citations: 5

Abstract

The rapid evolution of hardware and the Internet has made large volumes of data more accessible. This data is composed of heterogeneous data types such as text, numbers, multimedia, and others. Separate, non-overlapping research communities each work on processing a single homogeneous data type. Nevertheless, from the user's perspective, these heterogeneous data types should behave and be accessed in a similar fashion. Processing heterogeneous data types, known as Heterogeneous Data Mining (HDM), is a complex task. However, HDM by Unified Vectorization (HDM-UV) appears to be an appropriate solution to this problem because it allows the heterogeneous data types to be processed simultaneously. In this paper, we use K-means and Self-Organizing Maps (SOM) to process textual and numerical data types simultaneously by UV. We evaluate how HDM-UV improves the clustering results of these two algorithms (SOM, K-means) by comparing them with traditional homogeneous data processing. Furthermore, we compare the clustering results of the two algorithms applied to a data integration problem.
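
The idea of unified vectorization followed by K-means can be illustrated with a small sketch. The abstract does not specify how textual and numerical features are combined, so everything below is an assumption for illustration only: the toy entities (columns from two hypothetical databases), the `unified_vectors` helper, the choice of an L2-normalised term-count vector for text, min-max scaling for numbers, and the `text_weight`/`num_weight` parameters are all hypothetical and are not the authors' method. Only the K-means half is sketched; the SOM is omitted.

```python
import math
from collections import Counter

# Hypothetical toy entities: columns from two databases being integrated.
# Each has a text sample and two numeric statistics (assumed here to be the
# mean value and standard deviation; 0.0 for non-numeric columns).
# None of this data comes from the paper.
ENTITIES = [
    ("db1.customer.name", "alice smith bob jones", [0.0, 0.0]),
    ("db2.client.full_name", "carol smith dave jones", [0.0, 0.0]),
    ("db1.order.price", "usd price amount", [19.5, 4.2]),
    ("db2.sale.amount", "usd amount total", [21.0, 3.9]),
]


def unified_vectors(entities, text_weight=1.0, num_weight=1.0):
    """Assumed UV scheme: concatenate an L2-normalised term-count vector with
    min-max scaled numeric features into one vector per entity.
    Setting num_weight=0.0 gives a text-only (homogeneous) baseline."""
    vocab = sorted({tok for _, text, _ in entities for tok in text.split()})
    n_num = len(entities[0][2])
    lows = [min(e[2][i] for e in entities) for i in range(n_num)]
    highs = [max(e[2][i] for e in entities) for i in range(n_num)]
    vectors = []
    for _, text, nums in entities:
        counts = Counter(text.split())
        norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
        text_part = [text_weight * counts[t] / norm for t in vocab]
        num_part = [
            num_weight * ((x - lo) / (hi - lo) if hi > lo else 0.0)
            for x, lo, hi in zip(nums, lows, highs)
        ]
        vectors.append(text_part + num_part)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest existing centroid.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


labels, _ = kmeans(unified_vectors(ENTITIES), k=2)
for (name, _, _), lab in zip(ENTITIES, labels):
    print(f"cluster {lab}: {name}")
```

On this toy data the two name-like columns should land in one cluster and the two price/amount columns in the other. Calling `unified_vectors(ENTITIES, num_weight=0.0)` yields a text-only representation, which is one simple way to set up a homogeneous baseline for comparison; whether UV actually improves results depends on the real data and features, which this sketch does not reproduce.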
