A Robust Self-Organizing Approach to Effectively Clustering Incomplete Data

2015 Seventh International Conference on Knowledge and Systems Engineering (KSE). Pub Date: 2015-10-01. DOI: 10.1109/KSE.2015.11

Vo Thi Ngoc Chau


Citations: 3

Abstract

In the real world, incomplete data are common, and missing values can occur anywhere in a data set. Such incomplete data make clustering more challenging. A common practice is to eliminate incomplete objects from the input data set. However, when there are many missing values, ignoring them can leave too little data and make the clustering ineffective. Incomplete data clustering has therefore been studied in many research works. These works take different approaches built on well-known clustering algorithms such as k-means, fuzzy c-means, the self-organizing map (SOM), and mean shift. However, few of them have examined both the effectiveness and the robustness of their incomplete data clustering algorithms. Some are impractical because their hybrid designs require many parameters. Others cannot handle missing values that occur in any object and in any dimension. In contrast, this paper proposes iS_nps, a SOM-based incomplete data clustering algorithm that is a simple, practical, robust, and effective solution. iS_nps clusters incomplete data and also estimates the missing values iteratively using a nearest prototype strategy. Experiments on benchmark data sets show that, compared with several existing approaches, our proposed algorithm produces clusters of good quality and better approximations of the missing values.
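The abstract says only that iS_nps clusters incomplete data with a SOM and iteratively estimates missing values using a nearest prototype strategy. It does not give the update rules. The Python sketch below is one plausible reading of that idea, not the published algorithm. It rests on several assumptions that are not taken from the paper:

- The best-matching prototype is chosen by a partial distance over the observed dimensions only.
- Prototypes are updated with a Gaussian neighbourhood and a linearly decaying learning rate.
- Missing values start as column means.
- After every epoch, each missing value is replaced by the matching component of the object's nearest prototype.

All function names (`partial_dist2`, `nearest_prototype`, `impute`, `som_incomplete`) are hypothetical.

```python
# Hypothetical sketch of SOM clustering with nearest-prototype imputation.
# Assumed design choices, not the published iS_nps algorithm.
import math
import random
from typing import List, Optional, Tuple

Obj = List[Optional[float]]  # one object; None marks a missing value


def partial_dist2(x: Obj, w: List[float]) -> float:
    """Squared distance over the dimensions observed in x, rescaled by
    d / (#observed). A fully missing object gets distance 0."""
    pairs = [(xi, wi) for xi, wi in zip(x, w) if xi is not None]
    if not pairs:
        return 0.0
    return sum((xi - wi) ** 2 for xi, wi in pairs) * len(x) / len(pairs)


def nearest_prototype(x: Obj, protos: List[List[float]]) -> int:
    """Index of the prototype closest to x under the partial distance."""
    return min(range(len(protos)), key=lambda k: partial_dist2(x, protos[k]))


def impute(data: List[Obj], protos: List[List[float]]) -> Tuple[List[List[float]], List[int]]:
    """Fill each missing value from the object's nearest prototype.
    Observed values are kept unchanged. Also returns the cluster labels."""
    filled, labels = [], []
    for x in data:
        b = nearest_prototype(x, protos)
        filled.append([protos[b][j] if xj is None else xj for j, xj in enumerate(x)])
        labels.append(b)
    return filled, labels


def som_incomplete(data: List[Obj], grid_h: int, grid_w: int,
                   epochs: int = 40, lr0: float = 0.5, seed: int = 0):
    rng = random.Random(seed)
    d = len(data[0])
    # Initial estimates: per-dimension mean of the observed values.
    means = []
    for j in range(d):
        obs = [x[j] for x in data if x[j] is not None]
        means.append(sum(obs) / len(obs) if obs else 0.0)
    filled = [[means[j] if x[j] is None else x[j] for j in range(d)] for x in data]

    coords = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    protos = [list(rng.choice(filled)) for _ in coords]
    sigma0 = max(grid_h, grid_w) / 2.0

    for t in range(epochs):
        frac = t / epochs
        lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood width
        order = list(range(len(data)))
        rng.shuffle(order)
        for n in order:
            # Winner is chosen from the observed components only.
            br, bc = coords[nearest_prototype(data[n], protos)]
            for k, (r, c) in enumerate(coords):
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2.0 * sigma ** 2))
                # Update toward observed values plus current estimates.
                protos[k] = [w + lr * h * (v - w) for w, v in zip(protos[k], filled[n])]
        # Nearest prototype strategy: refresh the estimates once per epoch.
        filled, _ = impute(data, protos)

    filled, labels = impute(data, protos)
    return protos, filled, labels
```

Consider two well-separated groups in which some objects lack one coordinate. Called as `som_incomplete(data, grid_h=1, grid_w=2)`, the sketch should assign each group to its own prototype and fill each gap with a value near that group's prototype. Refreshing estimates once per epoch, rather than after every object, is a simplification chosen for readability.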
