Clustering Algorithms for Incomplete Datasets

Recent Applications in Data Clustering. Pub Date: 2018-08-01. DOI: 10.5772/INTECHOPEN.78272

Loai Abdallah, I. Shimshoni

Citations: 2

Abstract

Many real-world datasets suffer from the problem of missing values. Several methods have been developed to deal with this problem. Many of them fill each missing value with a fixed value based on statistical computation. In this research, we developed new versions of the k-means and mean shift clustering algorithms that deal with datasets with missing values without filling in their values. We developed a new distance function that is able to compute distances over incomplete datasets. The distance was computed based only on the mean and variance of the data for each attribute. As a result, the runtime complexity of our computation was O(1). We experimented on six standard numerical datasets from different fields. On these datasets, we simulated missing values and compared the performance of the developed algorithms, using our distance and the suggested mean computations, to three other basic methods. Our experiments show that the developed algorithms using our distance function outperform the existing k-means and mean shift algorithms combined with other methods for dealing with missing values.
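The abstract does not give the distance formula, so the following is only a minimal sketch of one way a distance could depend solely on each attribute's mean and variance. It assumes (this is an assumption, not a statement from the abstract) that a missing value is treated as a random variable with the attribute's observed mean and variance, and that the per-attribute contribution is the expected squared difference. The function names `attribute_stats` and `expected_sq_distance` are hypothetical, and missing values are encoded as NaN for illustration.

```python
import math


def attribute_stats(rows):
    """Per-attribute mean and variance over the observed (non-NaN) values.

    Assumes every attribute has at least one observed value.
    """
    means, variances = [], []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        mu = sum(observed) / len(observed)
        var = sum((v - mu) ** 2 for v in observed) / len(observed)
        means.append(mu)
        variances.append(var)
    return means, variances


def expected_sq_distance(x, y, means, variances):
    """Expected squared Euclidean distance between two possibly incomplete points.

    Hypothetical per-attribute rule (an assumption of this sketch):
      both known      -> (x_j - y_j)^2
      one missing     -> (known - mean_j)^2 + var_j
      both missing    -> 2 * var_j  (two independent draws)
    Each attribute costs constant time once the statistics are precomputed.
    """
    total = 0.0
    for xj, yj, mu, var in zip(x, y, means, variances):
        x_missing, y_missing = math.isnan(xj), math.isnan(yj)
        if not x_missing and not y_missing:
            total += (xj - yj) ** 2
        elif x_missing and y_missing:
            total += 2.0 * var
        else:
            known = yj if x_missing else xj
            total += (known - mu) ** 2 + var
    return total


# Toy data: the second point is missing its second attribute.
rows = [[1.0, 2.0], [3.0, math.nan], [5.0, 6.0]]
means, variances = attribute_stats(rows)
d = expected_sq_distance(rows[0], rows[1], means, variances)
```

Because the statistics are computed once per dataset, each attribute's contribution needs only a lookup and a few arithmetic operations, which is consistent with the constant-time claim in the abstract. A k-means or mean shift variant would call such a function wherever the standard Euclidean distance is used.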

