Clustering Algorithms for Incomplete Datasets

Loai Abdallah, I. Shimshoni

Published in: Recent Applications in Data Clustering, 2018-08-01
DOI: 10.5772/INTECHOPEN.78272 (https://doi.org/10.5772/INTECHOPEN.78272)
Citations: 2

Abstract

Many real-world datasets suffer from missing values. Several methods have been developed to deal with this problem, and many of them fill each missing value with a fixed value obtained from a statistical computation. In this research, we developed new versions of the k-means and mean shift clustering algorithms that handle datasets with missing values without filling those values in. We developed a new distance function that can compute distances over incomplete datasets. The distance is computed based only on the mean and variance of the data for each attribute. As a result, the runtime complexity of our computation is O(1). We experimented on six standard numerical datasets from different fields. On these datasets, we simulated missing values and compared the performance of the developed algorithms, using our distance and the suggested mean computations, to three other basic methods. Our experiments show that the developed algorithms using our distance function outperform the existing k-means and mean shift algorithms combined with other methods for dealing with missing values.
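The abstract does not state the distance formula, so the following is only a minimal sketch of one plausible reading: a missing coordinate is treated as a random draw from its attribute's distribution, and the distance is the expected squared difference, which depends only on that attribute's mean and variance. The function names `attribute_stats` and `expected_sq_distance`, the use of NaN to mark missing values, and the choice of population variance are all assumptions made for illustration, not details taken from the chapter. The sketch also assumes every attribute has at least one observed value.

```python
import math


def attribute_stats(rows):
    """Return (mean, variance) per attribute, using observed values only.

    Missing values are encoded as NaN (an assumption of this sketch).
    Population variance is used; the chapter may define it differently.
    """
    stats = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        mean = sum(observed) / len(observed)
        var = sum((v - mean) ** 2 for v in observed) / len(observed)
        stats.append((mean, var))
    return stats


def expected_sq_distance(x, y, stats):
    """Expected squared Euclidean distance between two possibly incomplete points.

    Per attribute, with mean m and variance s2:
      both known      -> (x_i - y_i)^2
      one missing     -> (known - m)^2 + s2
      both missing    -> 2 * s2
    Each attribute costs constant time once the statistics are precomputed.
    """
    total = 0.0
    for xi, yi, (mean, var) in zip(x, y, stats):
        x_missing, y_missing = math.isnan(xi), math.isnan(yi)
        if not x_missing and not y_missing:
            total += (xi - yi) ** 2
        elif x_missing and y_missing:
            total += 2.0 * var
        else:
            known = yi if x_missing else xi
            total += (known - mean) ** 2 + var
    return total


# Hypothetical toy data: the second point is missing its second attribute.
data = [[1.0, 2.0], [3.0, math.nan], [5.0, 6.0]]
stats = attribute_stats(data)
print(expected_sq_distance(data[0], data[1], stats))  # 12.0
```

Precomputing the per-attribute statistics once keeps each attribute's contribution to a constant-time lookup, which is consistent with the O(1) cost stated in the abstract. A k-means or mean shift variant could then call such a function wherever the ordinary squared Euclidean distance would be used.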