An Effective Clustering Mechanism for Uncertain Data Mining Using Centroid Boundary in UKmeans

2016 International Computer Symposium (ICS). Pub Date: 2016-12-01. DOI: 10.1109/ICS.2016.0067

Kuan-Teng Liao, Chuan-Ming Liu


Citations: 4

Abstract

Object errors affect both the time cost and the effectiveness of uncertain data clustering. To reduce the time cost and improve effectiveness, we propose two mechanisms for the centroid-based clustering method UKmeans. The first mechanism is an improved similarity measure. Similarity directly affects both time cost and effectiveness. For example, similarity calculations based on integration favor clustering effectiveness but ignore the time cost. Conversely, similarity calculations based on simplified approaches address the time cost but ignore effectiveness. To account for both, we use a simplified similarity to reduce the time cost and add two factors, the intersection and the density of clusters, to increase clustering effectiveness. The intersection factor increases an object's degree of belonging to a cluster when the cluster overlaps the object. The density factor prevents objects from being attracted to clusters that have large errors. The second mechanism is a definition of the centroid boundary. In clustering, the position of a cluster centroid lies within an average range derived from the errors of the objects that belong to the cluster. A large average range, however, lowers clustering effectiveness. To shrink this range, we propose a square-root boundary mechanism that limits the upper bound on the possible positions of centroids and thereby increases clustering effectiveness. Experimental results suggest that the two mechanisms perform well in both time cost and effectiveness, and that they round out the UKmeans approaches to uncertain data clustering.
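The abstract does not give formulas, so the following Python sketch only illustrates the general shape of the two mechanisms under stated assumptions. Every name, weight, and formula in it is hypothetical: objects and clusters are modeled as a point plus an error radius, the intersection factor is assumed to be an overlap fraction, the density factor is assumed to be a penalty proportional to the cluster's range, and the square-root boundary is assumed to divide the average member error by the square root of the member count (by analogy with the standard error of a mean). None of this should be read as the authors' actual definitions.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class UncertainObject:
    center: Tuple[float, ...]  # reported position of the object
    radius: float              # assumed error radius around that position


@dataclass
class Cluster:
    centroid: Tuple[float, ...]  # current centroid estimate
    radius: float                # range of possible centroid positions


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid_radius(member_radii: Sequence[float], bounded: bool = True) -> float:
    """Range of possible centroid positions derived from member error radii.

    bounded=False returns the plain average of the radii.
    bounded=True divides that average by sqrt(n). This is an ASSUMED
    stand-in for the square-root boundary, not the paper's formula.
    """
    n = len(member_radii)
    average = sum(member_radii) / n
    return average / math.sqrt(n) if bounded else average


def dissimilarity(obj: UncertainObject, cluster: Cluster,
                  w_overlap: float = 0.5, w_density: float = 0.5) -> float:
    """Score an object against a cluster; lower means more similar.

    Both weights and both factor formulas are hypothetical.
    """
    # Simplified similarity: one distance computation, no integration.
    d = euclidean(obj.center, cluster.centroid)
    reach = obj.radius + cluster.radius
    # Intersection factor (assumed form): fraction of the combined reach
    # by which the object's error region overlaps the cluster's range.
    overlap = max(0.0, (reach - d) / reach) if reach > 0 else 0.0
    # Density factor (assumed form): penalize clusters with a wide range,
    # so that large-error clusters do not attract objects too easily.
    spread_penalty = w_density * cluster.radius
    return d * (1.0 - w_overlap * overlap) + spread_penalty


def assign(obj: UncertainObject, clusters: Sequence[Cluster]) -> int:
    """Return the index of the cluster with the lowest dissimilarity."""
    scores = [dissimilarity(obj, c) for c in clusters]
    return scores.index(min(scores))
```

In this sketch, an object that is equally far from a tight cluster and from a cluster with a large error range is assigned to the tight one, because the assumed spread penalty outweighs the assumed overlap bonus. Different weights would shift that balance, which is why they are exposed as parameters rather than fixed.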

