An Effective Clustering Mechanism for Uncertain Data Mining Using Centroid Boundary in UKmeans
Kuan-Teng Liao, Chuan-Ming Liu
2016 International Computer Symposium (ICS), December 2016
DOI: 10.1109/ICS.2016.0067
Citations: 4
Abstract
Object errors affect both the time cost and the effectiveness of uncertain data clustering. To reduce the time cost and improve effectiveness, we propose two mechanisms for the centroid-based clustering algorithm UKmeans. The first mechanism is an improved similarity measure. Similarity directly affects both time cost and effectiveness: similarity calculations based on integration favor clustering effectiveness at the expense of time cost, whereas simplified similarity calculations reduce the time cost at the expense of effectiveness. To balance the two, we use a simplified similarity to reduce the time cost and add two factors, the intersection and the density of clusters, to improve clustering effectiveness. The intersection factor increases an object's degree of belonging to a cluster when the cluster overlaps the object; the density factor prevents objects from being attracted by clusters with large errors. The second mechanism defines a centroid boundary. In clustering, the position of a cluster centroid lies within an average range derived from the errors of its member objects, and a large average range lowers clustering effectiveness. To narrow this range, we propose a square-root boundary mechanism that limits the upper bound of the possible centroid positions, thereby improving clustering effectiveness. Experimental results suggest that the two mechanisms perform well in terms of both time cost and effectiveness, and that together they complement existing UKmeans approaches to uncertain data clustering.
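The abstract does not give the paper's formulas, so the following Python sketch only illustrates the general idea of an integration-free ("simplified") similarity in UK-means. It assumes each uncertain object is an axis-aligned box with a uniform distribution and uses the standard identity E[||X − c||²] = ||μ − c||² + Σ Var(X_d), which removes the need for numerical integration. The names `expected_sq_distance`, `uk_means`, and `centroid_range` are hypothetical. The paper's intersection and density factors and its square-root boundary are not reproduced, because their definitions are not stated here; `centroid_range` simply averages member half-widths, which is one reading of the "average range" described above.

```python
import random
from typing import List, Tuple

# ASSUMPTION: an uncertain object is an axis-aligned box (center, half-widths)
# with a uniform distribution inside; the paper's own error model may differ.
UncertainObject = Tuple[List[float], List[float]]


def expected_sq_distance(obj: UncertainObject, centroid: List[float]) -> float:
    """Integration-free E[||X - c||^2] = ||mu - c||^2 + sum_d Var(X_d).

    For a uniform interval of half-width h, Var = (2h)^2 / 12 = h^2 / 3.
    """
    center, half = obj
    sq = sum((m - c) ** 2 for m, c in zip(center, centroid))
    var = sum(h * h / 3.0 for h in half)
    return sq + var


def uk_means(objects: List[UncertainObject], k: int, iters: int = 20, seed: int = 0):
    """Hypothetical UK-means sketch using the simplified expected distance."""
    rng = random.Random(seed)
    centroids = [list(o[0]) for o in rng.sample(objects, k)]
    labels = [0] * len(objects)
    for _ in range(iters):
        # Assignment step: nearest centroid by expected squared distance.
        labels = [
            min(range(k), key=lambda j: expected_sq_distance(o, centroids[j]))
            for o in objects
        ]
        # Update step: each centroid becomes the mean of its members' centers.
        new_centroids = []
        for j in range(k):
            members = [o for o, lab in zip(objects, labels) if lab == j]
            if not members:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
                continue
            dim = len(members[0][0])
            new_centroids.append(
                [sum(o[0][d] for o in members) / len(members) for d in range(dim)]
            )
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


def centroid_range(objects: List[UncertainObject], labels: List[int], j: int) -> List[float]:
    """Average member half-width per dimension (hypothetical 'average range')."""
    halves = [o[1] for o, lab in zip(objects, labels) if lab == j]
    dim = len(halves[0])
    return [sum(h[d] for h in halves) / len(halves) for d in range(dim)]


if __name__ == "__main__":
    data = [
        ([0.0, 0.0], [0.1, 0.1]),
        ([0.2, 0.1], [0.3, 0.2]),
        ([5.0, 5.0], [0.1, 0.1]),
        ([5.2, 4.9], [0.2, 0.1]),
    ]
    cents, labs = uk_means(data, k=2)
    print(cents, labs, centroid_range(data, labs, labs[0]))
```

Under squared Euclidean distance, the variance term is identical for every centroid, so it never changes which cluster an object is assigned to. This makes the simplified measure cheap, and it is consistent with the abstract's observation that simplified similarities save time but do not by themselves let object errors improve effectiveness.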