Authors: Zhiyong Yang, Feng Jiang, J. Yu, Junwei Du
Published in: Proceedings of the 2022 5th International Conference on Software Engineering and Information Management
Publication date: 2022-01-21
DOI: 10.1145/3520084.3520106 (https://doi.org/10.1145/3520084.3520106)
Citations: 0
Abstract
Initial Seeds Selection for K-means Clustering Based on Outlier Detection
K-means clustering is a widely used algorithm in cluster analysis. However, its results depend heavily on the choice of initial seeds. The conventional K-means algorithm usually selects its initial seeds at random, which in many cases fails to produce a good clustering. To address the shortcomings of existing initial seeds selection (ISS) strategies for K-means clustering, we propose a novel initial seeds selection algorithm based on outlier detection, called ISS_OD. ISS_OD selects the initial seeds by calculating the distance outlier factor of every object, the weighted density of every object, and the weighted distances between objects. Experimental results on several UCI datasets demonstrate the effectiveness of our algorithm for initial seeds selection in K-means clustering.
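The abstract does not give the formulas for the distance outlier factor, the weighted density, or the weighted distances, so the following is only a rough sketch of the general idea and not the ISS_OD algorithm itself. It assumes a simple stand-in for each quantity: the outlier score of a point is its mean distance to its nearest neighbors, the density is the inverse of that score, and each new seed maximizes density multiplied by the distance to the nearest seed already chosen. The function names and the `n_neighbors` parameter are hypothetical.

```python
import math


def pairwise_distances(points):
    """Return the full Euclidean distance matrix as a list of lists."""
    return [[math.dist(p, q) for q in points] for p in points]


def knn_outlier_scores(dist, n_neighbors):
    """Mean distance to the n_neighbors nearest other points (larger = more outlying)."""
    scores = []
    for i, row in enumerate(dist):
        others = sorted(d for j, d in enumerate(row) if j != i)
        scores.append(sum(others[:n_neighbors]) / n_neighbors)
    return scores


def select_seeds(points, k, n_neighbors=3):
    """Pick k seed indices that are dense, non-outlying, and far apart."""
    dist = pairwise_distances(points)
    outlier = knn_outlier_scores(dist, n_neighbors)
    # Assumed weighting (not from the paper): density is the inverse of the
    # outlier score, so isolated points are unlikely to become seeds.
    density = [1.0 / (s + 1e-12) for s in outlier]

    # First seed: the point with the highest density.
    seeds = [max(range(len(points)), key=lambda i: density[i])]
    while len(seeds) < k:
        # Later seeds: maximize density times distance to the nearest chosen seed.
        def score(i):
            return density[i] * min(dist[i][s] for s in seeds)

        seeds.append(max(range(len(points)), key=score))
    return seeds


# Three tight groups of four points each, plus one isolated point at the end.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
    (0, 10), (0, 11), (1, 10), (1, 11),
    (30, 30),
]
seeds = select_seeds(points, k=3)
print([points[i] for i in seeds])
```

In this toy data a purely distance-based rule such as farthest-first selection would be drawn to the isolated point at (30, 30). Weighting by density is what keeps the seeds inside the three groups, which is the intuition behind combining outlier detection with seed selection.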