{"title":"A data selection framework for k-means algorithm to mine high precision clusters","authors":"Zhengzheng Lou, Chaoyang Zhang","doi":"10.1109/FSKD.2017.8393013","DOIUrl":null,"url":null,"PeriodicalId":236093,"journal":{"name":"2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)","volume":"334 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2017-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/FSKD.2017.8393013","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A data selection framework for k-means algorithm to mine high precision clusters
Traditional clustering algorithms use every data item to learn the cluster patterns. In real-world applications, however, some data show clearly coherent behaviour and can be summarized well, while other data have only a weak tendency to belong to any particular pattern. For such situations, this paper presents a data selection framework for the k-means algorithm that extracts high-precision clusters from a data collection. It differs from traditional k-means-type algorithms in three respects. First, during cluster learning, the change in a cluster's Bregman information caused by merging a data item into that candidate cluster is taken as the measure of the item's clustering tendency. Second, only data items with a strong clustering tendency, that is, items for which this change in Bregman information is less than a predefined radius, are selected to learn the cluster patterns; the remaining data points are ignored and belong to no cluster, so the clustering is non-exhaustive. Third, the cluster radius can change during learning, which makes the framework dynamic. Experiments on synthetic, document, and image data show the effectiveness of the proposed algorithm.
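
The selection rule described above can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not confirm: the Bregman divergence is squared Euclidean distance, the Bregman information of a cluster is its unnormalized sum of squared distances to the centroid, and the radius follows a simple multiplicative schedule. Under the first two assumptions, merging a point x into a cluster of n points with mean mu raises the sum of squares by n/(n+1) * ||x - mu||^2. The names `merge_cost`, `selective_kmeans`, `growth`, and `init` are hypothetical, and for simplicity a point's cost is computed against the current cluster even when the point is already a member.

```python
import numpy as np


def merge_cost(x, centroid, size):
    """Increase in a cluster's sum of squared distances if x is merged in.

    Assumes squared Euclidean divergence (an assumption, not from the paper).
    """
    return size / (size + 1.0) * np.sum((x - centroid) ** 2)


def selective_kmeans(X, k, radius, n_iter=20, growth=1.0, init=None, seed=0):
    """Non-exhaustive k-means sketch: points whose merge cost is not below
    `radius` get label -1 and do not contribute to any centroid.

    `growth` is a hypothetical multiplicative radius schedule.
    """
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centroids = np.asarray(init, dtype=float).copy()
    sizes = np.ones(k)
    labels = np.full(len(X), -1)

    for _ in range(n_iter):
        # Squared distance from every point to every centroid, shape (n, k).
        sq = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        # Merge cost of every point into every cluster.
        cost = sizes / (sizes + 1.0) * sq
        best = cost.argmin(axis=1)
        selected = cost[np.arange(len(X)), best] < radius
        new_labels = np.where(selected, best, -1)

        # Update centroids from the selected points only.
        for j in range(k):
            members = X[new_labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
            sizes[j] = max(len(members), 1)

        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        radius *= growth  # dynamic radius (assumed schedule)

    return labels, centroids


if __name__ == "__main__":
    X = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1],
                  [5, 5], [5.1, 5], [5, 5.1], [5.1, 5.1],
                  [2.5, 2.5], [10, -10]])
    labels, centroids = selective_kmeans(X, k=2, radius=0.5,
                                         init=[[0, 0], [5, 5]])
    print(labels)
```

In this toy run the two points that lie far from both groups keep the label -1, which is the non-exhaustive behaviour the abstract describes. A `growth` value other than 1 loosens or tightens the selection as learning proceeds.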