A Generalization of Proximity Functions for K-Means

Junjie Wu, Hui Xiong, Jing Chen, Wenjun Zhou
Published in: Seventh IEEE International Conference on Data Mining (ICDM 2007), 2007-10-28
DOI: 10.1109/ICDM.2007.59 (https://doi.org/10.1109/ICDM.2007.59)
Citations: 30

Abstract

K-means is a widely used partitional clustering method. Considerable effort has gone into finding better proximity (distance) functions for k-means, yet the characteristics these functions share remain unknown. In this paper, we show that all proximity functions that fit k-means clustering can be generalized as the k-means distance, which can be derived from a differentiable convex function. We also provide a general proof of the necessary and sufficient conditions for k-means distance functions. In addition, we reveal that k-means has a general uniformization effect: it tends to produce clusters of relatively balanced size, regardless of the proximity function used. Finally, extensive experiments on various real-world data sets provide evidence of this uniformization effect. We also observe that external clustering validation measures, such as entropy and variation of information (VI), have difficulty measuring clustering quality when the class sizes of the data are skewed.
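The abstract does not spell out how a distance is derived from a differentiable convex function. The sketch below assumes the usual Bregman-style construction, d(x, y) = phi(x) - phi(y) - <x - y, grad phi(y)>, which may differ in detail from the paper's definition. All function names are hypothetical and chosen only for illustration. It shows two familiar choices of phi and checks the property that makes such a distance compatible with k-means: the arithmetic mean of a cluster gives a total distance no larger than another candidate center.

```python
import math

def bregman_distance(phi, grad_phi, x, y):
    """Assumed form: phi(x) - phi(y) - <x - y, grad_phi(y)>, phi convex and differentiable."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

# phi(x) = ||x||^2 yields the squared Euclidean distance.
def sq_norm(x):
    return sum(xi * xi for xi in x)

def sq_norm_grad(x):
    return [2.0 * xi for xi in x]

# phi(x) = sum_i x_i log x_i (negative entropy) yields the KL divergence
# when x and y are probability vectors with strictly positive entries.
def neg_entropy(x):
    return sum(xi * math.log(xi) for xi in x)

def neg_entropy_grad(x):
    return [math.log(xi) + 1.0 for xi in x]

def centroid(points):
    """Arithmetic mean of a list of equal-length vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]

def total_distance(phi, grad_phi, points, center):
    """Sum of distances from each point to the center (center is the second argument)."""
    return sum(bregman_distance(phi, grad_phi, p, center) for p in points)

if __name__ == "__main__":
    # Same value as (1 - 3)^2 + (2 - 5)^2.
    print(bregman_distance(sq_norm, sq_norm_grad, [1.0, 2.0], [3.0, 5.0]))  # 13.0

    cluster = [[0.2, 0.8], [0.5, 0.5], [0.8, 0.2]]
    mean = centroid(cluster)
    other = [0.4, 0.6]
    # The mean should not lose to another candidate center under either phi.
    for phi, grad in ((sq_norm, sq_norm_grad), (neg_entropy, neg_entropy_grad)):
        print(total_distance(phi, grad, cluster, mean)
              <= total_distance(phi, grad, cluster, other))
```

Taking the centroid as the second argument matters here: a distance of this form is generally asymmetric, and the mean-minimizes-total-distance property holds for that argument order.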