Pairwise Constrained Clustering with Group Similarity-Based Patterns

Tianming Hu, Chuanren Liu, Jing Sun, S. Sung, P. Ng
Published in: 2010 Ninth International Conference on Machine Learning and Applications, 2010-12-12. DOI: 10.1109/ICMLA.2010.45 (https://doi.org/10.1109/ICMLA.2010.45)

Abstract

Conventional k-means considers only pairwise similarity during cluster assignment: it aims to minimize the distance from each point to its nearest cluster centroid. In high-dimensional spaces such as document datasets, however, two points may be nearest neighbors without belonging to the same class, so pairwise similarity alone is often insufficient for class prediction. To address this, we propose augmenting k-means with pairwise constraints generated from group similarity-based hyperclique patterns, which consist of strongly affiliated objects and serve as more reliable seeds for classification. Experiments with real-world datasets show that constraints derived from high-quality hyperclique patterns improve clustering results under various external criteria. Our experiments also indicate that even when the original k-means result violates only a few constraints, imposing many quality constraints may still yield a performance gain.
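
The abstract does not say how the constraints are enforced, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes that the hyperclique patterns have already been mined and are given as disjoint sets of object indices, that they yield must-link constraints only, and that a whole pattern is assigned to the centroid with the lowest total squared distance over its members. The names `must_link_pairs`, `sq_dist`, and `constrained_kmeans` are hypothetical.

```python
from itertools import combinations


def must_link_pairs(patterns):
    """Expand each pattern (a group of object indices) into pairwise must-link constraints."""
    pairs = set()
    for pattern in patterns:
        pairs.update(combinations(sorted(pattern), 2))
    return pairs


def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_kmeans(points, init_centroids, patterns, iters=20):
    """k-means in which every pattern is assigned to one cluster as a block.

    Assumption: patterns are disjoint, so each object belongs to exactly one block.
    """
    centroids = [list(c) for c in init_centroids]
    covered = {i for p in patterns for i in p}
    # Objects outside every pattern become singleton blocks (plain k-means behavior).
    blocks = [sorted(p) for p in patterns]
    blocks += [[i] for i in range(len(points)) if i not in covered]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: a block goes to the centroid with the lowest
        # total squared distance summed over the block's members.
        for block in blocks:
            best = min(
                range(len(centroids)),
                key=lambda c: sum(sq_dist(points[i], centroids[c]) for i in block),
            )
            for i in block:
                labels[i] = best
        # Update step: recompute each centroid as the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

Calling `constrained_kmeans(points, init_centroids, [])` with no patterns reduces to ordinary k-means from the given initial centroids, which makes it easy to compare the two assignments on the same data. Treating a pattern as an indivisible block is one simple way to guarantee that no must-link constraint is violated; other schemes (for example, penalizing violations instead of forbidding them) are equally compatible with the abstract.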