Pairwise Constrained Clustering with Group Similarity-Based Patterns
Tianming Hu, Chuanren Liu, Jing Sun, S. Sung, P. Ng
2010 Ninth International Conference on Machine Learning and Applications, 2010-12-12
DOI: 10.1109/ICMLA.2010.45 (https://doi.org/10.1109/ICMLA.2010.45)
Abstract
Conventional k-means considers only pairwise similarity during cluster assignment: it aims to minimize the distance from each point to its nearest cluster centroid. In high-dimensional spaces such as document datasets, however, two points may be nearest neighbors without belonging to the same class, so pairwise similarity alone is often insufficient for class prediction. To address this, we propose augmenting k-means with pairwise constraints generated from hyperclique patterns, which are based on group similarity; these patterns consist of strongly affiliated objects and serve as more reliable seeds for classification. Experiments with real-world datasets show that constraints derived from high-quality hyperclique patterns improve the clustering results under various external criteria. Our experiments also indicate that even if only a few constraints are violated in the original k-means result, imposing many quality constraints may still improve performance.
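The idea of turning group patterns into pairwise constraints for k-means can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how the hyperclique patterns are mined or how the constraints are enforced. The sketch assumes the patterns are already given as groups of object indices, treats every pair within a pattern as a must-link constraint, and enforces those constraints in a COP-KMeans-like way by assigning each must-link group to a centroid as a unit. All function names (`pairs_from_patterns`, `must_link_groups`, `constrained_kmeans`) are hypothetical.

```python
from itertools import combinations


def pairs_from_patterns(patterns):
    """Expand each pattern (a group of object indices) into must-link pairs."""
    pairs = set()
    for pattern in patterns:
        for a, b in combinations(sorted(set(pattern)), 2):
            pairs.add((a, b))
    return sorted(pairs)


def must_link_groups(n, pairs):
    """Merge objects connected by must-link pairs (union-find)."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in pairs:
        parent[find(a)] = find(b)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def constrained_kmeans(points, init_centroids, pairs, n_iter=20):
    """k-means in which must-linked objects always share a cluster.

    Assumption: only must-link constraints are handled; each linked group
    goes to the centroid with the smallest total squared distance.
    """
    centroids = [list(c) for c in init_centroids]
    groups = must_link_groups(len(points), pairs)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: one decision per must-link group.
        for group in groups:
            best = min(
                range(len(centroids)),
                key=lambda k: sum(sq_dist(points[i], centroids[k]) for i in group),
            )
            for i in group:
                labels[i] = best
        # Update step: recompute each non-empty cluster's mean.
        for k in range(len(centroids)):
            members = [points[i] for i in range(len(points)) if labels[i] == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: object 2 lies closer to the left cluster, but a (hypothetical)
# pattern {2, 3, 4} ties it to the objects on the right.
points = [(0, 0), (0, 1), (4, 0), (10, 0), (10, 1)]
init = [(0, 0), (10, 0)]
plain_labels, _ = constrained_kmeans(points, init, pairs=[])
linked_labels, _ = constrained_kmeans(points, init, pairs_from_patterns([[2, 3, 4]]))
```

In this toy setup, plain k-means places object 2 with its nearest centroid, while the constrained run keeps it with the other members of its pattern. This mirrors the abstract's point that nearest-neighbor proximity and class membership can disagree.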