MMKK++ algorithm for clustering heterogeneous images into an unknown number of clusters

JCR: Q4, Computer Science

Electronic Letters on Computer Vision and Image Analysis, pp. 30-45. Pub Date: 2018-01-10. DOI: 10.5565/REV/ELCVIA.1054

Dávid Papp, G. Szűcs

Citations: 4

Abstract

In this paper we present an automatic clustering procedure whose main aim is to predict the number of clusters in a set of unknown, heterogeneous images. We used Fisher vectors as the mathematical representation of the images, and these vectors served as the input data points for the clustering algorithm. As the clustering method we implemented a novel variant of K-means, the kernel K-means++, and furthermore the min-max kernel K-means++ (MMKK++). The proposed approach examines several candidate cluster numbers and determines the strength of each clustering to estimate how well the data fit into K clusters; the law of large numbers was then used to choose the optimal cluster size. We conducted experiments on four image sets to demonstrate the efficiency of our solution. The first two image sets are subsets of different popular collections, the third is their union, and the fourth is the complete Caltech101 image set. The results showed that our approach gave a better estimate of the number of clusters than the competing methods. Furthermore, we defined two new metrics for evaluating the predicted cluster number; they measure the goodness of a prediction on a graded scale rather than as a binary hit or miss.
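The clustering step mentioned in the abstract can be illustrated with a minimal sketch of generic kernel K-means with K-means++-style seeding carried out in feature space. This is not the authors' MMKK++: the min-max variant and the procedure for choosing the number of clusters are not specified in the abstract and are not reproduced here. The RBF kernel, the `gamma` parameter, and the function names `rbf_kernel` and `kernel_kmeans_pp` are assumptions made for illustration; in the paper's setting the rows of `X` would be Fisher vectors.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2) (assumed kernel)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def kernel_kmeans_pp(K, n_clusters, n_iter=50, seed=0):
    """Kernel K-means on a Gram matrix K with K-means++-style seeding."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    diag = np.diag(K)

    # Seeding: the first centre is drawn uniformly; each further centre is
    # drawn with probability proportional to the squared feature-space
    # distance to the nearest centre chosen so far.
    centres = [int(rng.integers(n))]
    d2 = diag + K[centres[0], centres[0]] - 2.0 * K[:, centres[0]]
    for _ in range(1, n_clusters):
        d2 = np.maximum(d2, 0.0)
        total = d2.sum()
        probs = d2 / total if total > 0 else np.full(n, 1.0 / n)
        c = int(rng.choice(n, p=probs))
        centres.append(c)
        d2 = np.minimum(d2, diag + K[c, c] - 2.0 * K[:, c])

    # Initial assignment: nearest seed point in feature space.
    dist = diag[:, None] + diag[centres][None, :] - 2.0 * K[:, centres]
    labels = np.argmin(dist, axis=1)

    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for k in range(n_clusters):
            members = labels == k
            m = members.sum()
            if m == 0:
                continue  # an empty cluster stays unreachable
            # ||phi(x) - mu_k||^2 = K_xx - 2 * mean_j K_xj + mean_{j,l} K_jl
            dist[:, k] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = np.argmin(dist, axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments are stable
        labels = new_labels
    return labels
```

A call such as `kernel_kmeans_pp(rbf_kernel(X, gamma=0.5), n_clusters=K)` could be repeated for each candidate K; how the resulting clusterings are then scored and compared is the part the paper contributes and is left out of this sketch.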

Source journal

Electronic Letters on Computer Vision and Image Analysis (Computer Science: Computer Vision and Pattern Recognition)

CiteScore: 2.50

Self-citation rate: 0.00%

Articles published:

Review time: 12 weeks