Estimating the Number of Clusters Based on Sequential Clustering Algorithms

2016 5th Brazilian Conference on Intelligent Systems (BRACIS). Pub Date: 2016-10-01. DOI: 10.1109/BRACIS.2016.050

E. M. Real


Citations: 0

Abstract


Estimating the Number of Clusters Based on Sequential Clustering Algorithms

The main goal of clustering algorithms is to organize a given set of data patterns into groups (clusters), and their main strategy is to group patterns by similarity. Some clustering algorithms, however, also require as an input parameter either the number of clusters the induced clustering should have or a threshold value that limits the number of induced clusters. Both the number of clusters and the threshold value are often unknown, yet clustering results are known to be very sensitive to them. This work presents a method for empirically estimating both values. The method is based on multiple runs of sequential clustering algorithms with increasing threshold values. Results from experiments on several data domains from two repositories, UCI and KEEL, as well as a few artificially created datasets, are presented, and a comparative analysis is carried out as evidence that the method gives good estimates of both values.
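The idea of running a sequential clustering algorithm repeatedly with increasing thresholds can be illustrated with a minimal sketch. The abstract does not give the paper's actual procedure, so everything below is an assumption for illustration: the one-pass scheme (a point joins the nearest cluster if it lies within the threshold of its centroid, otherwise it starts a new cluster), the Euclidean distance, the shuffled presentation order, and the rule of picking the cluster count that persists over the most thresholds. The names `sequential_cluster_count` and `estimate_clusters` are hypothetical.

```python
import random
from collections import Counter
from math import dist


def sequential_cluster_count(points, threshold):
    """One pass over `points`; returns how many clusters were created."""
    centroids = []  # running mean of each cluster
    sizes = []      # number of points absorbed by each cluster
    for p in points:
        if centroids:
            # Index of the closest existing centroid.
            j = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
            if dist(p, centroids[j]) <= threshold:
                sizes[j] += 1
                # Incremental mean update of the winning centroid.
                centroids[j] = tuple(
                    c + (x - c) / sizes[j] for c, x in zip(centroids[j], p)
                )
                continue
        # No cluster is close enough: the point starts a new one.
        centroids.append(tuple(p))
        sizes.append(1)
    return len(centroids)


def estimate_clusters(points, thresholds, runs=5, seed=0):
    """Return (k, t): the cluster count seen for the most thresholds, and
    the midpoint of the thresholds that produced it (assumed heuristic)."""
    rng = random.Random(seed)
    count_per_threshold = []
    for t in thresholds:
        run_counts = []
        for _ in range(runs):
            order = list(points)
            rng.shuffle(order)  # sequential schemes depend on input order
            run_counts.append(sequential_cluster_count(order, t))
        # Keep the most frequent count among the runs at this threshold.
        count_per_threshold.append(Counter(run_counts).most_common(1)[0][0])
    k = Counter(count_per_threshold).most_common(1)[0][0]
    supporting = [t for t, c in zip(thresholds, count_per_threshold) if c == k]
    return k, (min(supporting) + max(supporting)) / 2


# Toy data (made up for this sketch): three well-separated 2-D blobs.
rng = random.Random(42)
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
data = [
    (cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
    for cx, cy in centers
    for _ in range(30)
]
thresholds = [1.0 + 0.5 * i for i in range(15)]  # 1.0, 1.5, ..., 8.0
k, t = estimate_clusters(data, thresholds)
print(k, t)
```

In this sketch the choice of threshold range matters: if the range extends far beyond the spread of the data, every run collapses to a single cluster and that count dominates, so a practical variant would probably bound the range or ignore the trivial one-cluster plateau.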

