Learning the Threshold in Hierarchical Agglomerative Clustering

2006 5th International Conference on Machine Learning and Applications (ICMLA'06). Pub Date: 2006-12-14. DOI: 10.1109/ICMLA.2006.33

K. Daniels, C. Giraud-Carrier


Citations: 23

Abstract

Most partitional clustering algorithms require the number of desired clusters to be set a priori. Not only is this somewhat counter-intuitive, it is also difficult except in the simplest of situations. By contrast, hierarchical clustering may create partitions with varying numbers of clusters. The actual final partition depends on a threshold placed on the similarity measure used. Given a cluster quality metric, one can efficiently discover an appropriate threshold through a form of semi-supervised learning. This paper shows one such solution for complete-link hierarchical agglomerative clustering using the F-measure and a small subset of labeled examples. Empirical evaluation demonstrates promise.
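The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that do not come from the paper: it thresholds a Euclidean distance rather than a similarity, it scores partitions with a pairwise F-measure computed only over the labeled points, and it simply scans every merge distance as a candidate threshold. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import dist

def complete_link_merges(points):
    """Run complete-link agglomerative clustering to a single cluster.
    Returns [(merge_distance, partition_after_merge), ...], starting at 0.0."""
    clusters = [[i] for i in range(len(points))]
    history = [(0.0, [list(c) for c in clusters])]
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Complete link: cluster distance is the largest pairwise distance.
            d = max(dist(points[i], points[j])
                    for i in clusters[a] for j in clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
        history.append((d, [list(c) for c in clusters]))
    return history

def pairwise_f_measure(partition, labels):
    """Pairwise F-measure over the labeled subset (assumed metric).
    labels maps point index -> class for the labeled points only."""
    cluster_of = {i: k for k, c in enumerate(partition) for i in c}
    tp = fp = fn = 0
    for i, j in combinations(sorted(labels), 2):
        same_cluster = cluster_of[i] == cluster_of[j]
        same_class = labels[i] == labels[j]
        tp += int(same_cluster and same_class)
        fp += int(same_cluster and not same_class)
        fn += int(same_class and not same_cluster)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def learn_threshold(points, labels):
    """Pick the merge distance whose cut maximizes F on the labeled subset."""
    best = (-1.0, None, None)
    # dict() keeps the last partition recorded for each merge distance.
    for d, partition in dict(complete_link_merges(points)).items():
        f = pairwise_f_measure(partition, labels)
        if f > best[0]:
            best = (f, d, partition)
    return best[1], best[0], best[2]

# Toy 1-D data: three well-separated groups, six of eight points labeled.
points = [(0.0,), (1.0,), (2.5,), (10.0,), (11.25,), (13.0,), (30.0,), (31.5,)]
labels = {0: "a", 2: "a", 3: "b", 5: "b", 6: "c", 7: "c"}

threshold, f_score, partition = learn_threshold(points, labels)
print(threshold, f_score, partition)
```

On this toy data the scan settles on the cut that separates the three groups. Clustering new data from the same domain would then reuse the learned threshold instead of a hand-picked number of clusters. The brute-force search here is quadratic per merge and is meant only to make the logic visible.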

