Internal evaluation criteria for categorical data in hierarchical clustering

Z. Šulc, J. Cibulková, J. Procházka, H. Řezanková
{"title":"Internal evaluation criteria for categorical data in hierarchical clustering","authors":"Z. Šulc, J. Cibulková, J. Procházka, H. Řezanková","doi":"10.51936/lxut1974","DOIUrl":null,"url":null,"abstract":"The paper compares 11 internal evaluation criteria for hierarchical clustering of categorical data regarding a correct number of clusters determination. The criteria are divided into three groups based on a way of treating the cluster quality. The variability-based criteria use the within-cluster variability, the likelihood-based criteria maximize the likelihood function, and the distance-based criteria use distances within and between clusters. The aim is to determine which evaluation criteria perform well and under what conditions. Different analysis settings, such as the used method of hierarchical clustering, and various dataset properties, such as the number of variables or the minimal between-cluster distances, are examined. The experiment is conducted on 810 generated datasets, where the evaluation criteria are assessed regarding the optimal number of clusters determination and mean absolute errors. The results indicate that the likelihood-based BIC1 and variability-based BK criteria perform relatively well in determining the optimal number of clusters and that some criteria, usually the distance-based ones, should be avoided.","PeriodicalId":242585,"journal":{"name":"Advances in Methodology and Statistics","volume":"164 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Advances in Methodology and Statistics","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.51936/lxut1974","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

The paper compares 11 internal evaluation criteria for hierarchical clustering of categorical data regarding a correct number of clusters determination. The criteria are divided into three groups based on a way of treating the cluster quality. The variability-based criteria use the within-cluster variability, the likelihood-based criteria maximize the likelihood function, and the distance-based criteria use distances within and between clusters. The aim is to determine which evaluation criteria perform well and under what conditions. Different analysis settings, such as the used method of hierarchical clustering, and various dataset properties, such as the number of variables or the minimal between-cluster distances, are examined. The experiment is conducted on 810 generated datasets, where the evaluation criteria are assessed regarding the optimal number of clusters determination and mean absolute errors. The results indicate that the likelihood-based BIC1 and variability-based BK criteria perform relatively well in determining the optimal number of clusters and that some criteria, usually the distance-based ones, should be avoided.
层次聚类中分类数据的内部评价标准
本文比较了分类数据分层聚类的11个内部评价标准,以确定正确的聚类数量。根据处理聚类质量的方法,将标准分为三组。基于变量的标准使用集群内的可变性,基于似然的标准最大化似然函数,基于距离的标准使用集群内和集群之间的距离。目的是确定哪些评价标准在什么条件下表现良好。研究了不同的分析设置,例如使用的分层聚类方法,以及各种数据集属性,例如变量数量或最小簇间距离。实验在810个生成的数据集上进行,对最优聚类数的确定和平均绝对误差进行评估。结果表明,基于似然的BIC1和基于变异性的BK标准在确定最优聚类数量方面表现相对较好,并且应该避免一些标准,通常是基于距离的标准。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信