How many clusters exist? Answer via maximum clustering similarity implemented in R

JCR quartile: Q3 (Medicine)
A. Albatineh, M. Wilcox, B. Zogheib, M. Niewiadomska-Bugaj
Biostatistics and Epidemiology, 3(1), pp. 62-79. Published 2019-01-01. Journal Article.
DOI: 10.1080/24709360.2019.1615770 (https://doi.org/10.1080/24709360.2019.1615770)
Citations: 0

Abstract

Finding the number of clusters in a data set is considered one of the fundamental problems in cluster analysis. This paper integrates maximum clustering similarity (MCS), a method for finding the optimal number of clusters, into the R statistical software through the package MCSim. The similarity between two clustering methods is calculated at the same number of clusters, using the Rand [Objective criteria for the evaluation of clustering methods. J Am Stat Assoc. 1971;66:846–850.] and Jaccard [The distribution of the flora of the alpine zone. New Phytologist. 1912;11:37–50.] indices, corrected for chance agreement. The number of clusters at which the index most frequently attains its maximum is a candidate for the optimal number of clusters. Unlike other criteria, MCS can be used with circular data. Seven clustering algorithms available in R are implemented in MCSim. The package produces a graph of the number of clusters versus clustering similarity based on the corrected similarity indices, along with the values of the similarity indices and a clustering tree (dendrogram). Several examples, including simulated, real, and circular data sets, show how MCSim works in practice.
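The selection rule described in the abstract can be illustrated with a small sketch. This is not the MCSim package or its API: it is written in Python rather than R, the function names `adjusted_rand_index` and `candidate_number_of_clusters` are hypothetical, the cluster labels are hand-made toy data, and only the Rand index corrected for chance (in the Hubert-Arabie form) is shown; the corrected Jaccard index is omitted. The voting scheme (each pair of methods votes for the number of clusters at which its similarity is highest) is one plausible reading of "attains its maximum with most frequency", stated here as an assumption.

```python
from collections import Counter
from itertools import combinations
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement (Hubert-Arabie form)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two observations")
    # Pairs of points placed together by both, by the first, by the second.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case: both labelings are all-singletons or one big cluster.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def candidate_number_of_clusters(partitions):
    """partitions maps method name -> {k: labels}. Returns (best_k, votes)."""
    votes = Counter()
    for m1, m2 in combinations(sorted(partitions), 2):
        shared_ks = sorted(set(partitions[m1]) & set(partitions[m2]))
        if not shared_ks:
            continue
        scores = {
            k: adjusted_rand_index(partitions[m1][k], partitions[m2][k])
            for k in shared_ks
        }
        # Ties go to the smallest k because shared_ks is sorted.
        votes[max(shared_ks, key=lambda k: scores[k])] += 1
    if not votes:
        raise ValueError("no pair of methods shares a number of clusters")
    return votes.most_common(1)[0][0], votes


# Toy labels for 8 points from three hypothetical methods at k = 2 and k = 3.
example = {
    "method_a": {2: [0, 0, 0, 0, 1, 1, 1, 1], 3: [0, 0, 1, 1, 2, 2, 2, 2]},
    "method_b": {2: [1, 1, 1, 1, 0, 0, 0, 0], 3: [0, 0, 0, 0, 1, 1, 2, 2]},
    "method_c": {2: [0, 0, 0, 0, 1, 1, 1, 1], 3: [0, 1, 1, 1, 2, 2, 2, 2]},
}
best_k, votes = candidate_number_of_clusters(example)
print(best_k, dict(votes))
```

In this toy input all three methods produce the same two-cluster partition (up to relabeling) but disagree at three clusters, so every pair votes for k = 2. With real data the labels would come from actual clustering algorithms run over a range of k.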
Source journal: Biostatistics and Epidemiology (Medicine, Health Informatics)
CiteScore: 1.80
Self-citation rate: 0.00%
Articles published: 23