Out-of-core assessment of clustering tendency for large data sets

M. K. Pakhira
2010 IEEE 2nd International Advance Computing Conference (IACC), March 2010
DOI: 10.1109/IADCC.2010.5423044
Citations: 4

Abstract

Automatically determining the number of clusters present in a data set is a very important problem. Conventional clustering techniques assume a certain number of clusters and then try to find the cluster structure associated with that number. For very large and complex data sets, this number is not easy to guess. There exist validity-based clustering techniques, which compute a cluster validity measure for the clustering result obtained at each candidate number of clusters. After doing this over a broad range of possible numbers of clusters, the method selects the number for which the validity measure is optimal. This approach is, however, awkward and may not always be applicable to very large data sets. Recently, an interesting visual technique for determining clustering tendency has been developed, called VAT (visual assessment of cluster tendency). The original VAT and its variants have been found to determine the number of clusters very satisfactorily before any clustering algorithm is actually applied. In this paper, we propose an out-of-core VAT algorithm (o-VAT) for very large data sets.
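The abstract does not describe o-VAT itself. As background, the following is a minimal Python sketch of the original in-memory VAT reordering (Bezdek and Hathaway), not the paper's out-of-core algorithm. VAT starts from an object involved in the largest dissimilarity, then repeatedly appends the unordered object nearest to the already ordered set (a Prim's minimum-spanning-tree traversal), and finally permutes the dissimilarity matrix into that order so that clusters appear as dark blocks along the diagonal. The sketch assumes Euclidean dissimilarities on a small in-memory data set; the `render` helper and its `threshold` parameter are hypothetical stand-ins for VAT's grayscale image.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def dissimilarity_matrix(points: Sequence[Sequence[float]]) -> Matrix:
    """Full n x n Euclidean dissimilarity matrix (O(n^2) memory)."""
    return [
        [math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q))) for q in points]
        for p in points
    ]


def vat_order(d: Matrix) -> List[int]:
    """Return the VAT permutation of object indices for dissimilarity matrix d."""
    n = len(d)
    if n == 0:
        return []
    # Step 1: start from an object that takes part in the largest dissimilarity.
    start = max(range(n), key=lambda i: max(d[i]))
    order = [start]
    visited = [False] * n
    visited[start] = True
    # nearest[k] = smallest dissimilarity from unvisited object k to any ordered object.
    nearest = list(d[start])
    # Step 2: repeatedly append the unvisited object closest to the ordered set
    # (Prim's minimum-spanning-tree traversal).
    for _ in range(n - 1):
        j = min((k for k in range(n) if not visited[k]), key=lambda k: nearest[k])
        order.append(j)
        visited[j] = True
        for k in range(n):
            if not visited[k] and d[j][k] < nearest[k]:
                nearest[k] = d[j][k]
    return order


def reorder(d: Matrix, order: List[int]) -> Matrix:
    """Step 3: permute rows and columns of d into the VAT order."""
    return [[d[i][j] for j in order] for i in order]


def render(d: Matrix, threshold: float) -> List[str]:
    """Crude text image (hypothetical helper): '#' marks small (dark) dissimilarities."""
    return ["".join("#" if v <= threshold else "." for v in row) for row in d]


if __name__ == "__main__":
    # Made-up example data: two well-separated groups of three points each.
    pts = [(0, 0), (10, 10), (0.5, 0), (10, 11), (0, 1), (11, 10)]
    d = dissimilarity_matrix(pts)
    order = vat_order(d)
    print(order)  # [0, 2, 4, 1, 3, 5]
    for line in render(reorder(d, order), threshold=2.0):
        print(line)
    # ###...
    # ###...
    # ###...
    # ...###
    # ...###
    # ...###
```

The two dark diagonal blocks in the reordered image suggest two clusters without running any clustering algorithm. Note that `dissimilarity_matrix` materializes all n² entries, which is the cost that becomes impractical for very large data sets.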