Out-of-core assessment of clustering tendency for large data sets
M. K. Pakhira
2010 IEEE 2nd International Advance Computing Conference (IACC), March 2010
DOI: 10.1109/IADCC.2010.5423044 (https://doi.org/10.1109/IADCC.2010.5423044)
Automatically determining the number of clusters present in a data set is an important problem. Conventional clustering techniques assume a certain number of clusters and then try to find the cluster structure associated with that number. For very large and complex data sets, this number is not easy to guess. Validity-based clustering techniques address this by computing a cluster validity measure on the clustering results obtained for a broad range of candidate cluster counts and then selecting the count for which the measure is optimal. This approach is, however, cumbersome and may not always be applicable to very large data sets. Recently, a visual technique for determining clustering tendency, known as VAT (Visual Assessment of cluster Tendency), has been developed. The original VAT and its variants have been found to determine the number of clusters satisfactorily before any clustering algorithm is actually applied. In this paper, we propose an out-of-core VAT algorithm (o-VAT) for very large data sets.
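To make the idea behind VAT concrete, the following is a minimal sketch of the basic in-core reordering step as it is commonly described in the VAT literature. It is not the o-VAT algorithm proposed in this paper, whose details the abstract does not give. The function names (`pairwise_dissimilarity`, `vat_order`, `vat_image`) and the choice of Euclidean distance are assumptions made for illustration only.

```python
import math


def pairwise_dissimilarity(points):
    """Euclidean dissimilarity matrix as a list of lists (an assumed metric)."""
    return [[math.dist(p, q) for q in points] for p in points]


def vat_order(d):
    """Return a VAT-style ordering of the objects in a square dissimilarity matrix.

    The ordering is built the way Prim's algorithm grows a minimum spanning
    tree: start at one endpoint of the largest dissimilarity, then repeatedly
    append the unselected object closest to the already selected set.
    """
    n = len(d)
    if n == 0:
        return []
    # Start from an object that takes part in the largest dissimilarity.
    first = max(range(n), key=lambda i: max(d[i]))
    order = [first]
    remaining = [j for j in range(n) if j != first]
    # nearest[j] = smallest dissimilarity between object j and the selected set
    nearest = list(d[first])
    while remaining:
        nxt = min(remaining, key=lambda j: nearest[j])
        order.append(nxt)
        remaining.remove(nxt)
        nearest = [min(a, b) for a, b in zip(nearest, d[nxt])]
    return order


def vat_image(d):
    """Reorder rows and columns of d by the VAT ordering."""
    order = vat_order(d)
    return [[d[i][j] for j in order] for i in order], order


if __name__ == "__main__":
    # Two small, well-separated groups given in interleaved order.
    pts = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]
    image, order = vat_image(pairwise_dissimilarity(pts))
    print(order)
    for row in image:
        print(" ".join(f"{v:5.1f}" for v in row))
```

When the reordered matrix is displayed as a grayscale image, groups of mutually close objects tend to appear as dark square blocks along the diagonal, and the number of such blocks suggests the number of clusters. Note that this sketch holds the full n by n matrix in memory, which is exactly the cost that becomes prohibitive for very large data sets and that motivates an out-of-core variant.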