Achieving Natural Clustering by Validating Results of Iterative Evolutionary Clustering Approach
Tansel Özyer, R. Alhajj
2006 3rd International IEEE Conference Intelligent Systems, September 2006
DOI: 10.1109/IS.2006.348468
Citations: 20
Abstract
Clustering is an essential process that classifies a given set of instances according to user-specified criteria, and different factors may lead to different clustering results. Consequently, a large number of clustering algorithms exist to serve different purposes. However, two challenges motivate the development of new algorithms: scalability, and the fact that most algorithms require the number of clusters to be specified a priori, which is often hard to estimate even for domain experts. This paper presents a novel approach that addresses both issues. We developed an iterative clustering method to handle the scalability problem, and we use a multi-objective genetic algorithm combined with validity indexes to decide on the number of clusters. The basic idea is to first partition the dataset and then cluster each partition separately. Finally, each obtained cluster is treated as a single instance (represented by its centroid), and a conquer process is performed to obtain the final clustering of the complete dataset. Test results on one large real dataset demonstrate the applicability and effectiveness of the proposed approach.
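The partition, cluster, then conquer scheme described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The paper uses a multi-objective genetic algorithm combined with validity indexes to choose the number of clusters; this sketch substitutes plain k-means (with deterministic farthest-first seeding) and picks k by minimizing the Davies-Bouldin index. All function names (`divide_and_conquer`, `choose_k`, `kmeans`, `davies_bouldin`), the random equal-size partitioning, and the unweighted treatment of centroids in the conquer step are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical sketch: k-means + Davies-Bouldin stands in for the paper's
# multi-objective genetic algorithm with validity indexes.

def nearest(p, cents):
    """Index of the centroid closest to point p (Euclidean distance)."""
    return min(range(len(cents)), key=lambda j: math.dist(p, cents[j]))

def mean(points):
    return tuple(sum(coord) / len(points) for coord in zip(*points))

def farthest_first(points, k):
    """Deterministic seeding: start at points[0], then repeatedly add the
    point whose distance to the nearest already-chosen seed is largest."""
    seeds = [points[0]]
    while len(seeds) < k:
        seeds.append(max(points, key=lambda p: min(math.dist(p, s) for s in seeds)))
    return seeds

def kmeans(points, k, max_iter=100):
    cents = farthest_first(points, k)
    for _ in range(max_iter):
        labels = [nearest(p, cents) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append(mean(members) if members else cents[j])  # keep empty clusters in place
        if new == cents:  # converged: assignments no longer change the means
            break
        cents = new
    return cents, [nearest(p, cents) for p in points]

def davies_bouldin(points, cents, labels):
    """Davies-Bouldin index (lower is better): mean over clusters of the
    worst (scatter_i + scatter_j) / distance(centroid_i, centroid_j)."""
    k = len(cents)
    scatter = []
    for j in range(k):
        members = [p for p, lab in zip(points, labels) if lab == j]
        scatter.append(sum(math.dist(p, cents[j]) for p in members) / len(members) if members else 0.0)
    total = 0.0
    for i in range(k):
        worst = 0.0
        for j in range(k):
            if i != j:
                d = math.dist(cents[i], cents[j])
                worst = max(worst, math.inf if d == 0 else (scatter[i] + scatter[j]) / d)
        total += worst
    return total / k

def choose_k(points, k_max):
    """Try k = 2..k_max (assumes len(points) >= 2); keep the lowest DB score."""
    best = None
    for k in range(2, min(k_max, len(points)) + 1):
        cents, labels = kmeans(points, k)
        score = davies_bouldin(points, cents, labels)
        if best is None or score < best[0]:
            best = (score, k, cents)
    return best[1], best[2]

def divide_and_conquer(points, n_parts, k_max=6, seed=0):
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    # Divide: split the data into n_parts disjoint partitions.
    parts = [shuffled[i::n_parts] for i in range(n_parts)]
    # Cluster each partition separately and keep only its centroids.
    centroids = []
    for part in parts:
        _, cents = choose_k(part, k_max)
        centroids.extend(cents)
    # Conquer: treat each centroid as a single instance and cluster those.
    k, final = choose_k(centroids, k_max)
    # Assign every original point to its nearest final centroid.
    return k, final, [nearest(p, final) for p in points]
```

On three well-separated Gaussian blobs, a call such as `divide_and_conquer(data, n_parts=3)` should recover three clusters. Because only centroids reach the conquer step, its cost depends on the total number of partition-level clusters rather than on the dataset size; weighting each centroid by the size of the cluster it represents would be a natural refinement that this sketch omits.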