An improved genetic algorithm for robust fuzzy clustering with unknown number of clusters
A. Banerjee
2010 Annual Meeting of the North American Fuzzy Information Processing Society, 12 July 2010
DOI: 10.1109/NAFIPS.2010.5548175
Citations: 8
Abstract
This paper revisits the problem of partitioning noisy data when the number of clusters c is not known a priori. The proposed methodology is a population-based search of the partition space using a genetic algorithm. Potential solutions use a two-part representation: the first part of the chromosome classifies the data into a true (retained) set and an outlier (trimmed) set, and the second part is a partition of the true set for a particular value of c, which the search optimizes simultaneously. The paper also proposes a two-tier fitness function that first assesses potential solutions by a test of clustering tendency on the retained set and then by the efficacy of the partition for the given value of c. Individuals that score highly on the test of clustering tendency form a mating pool; they undergo crossover to produce offspring that inherit the better of their two parents' partitions. The proposed methodology improves on an earlier multi-objective genetic-algorithm-based clustering technique, which was previously shown to be superior (or at least comparable) to robust clustering methods that assume a known value of c.
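The abstract describes the two-part chromosome, the two-tier fitness function, and the mating-pool crossover only at a high level. The sketch below illustrates how those pieces could fit together, under assumptions that are not from the paper: the Hopkins statistic stands in for the unnamed test of clustering tendency, a crisp Xie-Beni-style ratio stands in for the partition-efficacy measure, partitions are crisp rather than fuzzy, and trimming masks are combined by uniform crossover. All names (`Chromosome`, `hopkins`, `xie_beni_crisp`, `mating_pool`, `crossover`) are hypothetical.

```python
import itertools
import math
import random
from dataclasses import dataclass

Point = tuple[float, ...]


@dataclass
class Chromosome:
    retained: list[bool]  # part 1: True = retained ("true" set), False = trimmed outlier
    labels: list[int]     # part 2: cluster label per point (ignored where trimmed)

    def num_clusters(self) -> int:
        """c is implied by the distinct labels on the retained set, so it evolves too."""
        return len({lab for keep, lab in zip(self.retained, self.labels) if keep})


def retained_view(points: list[Point], chrom: Chromosome) -> tuple[list[Point], list[int]]:
    """Return the retained points and their labels (the 'true' set)."""
    kept = [(p, lab) for p, keep, lab in zip(points, chrom.retained, chrom.labels) if keep]
    return [p for p, _ in kept], [lab for _, lab in kept]


def hopkins(points: list[Point], m: int, rng: random.Random) -> float:
    """Tier 1 (assumed test): Hopkins statistic, ~0.5 for uniform data, near 1 if clustered."""
    n = len(points)
    if n < 3:
        return 0.0
    m = min(m, n - 1)
    lo = [min(axis) for axis in zip(*points)]
    hi = [max(axis) for axis in zip(*points)]
    # u: distances from uniform random probes in the bounding box to the nearest data point.
    probes = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in range(m)]
    u = sum(min(math.dist(q, p) for p in points) for q in probes)
    # w: distances from sampled data points to their nearest other data point.
    idx = rng.sample(range(n), m)
    w = sum(min(math.dist(points[i], points[j]) for j in range(n) if j != i) for i in idx)
    return u / (u + w) if u + w > 0 else 0.5


def xie_beni_crisp(points: list[Point], labels: list[int]) -> float:
    """Tier 2 (assumed index): crisp Xie-Beni-style compactness / separation; lower is better."""
    groups: dict[int, list[Point]] = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    if len(groups) < 2:
        return math.inf
    centroids = {lab: tuple(sum(axis) / len(ps) for axis in zip(*ps)) for lab, ps in groups.items()}
    compact = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    sep = min(math.dist(a, b) ** 2 for a, b in itertools.combinations(centroids.values(), 2))
    return compact / (len(points) * sep) if sep > 0 else math.inf


def mating_pool(population: list[Chromosome], points: list[Point], pool_size: int,
                rng: random.Random, m: int = 5) -> list[Chromosome]:
    """Tier 1: keep the individuals whose retained set shows the strongest clustering tendency."""
    ranked = sorted(population,
                    key=lambda ch: hopkins(retained_view(points, ch)[0], m, rng),
                    reverse=True)
    return ranked[:pool_size]


def crossover(a: Chromosome, b: Chromosome, points: list[Point],
              rng: random.Random) -> Chromosome:
    """Child inherits the better partition (tier 2) of its two parents.

    Part 1 is mixed by uniform crossover; this is an assumption, since the
    abstract does not say how the trimming masks are combined.
    """
    better = min((a, b), key=lambda ch: xie_beni_crisp(*retained_view(points, ch)))
    mask = [x if rng.random() < 0.5 else y for x, y in zip(a.retained, b.retained)]
    return Chromosome(retained=mask, labels=list(better.labels))
```

With two well-separated groups and one trimmed outlier, a labelling that follows the groups scores far lower on `xie_beni_crisp` than one that interleaves them, so `crossover` hands the grouped partition to the child. Nothing in this sketch limits how many points may be trimmed; a practical version would need such a constraint, which the abstract does not describe.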