An improved genetic algorithm for robust fuzzy clustering with unknown number of clusters

2010 Annual Meeting of the North American Fuzzy Information Processing Society. Pub Date: 2010-07-12. DOI: 10.1109/NAFIPS.2010.5548175

A. Banerjee


Citations: 8

Abstract



This paper revisits the problem of partitioning noisy data when the number of clusters c is not known a priori. The proposed methodology is a population-based search of the partition space using a genetic algorithm. Potential solutions use a two-part representation: the first part of the chromosome classifies the data into true (retained) and outlier (trimmed) sets, and the second part holds the result of a partition of the true set for a particular value of c, which the process optimizes simultaneously. The paper also proposes a two-tier fitness function, which first assesses potential solutions by a test of clustering tendency on the retained set, and then by the efficacy of the partition for a given value of c. A mating pool is formed from the individuals that score highly on the test of clustering tendency; these are allowed to cross over and produce offspring that inherit the better partition from either parent. The proposed methodology improves on a multi-objective genetic algorithm-based clustering technique, which was previously shown to be superior (or at least comparable) to robust clustering methods that assume a known value of c.
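The two-part chromosome, the two-tier fitness, and the mating step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the actual tests or operators, so everything concrete here is an assumption made for illustration: the clustering-tendency test is stood in for by a Hopkins-style statistic, partition efficacy by a crisp compactness/separation ratio (the paper itself concerns fuzzy partitions), the partition is stored as a list of cluster centers, part 1 is recombined by uniform crossover, and mutation is omitted. The names `Chromosome`, `tendency`, `partition_cost`, `crossover`, and `next_generation` are hypothetical.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Chromosome:
    retained: list  # part 1: True = retained ("true") point, False = trimmed outlier
    centers: list   # part 2 (assumed encoding): cluster centers, so c == len(centers)

def kept(data, ch):
    """Points that the chromosome classifies as retained."""
    return [p for p, keep in zip(data, ch.retained) if keep]

def tendency(points, rng, m=10):
    """Tier 1 (assumed stand-in): Hopkins-style statistic in [0, 1].
    Values near 1 suggest cluster structure. Needs at least two points."""
    m = min(m, len(points))
    lo = [min(col) for col in zip(*points)]
    hi = [max(col) for col in zip(*points)]
    probes = [tuple(rng.uniform(a, b) for a, b in zip(lo, hi)) for _ in range(m)]
    u = sum(min(math.dist(q, p) for p in points) for q in probes)
    w = sum(min(math.dist(s, p) for p in points if p is not s)
            for s in rng.sample(points, m))
    return u / (u + w) if u + w > 0 else 0.5

def partition_cost(points, centers):
    """Tier 2 (assumed stand-in): compactness / separation, lower is better.
    Assumes a non-empty point set and at least two centers."""
    compact = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points) / len(points)
    sep = min(math.dist(a, b) ** 2 for i, a in enumerate(centers) for b in centers[i + 1:])
    return compact / sep if sep > 0 else math.inf

def crossover(data, a, b, rng):
    """Uniform crossover on part 1; the child then inherits whichever
    parent's partition scores better on the child's own retained set."""
    mask = [rng.choice(pair) for pair in zip(a.retained, b.retained)]
    pts = [p for p, keep in zip(data, mask) if keep]
    best = min((a.centers, b.centers), key=lambda cs: partition_cost(pts, cs))
    return Chromosome(mask, list(best))

def next_generation(data, population, rng, pool_size=4):
    """Tier 1 selects the mating pool; tier 2 is applied inside crossover."""
    ranked = sorted(population, key=lambda ch: tendency(kept(data, ch), rng), reverse=True)
    pool = ranked[:pool_size]
    return [crossover(data, *rng.sample(pool, 2), rng) for _ in range(len(population))]

# Toy data: two Gaussian blobs plus two gross outliers.
rng = random.Random(0)
data = [(rng.gauss(0, 0.3), rng.gauss(0, 0.3)) for _ in range(15)]
data += [(rng.gauss(5, 0.3), rng.gauss(5, 0.3)) for _ in range(15)]
data += [(20.0, -20.0), (-15.0, 18.0)]

# Random initial population with c drawn from {2, 3, 4}.
population = [
    Chromosome([rng.random() < 0.9 for _ in data], rng.sample(data, rng.choice([2, 3, 4])))
    for _ in range(8)
]
for _ in range(5):
    population = next_generation(data, population, rng)

best = min(population, key=lambda ch: partition_cost(kept(data, ch), ch.centers))
print("clusters:", len(best.centers), "retained points:", sum(best.retained))
```

Storing the partition as centers is a convenience of this sketch: it lets a child reuse either parent's partition even when its retained set differs from both. A faithful implementation would need the fuzzy membership model, the tendency test, and the validity measure defined in the paper itself.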
