An improved genetic algorithm for robust fuzzy clustering with unknown number of clusters

A. Banerjee
2010 Annual Meeting of the North American Fuzzy Information Processing Society, published 2010-07-12
DOI: 10.1109/NAFIPS.2010.5548175 (https://doi.org/10.1109/NAFIPS.2010.5548175)
Citations: 8

Abstract

This paper revisits the problem of partitioning noisy data when the number of clusters c is not known a priori. The proposed methodology is a population-based search of the partition space using a genetic algorithm. Each potential solution is encoded as a two-part chromosome: the first part classifies the data into a true (retained) set and an outlier (trimmed) set, and the second part holds the partition of the true set for a particular value of c, which the search optimizes simultaneously. A two-tier fitness function is also proposed: it first assesses potential solutions with a test of clustering tendency on the retained set, and then assesses the efficacy of the partition for the given value of c. A mating pool is formed from the individuals that score highest on the test of clustering tendency; these are allowed to cross over and produce offspring that inherit the better partition from either of their parents. The proposed methodology improves on a multi-objective genetic algorithm-based clustering technique that was previously shown to be superior (or at least comparable) to robust clustering methods that assume a known value of c.
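The two-part chromosome, two-tier fitness, and partition-inheriting crossover described in the abstract can be sketched roughly as follows. This is only an illustration under several assumptions that the abstract does not confirm: the partition is stored as cluster prototypes (so c is the number of prototypes), the Hopkins statistic stands in for the unnamed test of clustering tendency, the Xie-Beni index stands in for the unnamed partition-efficacy measure, and the retained/trimmed mask is recombined by uniform crossover. All function and parameter names are hypothetical.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Chromosome:
    retained: list  # part 1: one bool per point; True = retained, False = trimmed outlier
    centers: list   # part 2 (assumed encoding): cluster prototypes; c = len(centers)

def retained_points(data, chrom):
    return [x for x, keep in zip(data, chrom.retained) if keep]

def hopkins(points, rng, n_probes=10):
    """Tier 1 (assumed test): Hopkins statistic; about 0.5 = no structure, near 1 = clustered."""
    if len(points) < 2:
        return 0.0
    dims = range(len(points[0]))
    lo = [min(p[k] for p in points) for k in dims]
    hi = [max(p[k] for p in points) for k in dims]
    n_probes = min(n_probes, len(points) - 1)
    sample = rng.sample(range(len(points)), n_probes)
    # w: distance from each sampled data point to its nearest other data point
    w = sum(min(math.dist(points[i], q) for j, q in enumerate(points) if j != i)
            for i in sample)
    # u: distance from each uniform probe in the bounding box to its nearest data point
    u = 0.0
    for _ in range(n_probes):
        probe = [rng.uniform(lo[k], hi[k]) for k in dims]
        u += min(math.dist(probe, q) for q in points)
    return u / (u + w) if u + w > 0 else 0.5

def memberships(x, centers, m=2.0):
    """Fuzzy c-means membership of point x in each cluster (fuzzifier m > 1)."""
    d = [math.dist(x, v) for v in centers]
    if 0.0 in d:  # x coincides with one or more prototypes
        hits = [1.0 if di == 0.0 else 0.0 for di in d]
        return [h / sum(hits) for h in hits]
    p = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dj) ** p for dj in d) for di in d]

def xie_beni(points, centers, m=2.0):
    """Tier 2 (assumed index): Xie-Beni validity; lower is better. Needs c >= 2."""
    sep = min(math.dist(a, b) ** 2
              for i, a in enumerate(centers) for b in centers[i + 1:])
    if not points or sep == 0.0:
        return math.inf
    compact = sum((u ** m) * math.dist(x, v) ** 2
                  for x in points
                  for u, v in zip(memberships(x, centers, m), centers))
    return compact / (len(points) * sep)

def crossover(data, pa, pb, rng):
    """Child mixes the parents' masks and inherits the better parent's partition."""
    mask = [rng.choice(pair) for pair in zip(pa.retained, pb.retained)]
    best = min((pa, pb), key=lambda p: xie_beni(retained_points(data, p), p.centers))
    return Chromosome(mask, list(best.centers))

def next_generation(data, population, rng, pool_fraction=0.5):
    # Mating pool: individuals whose retained set scores highest on the tier-1 test.
    ranked = sorted(population,
                    key=lambda ch: hopkins(retained_points(data, ch), rng),
                    reverse=True)
    pool = ranked[:max(2, int(len(ranked) * pool_fraction))]
    return [crossover(data, *rng.sample(pool, 2), rng) for _ in population]
```

Because parents may carry different numbers of prototypes, inheriting a parent's partition also inherits its value of c, which is one way the search over c could proceed without fixing it in advance. Mutation of the mask and of the prototypes is omitted here, since the abstract does not describe it.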