RELIEF-C: Efficient Feature Selection for Clustering over Noisy Data
M. Dash, Y. Ong
2011 IEEE 23rd International Conference on Tools with Artificial Intelligence, published 2011-11-07
DOI: 10.1109/ICTAI.2011.135
Cited by: 17
Abstract
RELIEF, introduced by Kira and Rendell in 1992, is an effective and widely used feature selection algorithm. Since then it has been modified and extended in various ways to make it more efficient. However, the original RELIEF and all of its extensions select features over labeled data for classification. To the best of our knowledge, this paper is the first to apply RELIEF, as RELIEF-C, to unlabeled data in order to select relevant features for clustering. We modified RELIEF to overcome its inherent difficulties in the presence of a large number of irrelevant features and/or a significant number of noisy tuples. RELIEF-C has several advantages over existing wrapper and filter feature selection methods: (a) it works well in the presence of a large number of noisy tuples, (b) it remains robust even when the underlying clustering algorithm fails to cluster properly, and (c) it accurately recognizes the relevant features even in the presence of a large number of irrelevant features. We compared RELIEF-C with two established feature selection methods for clustering. RELIEF-C significantly outperforms the other methods on synthetic, benchmark, and real-world data sets, particularly when a data set contains a large number of noisy tuples and/or irrelevant features.
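For readers unfamiliar with RELIEF, the following is a minimal Python sketch of the classic two-class RELIEF weight update. It is not the authors' RELIEF-C. For each of m randomly sampled instances, it finds the nearest instance with the same label (near-hit) and the nearest with a different label (near-miss), then rewards features that differ on the near-miss and penalises features that differ on the near-hit. To illustrate the unlabeled setting, the sketch assumes that cluster IDs from some prior clustering step stand in for class labels. That pseudo-labelling step, the function name `relief`, the toy data, and the use of normalised absolute differences (some formulations square them) are all assumptions. The sketch also omits RELIEF-C's handling of noisy tuples, which the abstract does not describe.

```python
import random
from typing import List, Sequence


def relief(X: Sequence[Sequence[float]], y: Sequence[int], m: int, seed: int = 0) -> List[float]:
    """Sketch of the classic two-class RELIEF weight update (not RELIEF-C).

    Assumes every class (or cluster) in y has at least two members, so a
    near-hit and a near-miss always exist.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1];
    # a constant feature gets range 1.0 to avoid division by zero.
    lo = [min(row[a] for row in X) for a in range(d)]
    hi = [max(row[a] for row in X) for a in range(d)]
    span = [(hi[a] - lo[a]) or 1.0 for a in range(d)]

    def diff(a: int, u: Sequence[float], v: Sequence[float]) -> float:
        # Normalised absolute difference on feature a (some variants square it).
        return abs(u[a] - v[a]) / span[a]

    def dist(u: Sequence[float], v: Sequence[float]) -> float:
        # Squared Euclidean distance in the normalised feature space.
        return sum(diff(a, u, v) ** 2 for a in range(d))

    rng = random.Random(seed)
    w = [0.0] * d
    for _ in range(m):
        i = rng.randrange(n)
        r = X[i]
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        near_hit = X[min(hits, key=lambda j: dist(r, X[j]))]
        near_miss = X[min(misses, key=lambda j: dist(r, X[j]))]
        for a in range(d):
            # Reward features that separate r from the near-miss,
            # penalise features that separate it from the near-hit.
            w[a] += (diff(a, r, near_miss) - diff(a, r, near_hit)) / m
    return w


# Hypothetical toy data: feature 0 separates two groups, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [1.0, 0.4], [0.9, 0.8], [1.1, 0.2]]
# Assumption: cluster IDs from a prior clustering step used as pseudo-labels.
clusters = [0, 0, 0, 1, 1, 1]
weights = relief(X, clusters, m=20)
print(weights)
```

On this toy data, feature 0 separates the two clusters while feature 1 varies independently of them. The sketch therefore gives feature 0 a clearly larger weight than feature 1, regardless of which instances are sampled.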