An empirical study of filter-based feature selection algorithms using noisy training data
Weiwei Yuan, D. Guan, Linshan Shen, Haiwei Pan
2014 4th IEEE International Conference on Information Science and Technology, published 2014-10-13
DOI: 10.1109/ICIST.2014.6920367 (https://doi.org/10.1109/ICIST.2014.6920367)
Citations: 7
Abstract
In this research, we empirically evaluate the performance of filter-based feature selection on noisy data containing mislabeled samples. Mislabeled data are present in many real applications, but existing studies have not explored their influence on feature selection. We tested six well-known filter feature selection methods on datasets with pre-defined mislabeling ratios. Our results show that, in most cases, feature selection performance degrades as the mislabeling ratio increases. We also evaluate the effect of mislabeled data on feature selection for small datasets and outline the more serious negative effects observed in that setting. The results of this study suggest that most feature selection methods are not robust enough for noisy data containing mislabeled samples. Therefore, proper processing of noisy data before feature selection should be considered.
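The experimental idea in the abstract (inject a controlled fraction of wrong labels, then see how a filter method's feature ranking changes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the abstract does not name the six methods or the datasets, so the sketch uses a synthetic binary dataset, symmetric random label flipping, and absolute Pearson correlation as a stand-in filter score. The helper names `flip_labels`, `correlation_scores`, `top_k`, and `make_data` are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean, pstdev


def flip_labels(labels, ratio, rng):
    """Return a copy of binary labels with round(ratio * n) entries flipped."""
    n_flip = round(ratio * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


def correlation_scores(X, y):
    """Stand-in filter score: absolute Pearson correlation of each feature with the label."""
    y_mean, y_std = mean(y), pstdev(y)
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean, c_std = mean(col), pstdev(col)
        if c_std == 0 or y_std == 0:
            scores.append(0.0)  # a constant column carries no ranking signal
            continue
        cov = mean([(c - c_mean) * (t - y_mean) for c, t in zip(col, y)])
        scores.append(abs(cov / (c_std * y_std)))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


def make_data(n, n_informative, n_noise, rng):
    """Synthetic data (assumption): informative features shift with the label, the rest are pure noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(2.0 * label, 1.0) for _ in range(n_informative)]
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(informative + noise)
        y.append(label)
    return X, y


rng = random.Random(0)
X, y = make_data(n=200, n_informative=3, n_noise=7, rng=rng)
informative_idx = {0, 1, 2}

# Rank features under increasing mislabeling ratios and count how many
# truly informative features remain in the top 3.
for ratio in (0.0, 0.1, 0.2, 0.3, 0.4):
    noisy_y = flip_labels(y, ratio, rng)
    selected = top_k(correlation_scores(X, noisy_y), k=3)
    hits = len(set(selected) & informative_idx)
    print(f"mislabeling ratio {ratio:.1f}: selected {selected}, informative recovered {hits}/3")
```

Reducing `n` in `make_data` gives a rough way to probe the small-sample setting the abstract mentions, although a toy run like this says nothing about the paper's actual results.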