An empirical study of filter-based feature selection algorithms using noisy training data

2014 4th IEEE International Conference on Information Science and Technology. Pub Date: 2014-10-13. DOI: 10.1109/ICIST.2014.6920367

Weiwei Yuan, D. Guan, Linshan Shen, Haiwei Pan

Citations: 7

Abstract

In this research, we empirically evaluate the performance of filter-based feature selection on noisy data containing mislabeled samples. Mislabeled data are present in many real applications, but existing studies have not explored their influence on feature selection. We tested six well-known filter feature selection methods on datasets with pre-defined mislabeling ratios. Our results show that, in most cases, feature selection performance degrades as the mislabeling ratio increases. We also evaluate the effects of mislabeled data on feature selection for small datasets and outline the more serious negative effects of mislabeled data in that setting. The results of this study suggest that most feature selection methods are not robust enough for noisy data containing mislabeled samples. Therefore, noisy data should be properly processed before feature selection.
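The experimental idea in the abstract (flip a fixed fraction of labels, rerun a filter, and compare against the clean result) can be illustrated with a minimal sketch. The abstract does not name the six filter methods, the datasets, or the evaluation metric, so everything specific here is an assumption made for illustration: the synthetic Gaussian data, the Fisher-style score used as the filter, the top-k overlap measure, and all function names (`flip_labels`, `fisher_scores`, `top_k`, `make_data`, `overlap_by_ratio`). It is not the authors' protocol.

```python
import random


def flip_labels(labels, ratio, rng):
    """Return a copy of binary labels with round(ratio * n) entries flipped."""
    n_flip = round(ratio * len(labels))
    flipped = list(labels)
    for i in rng.sample(range(len(labels)), n_flip):
        flipped[i] = 1 - flipped[i]
    return flipped


def fisher_scores(X, y):
    """Fisher-style filter score per feature: (mu1 - mu0)^2 / (var0 + var1).

    Assumption: this stands in for "a filter method"; it is not claimed
    to be one of the six methods evaluated in the paper.
    """
    scores = []
    for j in range(len(X[0])):
        col0 = [row[j] for row, lab in zip(X, y) if lab == 0]
        col1 = [row[j] for row, lab in zip(X, y) if lab == 1]
        if not col0 or not col1:
            raise ValueError("both classes must be present")
        m0 = sum(col0) / len(col0)
        m1 = sum(col1) / len(col1)
        v0 = sum((v - m0) ** 2 for v in col0) / len(col0)
        v1 = sum((v - m1) ** 2 for v in col1) / len(col1)
        # Small constant avoids division by zero for constant features.
        scores.append((m1 - m0) ** 2 / (v0 + v1 + 1e-12))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features, as a set."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(order[:k])


def make_data(n, n_informative, n_noise, rng):
    """Synthetic binary data (assumed): informative features shift with the label."""
    X, y = [], []
    for i in range(n):
        lab = i % 2
        row = [rng.gauss(1.5 * lab, 1.0) for _ in range(n_informative)]
        row += [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        X.append(row)
        y.append(lab)
    return X, y


def overlap_by_ratio(ratios, n=200, k=5, seed=0):
    """For each mislabeling ratio, the fraction of the clean top-k still selected."""
    rng = random.Random(seed)
    X, y = make_data(n, 5, 20, rng)
    clean = top_k(fisher_scores(X, y), k)
    result = {}
    for ratio in ratios:
        noisy = flip_labels(y, ratio, rng)
        selected = top_k(fisher_scores(X, noisy), k)
        result[ratio] = len(clean & selected) / k
    return result


if __name__ == "__main__":
    for ratio, overlap in overlap_by_ratio([0.0, 0.1, 0.2, 0.3, 0.4]).items():
        print(f"mislabeling ratio {ratio:.1f}: top-k overlap {overlap:.2f}")
```

Passing a smaller `n` to `overlap_by_ratio` gives a rough way to probe the small-dataset setting the abstract mentions. Results from this toy setup depend on the seed and the synthetic data and say nothing about the paper's reported findings.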



