Feature Selection with High-Dimensional Imbalanced Data

2009 IEEE International Conference on Data Mining Workshops, Pub Date: 2009-12-06, DOI: 10.1109/ICDMW.2009.35

J. V. Hulse, T. Khoshgoftaar, Amri Napolitano, Randall Wald

Citations: 151

Abstract

Feature selection is an important topic in data mining, especially for high-dimensional datasets. Filtering techniques in particular have received much attention, but detailed comparisons of their performance are lacking. This work considers nine filters: three that use classifier performance metrics and six commonly used filters. All nine filtering techniques are compared and contrasted on five different microarray expression datasets. In addition, because these datasets exhibit an imbalance between the number of positive and negative examples, the use of sampling techniques in the context of feature selection is examined.
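The abstract does not spell out how a filter "using a classifier performance metric" works. One common construction, assumed here rather than taken from the paper, treats each feature's raw value as a classifier score and measures how well it separates the two classes with ROC AUC, optionally after randomly undersampling the majority class. The sketch below uses only the Python standard library. The function names `auc_score`, `random_undersample`, and `rank_features_by_auc` are hypothetical, and AUC and random undersampling are illustrative stand-ins, not necessarily the specific metrics or sampling methods the authors evaluated.

```python
import random
from typing import List, Sequence, Tuple


def auc_score(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney U); ties count as half.

    A single feature's values are used directly as classifier scores,
    with label 1 as the positive (minority) class.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_undersample(rows, labels, seed: int = 0):
    """Randomly drop majority-class rows until both classes have equal size."""
    rng = random.Random(seed)
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = sorted([pos_idx, neg_idx], key=len)
    keep = sorted(minority + rng.sample(majority, len(minority)))
    return [rows[i] for i in keep], [labels[i] for i in keep]


def rank_features_by_auc(rows, labels) -> List[Tuple[int, float]]:
    """Return (feature_index, score) pairs, best first.

    A feature that is consistently *lower* for positives is also
    informative, so each feature is scored as max(AUC, 1 - AUC).
    """
    ranked = []
    for j in range(len(rows[0])):
        auc = auc_score([r[j] for r in rows], labels)
        ranked.append((j, max(auc, 1.0 - auc)))
    ranked.sort(key=lambda t: t[1], reverse=True)
    return ranked


# Toy data (hypothetical): 2 positives, 4 negatives, 3 features.
rows = [
    [0.9, 0.1, 5.0],
    [0.8, 0.7, 5.0],
    [0.2, 0.3, 5.0],
    [0.1, 0.9, 5.0],
    [0.3, 0.2, 5.0],
    [0.25, 0.6, 5.0],
]
labels = [1, 1, 0, 0, 0, 0]

print(rank_features_by_auc(rows, labels))  # [(0, 1.0), (1, 0.625), (2, 0.5)]

# Balance the classes first, then rank features on the balanced sample.
X_bal, y_bal = random_undersample(rows, labels, seed=0)
print(rank_features_by_auc(X_bal, y_bal)[0])  # (0, 1.0)
```

ROC AUC is itself largely insensitive to the class ratio, so in this toy case the undersampling step mainly shows where a sampling stage would sit in the pipeline; it matters more for filters whose scores depend on class proportions. Keeping only the top-k entries of the returned ranking gives the selected feature subset.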
