Decision tree rule-based feature selection for large-scale imbalanced data

2017 26th Wireless and Optical Communication Conference (WOCC), Pub Date: 2017-04-07, DOI: 10.1109/WOCC.2017.7928973

Haoyue Liu, Mengchu Zhou

Citations: 6

Abstract

Class imbalance appears in many real-world applications, e.g., fault diagnosis, text categorization, and fraud detection. When dealing with a large-scale imbalanced dataset, feature selection becomes a great challenge. To address it, this work proposes a feature selection approach based on a decision tree rule. The effectiveness of the proposed approach is verified by classifying a large-scale dataset from Santander Bank. The results show that our approach achieves a higher Area Under the Curve (AUC) with less computational time. We also compare it with filter-based feature selection approaches, i.e., Chi-Square and F-statistic. The results show that it outperforms them but needs slightly more computational effort.
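The abstract does not spell out the decision tree rule itself, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a stand-in rule: score each feature by the largest Gini impurity reduction a single threshold split can achieve, keep the top k features, and check a feature's ranking quality with a pairwise AUC. The function names (`best_split_gain`, `select_features`, `auc`) and the toy data are hypothetical and unrelated to the Santander Bank dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(values, labels):
    """Largest Gini impurity reduction from any single threshold on one feature."""
    parent = gini(labels)
    n = len(labels)
    best = 0.0
    # Candidate rules have the form "x <= t"; the largest value is skipped
    # because it would leave the right branch empty.
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        best = max(best, parent - child)
    return best


def select_features(X, y, k):
    """Return the indices of the k highest-gain features and all gains."""
    n_features = len(X[0])
    gains = [best_split_gain([row[j] for row in X], y) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: gains[j], reverse=True)
    return ranked[:k], gains


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 10 samples, 2 positives (imbalanced), 3 features.
# Feature 0 separates the classes, feature 1 is mostly noise,
# feature 2 is partly informative.
X = [
    [0.1, 5, 0], [0.2, 3, 0], [0.3, 4, 0], [0.1, 5, 1], [0.4, 3, 0],
    [0.2, 4, 0], [0.3, 5, 0], [0.5, 3, 0], [0.9, 4, 1], [0.8, 5, 1],
]
y = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]

selected, gains = select_features(X, y, k=2)
auc_f0 = auc([row[0] for row in X], y)
auc_f1 = auc([row[1] for row in X], y)
print("selected features:", selected)
print("gains:", [round(g, 4) for g in gains])
print("AUC using feature 0 as a score:", auc_f0)
print("AUC using feature 1 as a score:", auc_f1)
```

Scoring each feature independently keeps the sketch short, but it ignores feature interactions that a fully grown tree would capture. The filter baselines named in the abstract, Chi-Square and F-statistic, would replace `best_split_gain` with the corresponding test statistic in the same ranking loop.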



