Feature selection for text classification using genetic algorithms

2016 8th International Conference on Modelling, Identification and Control (ICMIC) · Pub Date: 2016-11-01 · DOI: 10.1109/ICMIC.2016.7804223

N. Bidi, Z. Elberrichi

Citations: 44

Abstract

In text classification, feature selection is essential for improving classification effectiveness. This paper presents an empirical study of a feature selection method based on genetic algorithms, applied to different text representation methods. The algorithm pursues two goals: on the one hand, to find a feature subset that maximizes classifier performance; on the other, to find the smallest feature subset that still achieves high classification accuracy. To evaluate the approach, three of the best-known classifiers were selected: Naive Bayes (NB), k-Nearest Neighbors (KNN), and Support Vector Machines (SVMs). Our objective is to determine, using the F-measure, whether genetic-algorithm-based feature selection improves text classification performance while using a smaller feature set. Experiments were carried out on two benchmark document collections, 20Newsgroups and Reuters-21578, and the results were very interesting.
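The abstract does not specify the chromosome encoding, genetic operators, or fitness function, so the following is only a minimal sketch under stated assumptions, not the authors' method. Each chromosome is assumed to be a bit mask over the vocabulary (1 = keep the term). Fitness is assumed to be the F-measure of a classifier restricted to the selected features, minus a size penalty weighted by a hypothetical parameter `alpha`; this folds both goals (best performance, smallest subset) into one score. Tournament selection, one-point crossover, bit-flip mutation and elitism are generic GA choices. A leave-one-out 1-nearest-neighbour classifier on a tiny, made-up binary term-presence matrix stands in for the NB, KNN and SVM classifiers used in the paper. All names (`genetic_feature_selection`, `loo_1nn_f1`, `f_measure`, `X`, `y`) are hypothetical.

```python
import random

def f_measure(y_true, y_pred, positive=1):
    """F1 score (harmonic mean of precision and recall) for one positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def loo_1nn_f1(X, y, mask):
    """Leave-one-out 1-nearest-neighbour F1, using only the features selected by mask.

    Stand-in classifier (assumption); distance is Hamming distance on binary features.
    """
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty subset cannot classify anything
    preds = []
    for i, xi in enumerate(X):
        nearest = min((k for k in range(len(X)) if k != i),
                      key=lambda k: sum(xi[j] != X[k][j] for j in cols))
        preds.append(y[nearest])
    return f_measure(y, preds)

def genetic_feature_selection(n_features, score, pop_size=20, generations=30,
                              p_cross=0.8, p_mut=0.05, alpha=0.1, seed=0):
    """Search bit masks that trade off score(mask) against subset size."""
    rng = random.Random(seed)

    def fitness(mask):
        # Reward classification quality, penalise the fraction of features kept.
        return score(mask) - alpha * sum(mask) / n_features

    def tournament(pop, fits, k=3):
        # Pick k random individuals and return the fittest of them.
        contenders = rng.sample(range(len(pop)), k)
        return pop[max(contenders, key=lambda i: fits[i])]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        fits = [fitness(m) for m in pop]
        children = [pop[max(range(pop_size), key=lambda i: fits[i])][:]]  # elitism
        while len(children) < pop_size:
            a, b = tournament(pop, fits), tournament(pop, fits)
            if rng.random() < p_cross:  # one-point crossover
                cut = rng.randrange(1, n_features)
                a = a[:cut] + b[cut:]
            # Bit-flip mutation: each gene toggles with probability p_mut.
            children.append([1 - g if rng.random() < p_mut else g for g in a])
        pop = children
    fits = [fitness(m) for m in pop]
    best = max(range(pop_size), key=lambda i: fits[i])
    return pop[best], fits[best]

# Hypothetical toy corpus: 8 documents x 6 binary term-presence features.
# Feature 0 separates the classes; features 1-5 are noise repeated across classes.
X = [[1, 0, 1, 1, 0, 1], [1, 1, 0, 1, 1, 0], [1, 0, 0, 0, 1, 1], [1, 1, 1, 0, 0, 0],
     [0, 0, 1, 1, 0, 1], [0, 1, 0, 1, 1, 0], [0, 0, 0, 0, 1, 1], [0, 1, 1, 0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    mask, fit = genetic_feature_selection(len(X[0]), lambda m: loo_1nn_f1(X, y, m))
    print("selected features:", [j for j, bit in enumerate(mask) if bit],
          "fitness:", round(fit, 3))
```

On this toy data, keeping only feature 0 gives a leave-one-out F1 of 1.0 with the smallest possible subset, whereas keeping all six features drops F1 to 0.0, because the noise features make each document's nearest neighbour a document of the other class. A weighted penalty is the simplest way to combine the two goals; a multi-objective GA that keeps a Pareto front of (F-measure, subset size) would be an alternative design.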
