New model of feature selection based on chaotic firefly algorithm for Arabic text categorization

Int. Arab J. Inf. Technol. Pub Date: 2023-01-01 DOI: 10.34028/iajit/20/3a/3

M. Hadni, Hjiaj Hassane

Pages: 461-468


Abstract

Dimensionality reduction is a problem that arises in most classification processes. Text datasets contain a large number of features, some of which may carry unreliable data that can lead the categorization process to unwanted results. Feature selection can be used to reduce the dimensionality of datasets and to find relevant information. For Arabic, the number of works that apply a meta-heuristic algorithm to feature selection is still limited, owing to the complexity of Arabic inflectional and derivational rules, its intricate grammar, and its rich morphology. This paper proposes a new model for Arabic feature selection that incorporates a chaotic method into the Firefly Algorithm, yielding the Chaotic Firefly Algorithm (CFA). The chaotic variant replaces the attractiveness coefficient of the firefly algorithm with the outputs of a chaotic map. The enhancement introduces a novel search strategy that achieves a good balance between the exploitation and exploration abilities of the algorithm. In terms of performance, the proposed method is tested using three classifiers, namely Naive Bayes (NB), Support Vector Machine (SVM), and K-Nearest Neighbors (KNN), and three evaluation measures: precision, recall, and F-measure. The experimental findings show that the combination of CFA and the SVM classifier outperforms the other combinations in terms of precision.
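The core idea in the abstract, replacing the attractiveness coefficient with the output of a chaotic sequence, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not say which chaotic map is used, so the logistic map here is an assumption, as are the threshold that turns continuous positions into a binary feature mask, the parameter values, and the name `chaotic_firefly_select`. In the paper's setting the fitness function would presumably score a feature subset by classifier performance; the sketch accepts any callable.

```python
import math
import random


def logistic_map(x, r=4.0):
    """One step of the logistic map (assumed here; the abstract names no map)."""
    return r * x * (1.0 - x)


def to_mask(position):
    """Binarize a continuous position: keep a feature when its coordinate is positive."""
    return [1 if v > 0 else 0 for v in position]


def chaotic_firefly_select(fitness, n_features, n_fireflies=10, n_iter=30,
                           gamma=1.0, alpha=0.2, seed=0):
    """Search for a binary feature mask that maximizes `fitness` (hypothetical helper)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-1.0, 1.0) for _ in range(n_features)]
           for _ in range(n_fireflies)]
    light = [fitness(to_mask(p)) for p in pos]  # brightness = fitness of the mask

    best_idx = max(range(n_fireflies), key=light.__getitem__)
    best_mask, best_fit = to_mask(pos[best_idx]), light[best_idx]

    chaos = 0.7  # assumed initial value of the chaotic sequence
    for _ in range(n_iter):
        for i in range(n_fireflies):
            for j in range(n_fireflies):
                if light[j] <= light[i]:
                    continue  # firefly i only moves toward brighter fireflies
                # The chaotic output stands in for the fixed coefficient beta0
                # of the standard rule beta = beta0 * exp(-gamma * r^2).
                chaos = logistic_map(chaos)
                r2 = sum((a - b) ** 2 for a, b in zip(pos[i], pos[j]))
                beta = chaos * math.exp(-gamma * r2)
                pos[i] = [a + beta * (b - a) + alpha * (rng.random() - 0.5)
                          for a, b in zip(pos[i], pos[j])]
                light[i] = fitness(to_mask(pos[i]))
                if light[i] > best_fit:
                    best_mask, best_fit = to_mask(pos[i]), light[i]
    return best_mask, best_fit


if __name__ == "__main__":
    # Toy fitness: reward the first three features, penalize the rest.
    def toy_fitness(mask):
        return sum(mask[:3]) - 0.5 * sum(mask[3:])

    mask, score = chaotic_firefly_select(toy_fitness, n_features=8)
    print(mask, score)
```

Because the chaotic value varies between moves, the step size toward brighter fireflies varies as well, which is one plausible reading of how the method trades off exploration and exploitation.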
