A New Metaheuristic Approach Based Feature Selection for Arabic Text Categorization
M. Hadni, Hjiaj Hassane
2022 International Arab Conference on Information Technology (ACIT), published 2022-11-22
DOI: 10.1109/ACIT57182.2022.9994102
Abstract
With the growing number of electronic documents, mainly textual, stored on various electronic media and on the Web, tools for the analysis and automatic processing of texts, particularly automatic text categorization, have become essential. Most work in this area has been devoted to Western languages, especially English, while Arabic, a morphologically rich and strongly inflected language, has received little study. The large number of features is a significant challenge in classifying Arabic documents, introducing difficulties at several levels, such as complexity and computation time. This paper proposes a new metaheuristic approach to dimensionality reduction that aims to find a representation of the initial data in a smaller space. The model is validated using three classifiers, Naive Bayes (NB), Support Vector Machine (SVM), and k-Nearest Neighbors (KNN), and three evaluation measures: precision, recall, and F-measure. The proposed method achieves a precision of 98%.
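The results above are reported in terms of precision, recall, and F-measure. The following is a minimal Python sketch of how these measures are commonly computed for multi-class text categorization. It is not the authors' evaluation code: the category names are hypothetical, and macro-averaging (an unweighted mean over categories) is an assumption, since the abstract does not state how per-category scores are aggregated.

```python
from collections import Counter


def precision_recall_f1(y_true, y_pred):
    """Per-class precision, recall and F1 (F-measure), plus their macro averages.

    Assumption: scores are macro-averaged, i.e. each category counts equally.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("at least one label is required")

    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1   # correct prediction for category t
        else:
            fp[p] += 1   # document wrongly assigned to category p
            fn[t] += 1   # document of category t that was missed

    per_class = {}
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        per_class[c] = (prec, rec, f1)

    # Unweighted mean of each measure over all categories.
    n = len(labels)
    macro = tuple(sum(scores[i] for scores in per_class.values()) / n for i in range(3))
    return per_class, macro


if __name__ == "__main__":
    # Hypothetical category labels for four documents.
    y_true = ["sport", "sport", "economy", "politics"]
    y_pred = ["sport", "economy", "economy", "politics"]
    per_class, (p, r, f) = precision_recall_f1(y_true, y_pred)
    print(per_class)
    print(f"macro precision={p:.3f} recall={r:.3f} F-measure={f:.3f}")
```

On this toy input the macro precision and recall are both 0.833 and the macro F-measure is 0.778. Micro-averaging, which pools all counts before dividing, can give different values on imbalanced corpora, so the averaging scheme should be stated alongside any reported score.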