Utilizing artificial bee colony algorithm as feature selection method in Arabic text classification
M. Hijazi, A. Zeki, A. Ismail. Int. Arab J. Inf. Technol., 2023. DOI: 10.34028/iajit/20/3a/11
Documents contain a huge amount of crucial information. The vast increase in the number of e-documents available to users makes automated text classification essential. Text classification is the task of assigning documents to predefined categories. Feature selection (FS) reduces the dimensionality of high-dimensional data by keeping only the features most relevant to a given task. Evolutionary algorithms are among the most widely used approaches to FS in text classification. In this paper, the chi-square filter method and the Artificial Bee Colony (ABC) algorithm were both used as FS methods. Chi-square first reduces the number of features by removing those that are superfluous or redundant. The ABC algorithm then treats the features chosen by chi-square as candidate solutions (food sources) and searches for the selection of features that most improves classification performance. Support Vector Machine (SVM) and Naïve Bayes (NB) classifiers were each used as the fitness function of the ABC algorithm. Experimental results show that, compared with the original dataset, the proposed method reduced the number of features by approximately 89.5% when NB was the fitness function and by 94% when SVM was, while also improving classification performance.
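The chi-square filter stage can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes documents are already tokenised into sets of terms and uses the standard 2×2 contingency form of the statistic (term present or absent, document in the class or not). The abstract does not say how per-class scores are combined, so the sketch assumes each term keeps its maximum score over all classes. The names `chi_square` and `select_top_k` and the toy corpus are hypothetical.

```python
def chi_square(docs, labels, term, cls):
    """Chi-square statistic between the presence of `term` and membership of class `cls`.

    docs: list of token sets (one per document); labels: the class of each document.
    """
    a = b = c = d = 0  # 2x2 contingency counts
    for tokens, label in zip(docs, labels):
        has_term, in_class = term in tokens, label == cls
        if has_term and in_class:
            a += 1  # term present, document in the class
        elif has_term:
            b += 1  # term present, document in another class
        elif in_class:
            c += 1  # term absent, document in the class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or class never varies, so there is no evidence of dependence
    return n * (a * d - c * b) ** 2 / denom


def select_top_k(docs, labels, k):
    """Score each term by its maximum chi-square over all classes and keep the k best."""
    vocab = sorted(set().union(*docs))
    classes = sorted(set(labels))
    scores = {t: max(chi_square(docs, labels, t, c) for c in classes) for t in vocab}
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))  # ties broken by string order
    return ranked[:k], scores


# Hypothetical toy corpus: two sports and two economy headlines, already tokenised.
docs = [
    {"مباراة", "هدف", "في"},
    {"مباراة", "فريق", "في"},
    {"سوق", "أسهم", "في"},
    {"سوق", "بنك", "في"},
]
labels = ["sport", "sport", "economy", "economy"]
top, scores = select_top_k(docs, labels, k=2)
print(top)  # ['سوق', 'مباراة']
```

In the toy corpus the preposition في ("in") occurs in every document and scores 0, while terms confined to a single class score highest. In a real pipeline, k would be chosen to shrink the vocabulary before the ABC search.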
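The ABC wrapper stage can be sketched in the same hedged way. Each food source is a binary mask over the features that survived the chi-square filter. Employed bees try one neighbouring mask per source. Onlooker bees revisit sources with probability proportional to fitness. A scout replaces a source that has failed to improve more than `limit` times. In the paper, the fitness is the classification performance of an NB or SVM classifier. Here a hypothetical `toy_fitness` stands in so the sketch runs on its own. The one-bit-flip neighbour move, the parameter defaults, and all names are assumptions, not details taken from the paper.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # 1 = feature kept, 0 = feature dropped


def abc_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    n_sources: int = 10,
    limit: int = 5,
    max_cycles: int = 50,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Binary Artificial Bee Colony sketch; returns (best_mask, best_fitness).

    `fitness` must return a non-negative score, e.g. a classifier's
    validation accuracy on the features switched on in the mask.
    """
    rng = random.Random(seed)

    def random_mask() -> Mask:
        return [rng.randint(0, 1) for _ in range(n_features)]

    def neighbour(mask: Mask) -> Mask:
        # Assumed binary move: flip one randomly chosen feature in or out.
        new = mask[:]
        j = rng.randrange(n_features)
        new[j] = 1 - new[j]
        return new

    # Initialisation: each food source is one candidate feature subset.
    sources = [random_mask() for _ in range(n_sources)]
    scores = [fitness(m) for m in sources]
    trials = [0] * n_sources
    first = max(range(n_sources), key=lambda i: scores[i])
    best_mask, best_score = sources[first][:], scores[first]

    def try_improve(i: int) -> None:
        # Greedy selection: keep the neighbour only if it scores higher.
        nonlocal best_mask, best_score
        cand = neighbour(sources[i])
        s = fitness(cand)
        if s > scores[i]:
            sources[i], scores[i], trials[i] = cand, s, 0
            if s > best_score:
                best_mask, best_score = cand[:], s
        else:
            trials[i] += 1

    for _ in range(max_cycles):
        # Employed bees: every source is explored once per cycle.
        for i in range(n_sources):
            try_improve(i)
        # Onlooker bees: pick sources with probability proportional to fitness.
        for _ in range(n_sources):
            if sum(scores) > 0:
                i = rng.choices(range(n_sources), weights=scores)[0]
            else:
                i = rng.randrange(n_sources)
            try_improve(i)
        # Scout bee: abandon the most-exhausted source once it exceeds `limit`.
        worst = max(range(n_sources), key=lambda i: trials[i])
        if trials[worst] > limit:
            sources[worst] = random_mask()
            scores[worst] = fitness(sources[worst])
            trials[worst] = 0
            if scores[worst] > best_score:
                best_mask, best_score = sources[worst][:], scores[worst]

    return best_mask, best_score


# Hypothetical toy fitness: features 0, 2 and 5 are "informative"; every other
# selected feature costs a little, so smaller subsets win when accuracy is equal.
USEFUL = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(mask[j] for j in USEFUL)
    extras = sum(mask) - hits
    return 1.0 + hits - 0.1 * extras  # shifted so the score stays non-negative


if __name__ == "__main__":
    mask, score = abc_feature_selection(8, toy_fitness)
    print(mask, score)
```

To plug in a real classifier, `fitness` would train on the columns where the mask is 1 and return held-out accuracy. A mask that selects no features would need to score 0 rather than raise an error.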