Comparative Study of Arabic Text Categorization Using Feature Selection Techniques and Four Classifier Models
Authors: Said Bahassine, Abdellah Madani, M. Kissi
Published in: Proceedings of the 13th International Conference on Intelligent Systems: Theories and Applications, 2020-09-23
DOI: https://doi.org/10.1145/3419604.3419778
Text classification is the process of assigning appropriate categories to free text according to its content. It is one of the important tasks in text mining. Numerous natural language processing studies have used Japanese, French, Latin, and Turkish documents, but work on text written in Arabic is still limited. In this paper, we conduct a comparative study of three feature selection methods using four well-known classifiers: Decision Tree, Naive Bayes, K-Nearest Neighbors, and Support Vector Machine. The corpus contains 250 Arabic texts belonging to five classes: sport, politics, economics, culture and art, and society. This data set is used to evaluate and compare the effectiveness of the resulting models. The experimental results reveal that using an improved Chi-square method for feature selection with Support Vector Machine as the classifier outperforms the other combinations in terms of precision. This combination significantly improves the performance of the Arabic text classification model. The highest precision achieved by this model is 89.9%.
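To make the feature selection step concrete, the following is a minimal sketch of the standard Chi-square term score, not the improved variant the paper evaluates, whose details are not given in the abstract. For a term t and a category c, it counts documents in a 2x2 contingency table (A: term present, in c; B: term present, not in c; C: term absent, in c; D: term absent, not in c) and computes N(AD - CB)^2 / ((A+C)(B+D)(A+B)(C+D)). The function names `chi_square` and `select_features`, the rule of keeping each term's maximum score across categories, and the English toy documents are all assumptions made for illustration; they do not come from the paper or its corpus.

```python
def chi_square(docs, labels, term, category):
    """Standard chi-square score of `term` for `category`.

    `docs` is a list of token sets and `labels` the matching category names.
    """
    n = len(docs)
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_category = label == category
        if has_term and in_category:
            a += 1  # term present, document in the category
        elif has_term:
            b += 1  # term present, document in another category
        elif in_category:
            c += 1  # term absent, document in the category
        else:
            d += 1  # term absent, document in another category
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # degenerate table: the term or category never varies
    return n * (a * d - c * b) ** 2 / denominator


def select_features(docs, labels, k):
    """Keep the k terms with the highest chi-square score over any category."""
    vocabulary = sorted({token for tokens in docs for token in tokens})
    categories = sorted(set(labels))
    scores = {
        term: max(chi_square(docs, labels, term, cat) for cat in categories)
        for term in vocabulary
    }
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(vocabulary, key=lambda term: (-scores[term], term))[:k]


# Hypothetical toy corpus (English placeholders, not the paper's Arabic data).
docs = [
    {"goal", "match", "team"},
    {"match", "team", "win"},
    {"vote", "party", "team"},
    {"vote", "election", "party"},
]
labels = ["sport", "sport", "politics", "politics"]

print(select_features(docs, labels, 3))
```

In this toy corpus, "team" appears in both categories and so scores lower than terms such as "match" or "vote" that occur in only one. The selected terms would then serve as the feature set passed to a classifier such as a Support Vector Machine.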