Authors: R. Elhassan, Mahmoud Ali
Published in: 2019 2nd International Conference on Computer Applications & Information Security (ICCAIS), 2019-05-01
DOI: 10.1109/CAIS.2019.8769526 (https://doi.org/10.1109/CAIS.2019.8769526)
The Impact of Feature Selection Methods for Classifying Arabic Texts
Text classification is one of the most important research topics and most common techniques in data mining. Its main challenge is high dimensionality: the feature space of the documents is very large and contains redundant and noisy data. Feature selection aims to improve classifier accuracy by reducing the dimensionality of that space. Owing to its richness, Arabic is considered the language with the highest dimensionality in text classification. This paper studies the effectiveness of feature selection techniques in improving the performance of Arabic text classifiers. Two feature selection techniques, InfoGain and the Chi-square statistic (CHI), and two supervised machine learning models were investigated. The results show that feature selection improves the performance of the models. InfoGain outperforms CHI when used with the Naive Bayes (NB) classifier, and the two techniques perform equally when used with the SMO classifier.
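To make the two criteria named in the abstract concrete, the sketch below scores terms by the chi-square statistic and by information gain and keeps the top-scoring ones. It is only an illustration under stated assumptions: the toy corpus, its English tokens and labels, and the function names `chi_square`, `info_gain`, and `rank_terms` are all hypothetical, and nothing here reproduces the authors' data, tooling, or experimental setup.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not the paper's data): each document is a set of
# tokens, paired with a class label.
DOCS = [
    {"goal", "match", "team"},
    {"team", "win", "match"},
    {"vote", "election", "team"},
    {"election", "party", "vote"},
]
LABELS = ["sport", "sport", "politics", "politics"]


def chi_square(docs, labels, term, cls):
    """Chi-square statistic of one term against one class (2x2 table)."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        if term in tokens and label == cls:
            a += 1  # term present, document in class
        elif term in tokens:
            b += 1  # term present, document in another class
        elif label == cls:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document in another class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def entropy(counts):
    """Shannon entropy (in bits) of a class-count distribution."""
    counts = [k for k in counts if k]
    total = sum(counts)
    return -sum((k / total) * math.log2(k / total) for k in counts)


def info_gain(docs, labels, term):
    """Reduction in class entropy after splitting on presence of the term."""
    present = [lab for toks, lab in zip(docs, labels) if term in toks]
    absent = [lab for toks, lab in zip(docs, labels) if term not in toks]
    conditional = 0.0
    for subset in (present, absent):
        if subset:
            weight = len(subset) / len(labels)
            conditional += weight * entropy(Counter(subset).values())
    return entropy(Counter(labels).values()) - conditional


def rank_terms(docs, score, k):
    """Return the k vocabulary terms with the highest score."""
    vocabulary = sorted(set().union(*docs))
    return sorted(vocabulary, key=score, reverse=True)[:k]


def ig_score(term):
    return info_gain(DOCS, LABELS, term)


def chi_score(term):
    # For multi-class data, one common choice is the maximum over classes.
    return max(chi_square(DOCS, LABELS, term, cls) for cls in set(LABELS))


top_by_ig = rank_terms(DOCS, ig_score, k=3)
top_by_chi = rank_terms(DOCS, chi_score, k=3)
```

On a corpus this small the two criteria pick the same terms. Differences between them, such as the one the abstract reports for the NB classifier, would only be expected to show up on realistic, high-dimensional data. The reduced vocabulary would then be passed to the classifier in place of the full feature space.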