Accuracy evaluation of Arabic text classification
M. Sayed, Rashed K. Salem, Ayman E. Khedr
2017 12th International Conference on Computer Engineering and Systems (ICCES)
Publication date: 2017-12-01
DOI: 10.1109/ICCES.2017.8275333
Citations: 9
Abstract
Categorizing Arabic text remains a significant challenge owing to the richness of Arabic text across its various forms. In addition, Arabic is considered the fifth most widely spoken language. Over the last decade, researchers have given this area relatively little attention compared with English. The objective of this study is to implement and evaluate a new approach based on several machine learning techniques for classifying Arabic text on a new dataset. Preprocessing and the choice of text representation are essential for handling the text without artifacts. For the feature vector representation, we use a binary term occurrence matrix together with mutual information. This paper evaluates the classification results of Deep Learning, K-Nearest Neighbor (KNN), Support Vector Machine (SVM), and Naïve Bayes (NB) classifiers at different text similarity levels and N-gram levels. The results show that the performance of Deep Learning improves as the similarity level and the N-gram level increase.
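To illustrate the representation step, here is a minimal sketch that builds a binary term occurrence matrix over word N-grams and scores each N-gram by its mutual information with the class label. It is not the authors' implementation. It assumes whitespace tokenization, applies no Arabic-specific preprocessing (normalization, stemming), and assumes that mutual information is used to rank features. The function names (`word_ngrams`, `binary_term_matrix`, `mutual_information`, `top_k_features`) and the toy documents are hypothetical.

```python
import math
from collections import Counter


def word_ngrams(text: str, n: int) -> set[str]:
    """Return the set of word n-grams in `text` (assumes whitespace tokenization)."""
    tokens = text.split()
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def binary_term_matrix(docs: list[str], n: int) -> tuple[list[str], list[list[int]]]:
    """Binary term occurrence matrix: cell [d][t] is 1 if n-gram t occurs in doc d, else 0."""
    doc_terms = [word_ngrams(d, n) for d in docs]
    vocab = sorted(set().union(*doc_terms))
    matrix = [[1 if term in terms else 0 for term in vocab] for terms in doc_terms]
    return vocab, matrix


def mutual_information(column: list[int], labels: list[str]) -> float:
    """Mutual information I(X; C) in bits between a binary feature X and the class label C."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    count_x = Counter(column)
    count_c = Counter(labels)
    mi = 0.0
    # Only observed (x, c) pairs contribute; p(x, c) = 0 terms are zero by convention.
    for (x, c), count in joint.items():
        p_xc = count / n
        mi += p_xc * math.log2(p_xc / ((count_x[x] / n) * (count_c[c] / n)))
    return mi


def top_k_features(vocab: list[str], matrix: list[list[int]], labels: list[str], k: int) -> list[str]:
    """Rank n-grams by mutual information with the label (ties broken by string order), keep k."""
    scored = []
    for j, term in enumerate(vocab):
        column = [row[j] for row in matrix]
        scored.append((mutual_information(column, labels), term))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [term for _, term in scored[:k]]


if __name__ == "__main__":
    # Hypothetical toy corpus: two "sports" and two "economy" documents.
    docs = ["كرة القدم مباراة", "مباراة كرة السلة", "البنك سعر الفائدة", "سعر النفط البنك"]
    labels = ["sports", "sports", "economy", "economy"]
    vocab, matrix = binary_term_matrix(docs, n=1)
    print(top_k_features(vocab, matrix, labels, k=4))
```

In this toy corpus, كرة, مباراة, البنك, and سعر each occur in both documents of one class and in neither document of the other. Each therefore scores 1 bit and together they form the top four. A term that occurs in only one document scores about 0.31 bits. Raising `n` produces longer, sparser features, which is one way the N-gram level changes the feature space that the classifiers see.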