Stemming as a feature reduction technique for Arabic Text Categorization
F. Harrag, Eyas El-Qawasmah, A. Al-Salman
2011 10th International Symposium on Programming and Systems, published 2011-04-25
DOI: 10.1109/ISPS.2011.5898874 (https://doi.org/10.1109/ISPS.2011.5898874)
This paper presents a comparative study of three text preprocessing techniques for the Arabic text categorization problem, using an in-house Arabic dataset. We evaluate and compare three stemming techniques: light stemming, root-based stemming, and dictionary-lookup stemming. The purpose is to reduce the feature space to an input space of much lower dimension for two state-of-the-art classifiers: artificial neural networks and support vector machines. The results show that light stemming improves the performance of Arabic text categorization. They also show that the proposed artificial neural network model achieves high categorization effectiveness as measured by the macro-averaged F1 measure.
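As a rough illustration of the two ideas in the abstract, the Python sketch below shows how light stemming can shrink a vocabulary and how a macro-averaged F1 score is computed. It is not the authors' implementation: the affix lists, the minimum stem length of three letters, and the helper names `light_stem`, `vocabulary`, and `macro_f1` are all assumptions chosen for illustration, and the category labels in the usage note are hypothetical.

```python
from collections import defaultdict

# Illustrative affix lists (an assumption, not the lists used in the paper).
# Longer prefixes come first so that "وال" is tried before "ال".
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال")
SUFFIXES = ("ات", "ون", "ين", "ان", "ها", "ية", "ة", "ه", "ي")
MIN_STEM_LEN = 3  # assumed lower bound on the remaining stem length


def light_stem(word: str) -> str:
    """Strip at most one prefix and one suffix, keeping MIN_STEM_LEN letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word


def vocabulary(tokens, stem=None):
    """Return the set of distinct features, optionally after stemming."""
    return {stem(token) if stem else token for token in tokens}


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    true_pos = defaultdict(int)
    false_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            true_pos[truth] += 1
        else:
            false_pos[pred] += 1
            false_neg[truth] += 1
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        scores.append(2 * true_pos[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, calling `vocabulary` on the six surface forms "الكتاب", "والكتاب", "كتاب", "المعلمون", "معلمين", "معلم" gives six distinct features without stemming and two with `stem=light_stem`, which is the kind of feature-space reduction the abstract refers to. Because `macro_f1` averages per-class scores without weighting, small categories count as much as large ones.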