Arabic light-based stemming: a comparative study among Light10 stemmer, P-stemmer, and Conditional light stemmer
Sabria Mohammed Hussien, Hazim J. Aburagheef
2021 2nd Information Technology To Enhance e-learning and Other Application (IT-ELA), 2021-12-28
DOI: 10.1109/IT-ELA52201.2021.9773743 (https://doi.org/10.1109/IT-ELA52201.2021.9773743)
Arabic stemming is a key preprocessing stage in natural language processing (NLP). It strips affixes from words, which improves text classification (TC) as well as information retrieval (IR). There are two types of stemming: light-based and root-based. Compared with root-based stemming, light-based stemming is the lighter-touch approach: it removes only prefixes and suffixes from words. The Light10 stemmer, the P-stemmer, and the conditional light stemmer (CondLight) are three well-known light stemming methods. The Light10 stemmer removes prefixes and suffixes under a few conditions. The P-stemmer removes only prefixes, while the CondLight stemmer works like the Light10 stemmer but applies eight conditions. We evaluated the three stemmers to measure how much each improves Arabic TC. The evaluation uses three classifiers, Support Vector Machine (SVM), k-nearest neighbor (KNN), and Naive Bayes (NB), together with a statistical similarity measure. The results indicate that stemming yields a small improvement of about 2 percent.
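The contrast between a stemmer that removes both prefixes and suffixes and one that removes only prefixes can be sketched in Python. This is a simplified illustration, not the authors' implementation: the affix lists follow commonly cited descriptions of Light10-style stemmers, the single minimum-length condition `MIN_STEM_LEN` is a hypothetical stand-in for the actual conditions, and the function names are invented for this example. The eight CondLight conditions are not given in the abstract and are not modeled here.

```python
# Simplified light-stemming sketch (assumptions throughout, see lead-in).

# Illustrative affix lists; the abstract does not enumerate them.
# Longer prefixes come first so that "وال" is tried before "ال".
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]

# Hypothetical condition: an affix is removed only if at least
# this many characters remain afterwards.
MIN_STEM_LEN = 2


def strip_prefix(word: str) -> str:
    """Remove the first matching prefix if enough characters remain."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def strip_suffixes(word: str) -> str:
    """Make one pass over the suffix list, removing each suffix that matches."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[: -len(suffix)]
    return word


def light_stem(word: str) -> str:
    """Light10-style sketch: strip a prefix, then strip suffixes."""
    return strip_suffixes(strip_prefix(word))


def prefix_only_stem(word: str) -> str:
    """P-stemmer-style sketch: strip a prefix and leave suffixes untouched."""
    return strip_prefix(word)


if __name__ == "__main__":
    word = "المكتبات"  # "the libraries"
    print(light_stem(word))        # -> مكتب
    print(prefix_only_stem(word))  # -> مكتبات
```

On the sample word, the prefix-only variant keeps the plural suffix while the prefix-and-suffix variant strips it, which is the behavioral difference the comparison turns on.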