Arabic light-based stemming: a comparative study among Light10 stemmer, P-stemmer, and Conditional light stemmer

Sabria Mohammed Hussien, Hazim J. Aburagheef
DOI: 10.1109/IT-ELA52201.2021.9773743
Published in: 2021 2nd Information Technology To Enhance e-learning and Other Application (IT-ELA)
Publication date: 2021-12-28

Abstract

Arabic stemming is a key preprocessing stage in natural language processing (NLP). It strips affixes from words and improves both text classification (TC) and information retrieval (IR). There are two types of stemming: light-based and root-based. Compared with root-based stemming, light-based stemming consumes more energy. Light-based stemming removes only prefixes and suffixes from words. Three well-known light stemmers are the Light10 stemmer, the P-stemmer, and the conditional light stemmer (CondLight). The Light10 stemmer removes prefixes and suffixes under a few conditions. The P-stemmer removes only prefixes, while the CondLight stemmer is the same as the Light10 stemmer but with eight conditions. We evaluated these stemmers to measure how much they improve Arabic TC, using three classifiers, Support Vector Machine (SVM), k-nearest neighbor (KNN), and Naive Bayes (NB), together with a statistical similarity measurement. Stemming yields a small improvement of about 2 percent.
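
To make the difference between prefix-only and prefix-plus-suffix light stemming concrete, here is a minimal Python sketch. It is not the authors' implementation: the affix lists, the minimum stem length, and the function names (`strip_prefix`, `strip_suffix`, `prefix_only_stem`, `light_stem`) are assumptions chosen for illustration, and they do not reproduce the exact rules of Light10, the P-stemmer, or the eight conditions of CondLight.

```python
# Illustrative affix lists only (assumed for this sketch); they are NOT the
# exact lists used by Light10, the P-stemmer, or CondLight.
PREFIXES = ("وال", "بال", "كال", "فال", "ال", "لل")  # longer prefixes first
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")
MIN_STEM_LEN = 2  # assumed condition: never shrink a word below this length


def strip_prefix(word: str) -> str:
    """Remove the first matching prefix if enough letters remain."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            return word[len(prefix):]
    return word


def strip_suffix(word: str) -> str:
    """Remove the first matching suffix if enough letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            return word[:-len(suffix)]
    return word


def prefix_only_stem(word: str) -> str:
    """Prefix-only stemming, in the spirit of the P-stemmer."""
    return strip_prefix(word)


def light_stem(word: str) -> str:
    """Prefix and suffix stemming, in the spirit of Light10."""
    return strip_suffix(strip_prefix(word))


if __name__ == "__main__":
    for token in ("المكتبات", "والكتاب"):
        print(token, prefix_only_stem(token), light_stem(token))
```

In this sketch the length check plays the role of a removal condition: an affix is stripped only when the remaining stem stays at least `MIN_STEM_LEN` letters long. A conditional stemmer such as CondLight would add further checks of this kind before each removal.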