Enhancing the Capabilities of Solr Information Retrieval System: Arabic Language

Aminah Alqahtani, Manal Alnefaie, Nourah Alamri, Ahmad Khorsi
Published in: 2020 3rd International Conference on Computer Applications & Information Security (ICCAIS)
Publication date: 2020-03-01
DOI: https://doi.org/10.1109/ICCAIS48893.2020.9096810
Citations: 3

Abstract

Arabic is one of the most complex languages to handle in Natural Language Processing (NLP). Solr is an Information Retrieval System (IRS) that is widely known for its accurate results and high performance in English. However, the Arabic stemmer that Solr currently uses, Light-10, has some deficiencies. In this work, we evaluated two light stemmers (Assem, Tashaphyne) and two root stemmers (Khoja, ISRI) and selected the two that performed best in our experiments, Assem and Khoja, in addition to Light-10. We then used these three stemmers to evaluate the search retrieval accuracy of Solr in Arabic, and evaluated them again with synonyms. The evaluation is based on two metrics: Precision and Normalized Discounted Cumulative Gain (NDCG). The Assem stemmer achieved the highest accuracy at 86%, followed by Light-10 at 83% and Khoja at 81%. Finally, the Assem stemmer was adopted for Almufed, a Solr-based search engine that we developed in this work over more than 6000 Arabic books from the Alshamela Library.
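The abstract names Precision and NDCG as the evaluation metrics but does not give the cutoff, the relevance scale, or the exact gain formula. The following is a minimal sketch of how the two metrics are commonly computed for a single query, assuming graded relevance judgments, a linear gain, and a log2 rank discount. The function names and the sample judgments are hypothetical and are not taken from the paper.

```python
import math
from typing import Sequence


def precision_at_k(relevances: Sequence[int], k: int) -> float:
    """Fraction of the top-k results with a relevance grade above zero."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for rel in relevances[:k] if rel > 0) / k


def dcg_at_k(relevances: Sequence[int], k: int) -> float:
    """Discounted cumulative gain: each grade is divided by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[int], k: int) -> float:
    """DCG normalized by the DCG of the best possible ordering.

    Assumption: the ideal ordering is built from the same judged list,
    not from every relevant document in the collection.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


if __name__ == "__main__":
    # Hypothetical grades (0 = not relevant, 3 = highly relevant) for the
    # top five results that one stemmer configuration returned for a query.
    graded = [3, 2, 0, 1, 0]
    print(f"P@5    = {precision_at_k(graded, 5):.3f}")
    print(f"NDCG@5 = {ndcg_at_k(graded, 5):.3f}")
```

Averaging these per-query values over a query set, once per stemmer, would give figures comparable in kind to the accuracies reported above, though the paper's exact procedure may differ.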