BERT Representation for Arabic Information Retrieval: A Comparative Study

Moulay Abdellah, Kassimi, Abdessalam Essayad
Multimedia Research (journal article). DOI: 10.46253/j.mr.v6i3.a1 (https://doi.org/10.46253/j.mr.v6i3.a1)

Abstract

The volume of information in online documents and social media is growing rapidly in every language. Retrieving information in a given language is a demanding task, and Information Retrieval (IR) has become increasingly important in both research and commercial development. At present, only a few retrieval tools are available on the market. Each language has its own pronunciation and structure, and the complex morphology of Arabic has slowed progress in this field. An IR model needs to recognize similar words during the matching process. In this paper, we present a comparative study of recent approaches to Arabic Information Retrieval. We implement existing approaches to Arabic IR and compare them on Arabic datasets. We also introduce an Arabic lemmatizer based on a dictionary of Arabic words collected from several Arabic books and websites, and we compare the performance of different lemmatization techniques. We then conduct a series of experiments to compare different approaches to Arabic IR, and we further examine the performance of Arabic BERT against that of the existing approaches. The experimental results show that BM25 and multilingual BERT ranked highest across the tasks. On the large Arabic dataset, information retrieval reached an accuracy of 89%.
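For readers unfamiliar with BM25, the lexical baseline named in the abstract, the following is a minimal sketch of Okapi BM25 scoring. It is not the authors' implementation: the function name `bm25_scores`, the parameter values `k1=1.5` and `b=0.75`, the toy documents, and the whitespace tokenization (with no lemmatization) are all assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` (Okapi BM25).

    `k1` and `b` are common default values, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always non-negative).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Toy corpus (hypothetical), tokenized by whitespace only.
docs = [
    "كتاب عن تاريخ العرب".split(),
    "استرجاع المعلومات باللغة العربية".split(),
    "تعلم اللغة العربية".split(),
]
query = "استرجاع المعلومات".split()
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
print(best, scores)
```

Because matching here is exact on surface forms, the tokens "باللغة" and "اللغة" count as different terms. This is the kind of mismatch that lemmatization is meant to reduce in a morphologically rich language such as Arabic.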