BERT Representation for Arabic Information Retrieval: A Comparative Study
Authors: Moulay Abdellah, Kassimi, Abdessalam Essayad
Journal: Multimedia Research
DOI: 10.46253/j.mr.v6i3.a1 (https://doi.org/10.46253/j.mr.v6i3.a1)
Information in online documents and social media is growing rapidly in all languages. Retrieving information in a given language is a high-level task, and Information Retrieval (IR) has become increasingly important in both research and commercial development. At present, only a few retrieval tools are available on the market. Each language has its own pronunciation and structure, and Arabic in particular has a complex morphology, which has slowed progress in this field. A typical IR model must recognize similar words during the matching process. In this paper, we present a comparative study of recent approaches to Arabic Information Retrieval. We implemented the existing approaches to Arabic IR and compared them on Arabic datasets. We also introduce a dictionary-based Arabic Lemmatizer, whose dictionary contains Arabic words collected from several Arabic books and websites, and we compare the performance of different lemmatization techniques. We then conduct a series of experiments to compare the different approaches to Arabic IR. Furthermore, we examine the performance of Arabic BERT against that of the existing approaches. The experimental results showed that BM25 and multilingual BERT ranked highest across the tasks. On the Large Arabic Dataset, information retrieval reached an accuracy of 89%.
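The abstract names BM25 as one of the top-ranked methods but does not give its formulation. As a rough illustration only, the sketch below assumes the standard Okapi BM25 scoring formula with the commonly used defaults `k1=1.5` and `b=0.75`; these values, the function name `bm25_scores`, and the toy corpus are assumptions for illustration and are not taken from the paper.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25).

    k1 and b are common default values, assumed here, not the paper's settings.
    """
    n_docs = len(docs_tokens)
    avg_len = sum(len(d) for d in docs_tokens) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs_tokens for term in set(d))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue
            # Smoothed IDF; the "+ 1.0" inside the log keeps it non-negative.
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy corpus, whitespace-tokenized with no lemmatization.
docs = [
    "استرجاع المعلومات باللغة العربية".split(),
    "الطقس اليوم مشمس".split(),
    "نماذج اللغة العربية".split(),
]
query = "استرجاع المعلومات".split()
scores = bm25_scores(query, docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Because matching here is exact, the surface forms "باللغة" and "اللغة" count as different terms. This is the kind of mismatch that the lemmatization step described in the abstract is meant to reduce before scoring.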