Meta-search based approach for Arabic information retrieval

Online Inf. Rev. Pub Date: 2022-02-25. DOI: 10.1108/oir-11-2020-0515

Souheila Ben Guirat, Ibrahim Bounhas, Y. Slimani


Citations: 0

Abstract

Purpose: The semantic relations between Arabic word representations were recognized and widely studied in theoretical linguistics many centuries ago. Nonetheless, most previous research in automatic information retrieval (IR) has focused on stem- or root-based indexing, while lemmas and patterns remain under-exploited. The authors believe that each of the four morphological levels (root, stem, lemma and pattern) encapsulates part of the meaning of a word. The purpose is therefore to aggregate these levels, using more sophisticated approaches, to reach the optimal combination for enhancing IR.

Design/methodology/approach: The authors first compare state-of-the-art Arabic natural language processing (NLP) tools in IR. This makes it possible to select the most accurate tool at each representation level, i.e. to develop four basic IR systems. The authors then compare two rank aggregation approaches that combine the results of these systems. The first approach is based on linear combination, while the second exploits classification-based meta-search.

Findings: Combining different word representation levels consistently and significantly enhances IR results. The proposed classification-based approach outperforms linear combination and all the basic systems.

Research limitations/implications: The work rests on a standard experimental comparative study that assesses several NLP tools and combination approaches on different test collections and IR models. It may therefore help future research to choose the most suitable tools and to develop more sophisticated methods for handling the complexity of the Arabic language.

Originality/value: The originality of the idea is to treat the richness of Arabic as an exploitable characteristic rather than as a limiting challenge. The authors accordingly combine four different morphological levels for the first time in Arabic IR. This approach substantially outperformed previous research results.

Peer review: The peer review history for this article is available at: https://publons.com/publon/10.1108/OIR-11-2020-0515
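
The linear-combination baseline mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' normalization scheme, weights, or scores, so everything below is an assumption made for illustration: the function names (min_max_normalize, linear_combination), the min-max rescaling step, the equal weights, and the toy document scores are all hypothetical. The sketch only shows the general idea of fusing ranked results from four systems, one per morphological level, by a weighted sum of normalized scores.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] (assumed normalization step)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every retrieved document as equally good.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_combination(runs, weights):
    """Fuse per-system score dicts into one ranking by weighted sum.

    A document missing from a system's results contributes 0 for that system.
    """
    fused = defaultdict(float)
    for level, scores in runs.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[level] * s
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical retrieval scores from four basic IR systems for one query.
runs = {
    "root": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
    "stem": {"d1": 1.0, "d2": 3.0},
    "lemma": {"d2": 0.5, "d3": 1.5},
    "pattern": {"d1": 4.0, "d3": 2.0},
}
# Equal weights are an assumption; in practice they would be tuned.
weights = {"root": 0.25, "stem": 0.25, "lemma": 0.25, "pattern": 0.25}

ranking = linear_combination(runs, weights)
for doc, score in ranking:
    print(doc, round(score, 3))
```

A classification-based meta-search, by contrast, would typically feed the per-system scores to a trained classifier as features and rank documents by predicted relevance instead of a fixed weighted sum; the abstract does not describe the classifier used, so that variant is not sketched here.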
