Method of enriching queries by contextual information to approve of information retrieval system in Arabic

Souheyl Mallat, Houssem Abdellaoui, M. Maraoui, M. Zrigui
2015 5th International Conference on Information & Communication Technology and Accessibility (ICTA), published 2015-12-01
DOI: 10.1109/ICTA.2015.7426926 (https://doi.org/10.1109/ICTA.2015.7426926)
Citations: 0

Abstract

In this paper, we propose a method to improve the performance of information retrieval systems (IRS) by increasing the selectivity of relevant documents on the web. A significant number of relevant documents on the web are not returned by an IRS (specifically, a search engine) because of the richness of the Arabic language. As a result, the search engine does not reach high performance and does not meet users' needs. To remedy this problem, we propose a query enrichment method that proceeds in several steps. First, the significant terms (simple and compound) present in the query are identified. Then, a descriptive list is generated and assigned to each term identified as significant. A descriptive list is a set of linguistic knowledge of different types (morphological, syntactic and semantic). In this paper, we focus on the statistical treatment, which is based on the similarity method. This method applies Salton's TF-IDF and TF-IEF weighting functions to the list generated in the previous step: TF-IDF identifies the relevant documents, while TF-IEF identifies the relevant sentence. The high-weight terms (those likely to be correlated with the context of the response) are then incorporated into the original query. The method is applied to a corpus of documents belonging to a closed domain.
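
The statistical step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: TF-IEF is treated here as the same TF × inverse-frequency weight computed over sentences instead of documents; the descriptive lists are not modelled, so candidate expansion terms are simply taken from the best-scoring sentence; tokenization is plain whitespace splitting with no Arabic normalization or morphology; and the names `tokenize`, `weight`, `best_unit`, `enrich_query` and the toy English corpus are all hypothetical.

```python
import math


def tokenize(text):
    # Assumption: whitespace tokenization only. Real Arabic text would need
    # normalization and morphological analysis, which this sketch omits.
    return text.lower().replace(".", " ").split()


def weight(term, unit, units):
    """TF x log(N / df) weight of `term` in `unit` among `units`.

    With documents as units this is a TF-IDF weight; with sentences as
    units it plays the role assumed here for TF-IEF.
    """
    if not unit:
        return 0.0
    tf = unit.count(term) / len(unit)
    df = sum(1 for u in units if term in u)
    return tf * math.log(len(units) / df) if df else 0.0


def best_unit(query_terms, units):
    """Index of the unit with the highest summed weight over the query terms."""
    scores = [sum(weight(t, u, units) for t in query_terms) for u in units]
    return max(range(len(units)), key=scores.__getitem__)


def enrich_query(query, documents, k=1):
    """Append the k highest-weight terms of the best sentence to the query."""
    q_terms = tokenize(query)
    if not documents:
        return q_terms
    # Step 1: document-level weighting selects the most relevant document.
    docs = [tokenize(d) for d in documents]
    d = best_unit(q_terms, docs)
    # Step 2: sentence-level weighting selects the most relevant sentence.
    sentences = [tokenize(s) for s in documents[d].split(".") if s.strip()]
    if not sentences:
        return q_terms
    s = best_unit(q_terms, sentences)
    # Step 3: rank the sentence's other terms and add the heaviest to the query.
    candidates = sorted(set(sentences[s]) - set(q_terms))
    ranked = sorted(candidates,
                    key=lambda t: weight(t, sentences[s], sentences),
                    reverse=True)
    return q_terms + ranked[:k]


# Hypothetical closed-domain toy corpus (English, for readability).
toy_corpus = [
    "olive oil is pressed from olives. the olive harvest starts in autumn "
    "and autumn is dry. olive trees grow in dry soil",
    "dates grow on palm trees. palm trees need sun",
    "wheat is milled into flour. flour makes bread",
]

print(enrich_query("olive harvest", toy_corpus, k=1))
# → ['olive', 'harvest', 'autumn']
```

With larger `k`, function words such as "and" or "the" can tie with content words in this toy setting, which suggests why a real system would need the linguistic filtering that the descriptive lists are meant to provide.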