New Information Retrieval Approach Based on Semantic Indexing by Meaning

Proceedings of the 16th International Conference on Applied Computing 2019. Pub Date: 2019-11-07. DOI: 10.33965/ac2019_201912l019

Ala Eddine Kharrat, L. Hlaoua

Abstract

An Information Retrieval System (IRS) offers a number of tools and techniques for locating and visualizing the relevant information a user needs. The user expresses this need as a natural-language query. However, the way a traditional IRS represents documents and queries leads to a lexically centered relevance estimate, which is less effective than a semantics-focused one. As a consequence, documents that are actually relevant are not retrieved if they share no words with the query, while non-relevant documents that do share words with the query are retrieved even though they sometimes do not carry the intended meaning. This paper tackles the problem by proposing a solution at the indexing level of an IRS in order to improve its performance. More precisely, we propose a new semantic indexing approach that identifies the exact meaning of each term in a document or query through contextual analysis at the sentence level. In fact, if the system is able to comprehend the user's need, it is consequently capable of responding to it. In addition, we propose a simple method for applying any IR model to our new index table without changing its original basis, which makes it faster. To validate the proposed approach, the resulting system is evaluated on several collections, namely "TIME", "BBC", "The Guardian" and "BigThink". The experimental results indicate the effectiveness of our hypothesis compared to traditional IR approaches.
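The idea of indexing by meaning rather than by surface word can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how senses are identified, so the sketch assumes a simplified Lesk-style overlap between a sentence and a hand-written sense inventory. The inventory `SENSES`, the sense identifiers such as `bank#finance`, and all function names are hypothetical. The ranking model, a plain term-frequency cosine, is left unchanged and is simply fed sense identifiers instead of words, which mirrors the claim that any IR model can be applied to the new index table.

```python
import math
import re
from collections import Counter

# Hypothetical toy sense inventory (an assumption, not taken from the paper):
# word -> {sense_id: signature words that tend to appear with that sense}.
SENSES = {
    "bank": {
        "bank#finance": {"money", "loan", "account", "deposit", "interest"},
        "bank#river": {"river", "water", "shore", "fishing", "boat"},
    },
}

def tokenize(text):
    """Lowercase the text and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def sentences(text):
    """Split text into sentences on ., ! and ? (a deliberately naive rule)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def disambiguate(word, context):
    """Pick the sense whose signature overlaps most with the sentence context.

    Words without an entry in SENSES keep their surface form.
    Ties are resolved in favour of the first listed sense.
    """
    senses = SENSES.get(word)
    if not senses:
        return word
    return max(senses, key=lambda s: len(senses[s] & context))

def index_by_meaning(text):
    """Build a sense-level index, analysing context one sentence at a time."""
    index = Counter()
    for sentence in sentences(text):
        words = tokenize(sentence)
        for i, word in enumerate(words):
            context = set(words[:i] + words[i + 1:])
            index[disambiguate(word, context)] += 1
    return index

def index_by_word(text):
    """Build the traditional lexical index for comparison."""
    return Counter(tokenize(text))

def cosine(a, b):
    """Unmodified vector-space model: cosine over term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

docs = {
    "d1": "She opened an account at the bank. The bank approved her loan.",
    "d2": "We went fishing on the river bank. The boat drifted downstream.",
}
query = "bank loan interest"

# Same ranking function, two different index tables.
sense_scores = {
    name: cosine(index_by_meaning(query), index_by_meaning(text))
    for name, text in docs.items()
}
word_scores = {
    name: cosine(index_by_word(query), index_by_word(text))
    for name, text in docs.items()
}

print("sense index:", sense_scores)
print("word index: ", word_scores)
```

With the lexical index, the river document `d2` receives a non-zero score only because it contains the string "bank". With the sense index, the query term resolves to `bank#finance` and the document term to `bank#river`, so `d2` no longer matches. The sketch covers only this polysemy case; retrieving relevant documents that share no words with the query would additionally require mapping synonyms to a common sense identifier.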
