Evaluation of Pseudo-Relevance Feedback using Wikipedia

Proceedings of the 2019 3rd International Conference on Natural Language Processing and Information Retrieval
Pub Date: 2019-06-28
DOI: 10.1145/3342827.3342845

Murtadha Aljubran

Abstract

Users express specific information needs as queries to information retrieval systems. These queries are unstructured and, in most cases, short and ambiguous. Shallow language statistics, as used in probabilistic models such as BM25 or in language models such as Indri, can improve retrieval metrics such as Mean Average Precision (MAP). However, such methods define relevance by the query terms and their presence in the retrieved document. Query expansion can overcome this problem by expanding the query with terms from the top few documents of an initial retrieval, which are assumed to be relevant. The question we try to answer is whether the quality of the corpus used for expansion produces a significant improvement in MAP and in precision at the top 30 retrieved documents. We show that the quality and the selection criteria of the expansion documents are important factors in query expansion performance.
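As an illustration of the ideas named in the abstract, the following is a minimal sketch of pseudo-relevance feedback and of the two evaluation measures, not the paper's implementation. It assumes that expansion terms are chosen by raw term frequency in the top-ranked documents; the abstract does not state the selection criterion, and the names `expand_query`, `precision_at_k`, `average_precision`, the tiny stopword list, and the toy documents are all hypothetical.

```python
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "is", "for", "on"}


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopword tokens."""
    return [t for t in text.lower().split() if t.isalpha() and t not in STOPWORDS]


def expand_query(query, ranked_docs, top_docs=2, num_terms=3):
    """Append the most frequent new terms from the top-ranked documents.

    The top `top_docs` documents are treated as relevant without any
    human judgment, which is what makes the feedback "pseudo".
    """
    query_terms = tokenize(query)
    counts = Counter()
    for doc in ranked_docs[:top_docs]:
        counts.update(t for t in tokenize(doc) if t not in query_terms)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return query_terms + [term for term, _ in ordered[:num_terms]]


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the first k retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k


def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant document appears."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


# Toy initial ranking for the ambiguous query "jaguar".
docs = [
    "jaguar cat habitat rainforest",
    "jaguar cat prey rainforest",
    "jaguar car engine",
]
print(expand_query("jaguar", docs, top_docs=2, num_terms=2))
print(precision_at_k(["d1", "d2", "d3", "d4"], {"d1", "d3"}, k=2))
print(average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"}))
```

MAP is the mean of `average_precision` over all evaluation queries, and the precision at the top 30 documents corresponds to `precision_at_k` with `k=30`. In this sketch the documents passed to `expand_query` may come from any corpus, so the expansion source can be swapped to compare corpora of different quality.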
