Empirical development of an exponential probabilistic model for text retrieval: using textual analysis to build a better model

Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval. Pub Date: 2003-07-28. DOI: 10.1145/860435.860441

J. Teevan, David R. Karger

Citations: 27

Abstract

Much work in information retrieval focuses on using a model of documents and queries to derive retrieval algorithms. Model-based development is a useful alternative to heuristic development because in a model the assumptions are explicit and can be examined and refined independently of the particular retrieval algorithm. We explore the explicit assumptions underlying the naïve framework by performing computational analysis of actual corpora and queries to devise a generative document model that closely matches text. Our thesis is that a model so developed will be more accurate than existing models, and thus more useful in retrieval as well as in other applications. We test this by learning the best document model from a corpus. We find that the learned model better predicts the existence of text data and has improved performance on certain IR tasks.
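To make "using a model of documents and queries to derive retrieval algorithms" concrete, here is a minimal sketch of model-based retrieval: a generative document model is fitted to a corpus, and documents are ranked by the probability they assign to the query. It uses a standard multinomial unigram model with Dirichlet smoothing, a common baseline, and is not the exponential model the paper develops. The class name `DirichletUnigramModel`, the whitespace tokenizer, the add-one corpus smoothing, and the parameter `mu` are all illustrative assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer (assumption): lowercase and split on whitespace.
    return text.lower().split()


class DirichletUnigramModel:
    """Multinomial (bag-of-words) generative document model, Dirichlet-smoothed.

    Each document d is assumed to generate terms independently with
    p(t|d) = (tf(t,d) + mu * p(t|C)) / (|d| + mu), where p(t|C) is a corpus model.
    """

    def __init__(self, docs, mu=2000.0):
        self.mu = mu
        self.doc_tf = [Counter(tokenize(d)) for d in docs]
        self.doc_len = [sum(tf.values()) for tf in self.doc_tf]
        self.corpus_tf = Counter()
        for tf in self.doc_tf:
            self.corpus_tf.update(tf)
        self.corpus_len = sum(self.corpus_tf.values())
        self.vocab_size = len(self.corpus_tf)

    def p_corpus(self, term):
        # Add-one smoothing on the corpus model so unseen terms get nonzero mass.
        return (self.corpus_tf[term] + 1) / (self.corpus_len + self.vocab_size + 1)

    def log_likelihood(self, terms, i):
        # log p(terms | document i) under the term-independence (naive) assumption.
        tf, n = self.doc_tf[i], self.doc_len[i]
        return sum(
            math.log((tf[t] + self.mu * self.p_corpus(t)) / (n + self.mu))
            for t in terms
        )

    def rank(self, query):
        # Return document indices sorted by descending query log-likelihood.
        terms = tokenize(query)
        scores = [(self.log_likelihood(terms, i), i) for i in range(len(self.doc_tf))]
        return [i for _, i in sorted(scores, reverse=True)]


docs = ["the cat sat on the mat", "dogs chase cats", "stock markets fell today"]
model = DirichletUnigramModel(docs, mu=10)
print(model.rank("cat mat"))
```

Because the model's assumptions (term independence, a multinomial over terms) are explicit in `log_likelihood`, one could swap in a different per-term distribution and compare candidate models by the log-likelihood they assign to held-out text, in the spirit of the abstract's comparison of how well models predict text data.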
