Investigating Retrieval Performance with Manually-Built Topic Models

Xing Wei, W. Bruce Croft
RIAO Conference, published 2007-05-30. DOI: 10.5555/1931390.1931423
Citations: 28

Abstract

Modeling text with topics is currently a popular research area in both Machine Learning and Information Retrieval (IR). Most of this research has focused on automatic methods though there are many hand-crafted topic resources available online. In this paper we investigate retrieval performance with topic models constructed manually based on a hand-crafted directory resource. The original query is smoothed on the manually selected topic model, which can also be viewed as an "ideal" user context model. Experiments with these topic models on the TREC retrieval tasks show that this type of topic model alone provides little benefit, and the overall performance is not as good as relevance modeling (which is an automatic query modification model). However, smoothing the query with topic models outperforms relevance models for a subset of the queries and automatic selection from these two models for particular queries gives better results overall than relevance models. We further demonstrate some improvements over relevance models with automatically built topic models based on the directory resource.
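The abstract describes smoothing the original query with a manually selected topic model. The paper's exact estimation details are not given here, but a common formulation is linear interpolation between the query's unigram language model and the topic's language model. The sketch below is illustrative only; the mixing weight `lam` and the maximum-likelihood estimators are assumptions, not the authors' stated method.

```python
# Illustrative sketch: smoothing a query language model with a topic model
# via linear interpolation. `lam` and the ML estimation are assumptions.
from collections import Counter

def unigram_model(tokens):
    """Maximum-likelihood unigram distribution over a token sequence."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def smooth_query(query_model, topic_model, lam=0.5):
    """P(w) = lam * P(w | query) + (1 - lam) * P(w | topic)."""
    vocab = set(query_model) | set(topic_model)
    return {w: lam * query_model.get(w, 0.0)
               + (1 - lam) * topic_model.get(w, 0.0)
            for w in vocab}

# Toy example: mix a short query with a hand-picked topic's word distribution.
query = unigram_model(["topic", "model", "retrieval"])
topic = unigram_model(["retrieval", "search", "ranking", "retrieval"])
smoothed = smooth_query(query, topic, lam=0.6)
```

The smoothed distribution still sums to one, and topic-related terms absent from the original query (here "search", "ranking") receive nonzero probability, which is how the topic model expands the query representation.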