Robust query rewriting using anchor data

Proceedings of the Sixth ACM International Conference on Web Search and Data Mining. Publication date: 2013-02-04. DOI: 10.1145/2433396.2433440

Nick Craswell, B. Billerbeck, Dennis Fetterly, Marc Najork

Citations: 12

Abstract

Query rewriting algorithms can be used as a form of query expansion by combining the user's original query with automatically generated rewrites. Rewriting algorithms bring linguistic datasets to bear without the need for iterative relevance feedback, but most studies of rewriting have used proprietary datasets such as large-scale search logs. By contrast, this paper uses readily available data, particularly ClueWeb09 link text with over 1.2 billion anchor phrases, to generate rewrites. To avoid overfitting, our initial analysis is performed using queries from the TREC Million Query Track, which leads us to identify three algorithms that perform well. We then test these algorithms on Web and newswire data. Results show good properties in terms of robustness and early precision.
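The abstract does not spell out the three rewriting algorithms, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a simple "shared target" heuristic: two anchor phrases are treated as possible rewrites of each other when they link to the same URLs, and candidates are ranked by how many target URLs they share with the query. The anchor pairs, the helper names (build_index, candidate_rewrites, expand), the cutoff k=2, and the original-query weight of 0.7 are all hypothetical. The expanded query is emitted as an Indri-style #weight/#combine string, which assumes an Indri-like retrieval engine on the receiving end.

```python
from collections import defaultdict

# Hypothetical (anchor phrase, target URL) pairs standing in for link text such
# as ClueWeb09 anchors; these values are invented for illustration only.
ANCHORS = [
    ("cheap flights", "http://example.com/air"),
    ("discount airfare", "http://example.com/air"),
    ("cheap flights", "http://example.org/travel"),
    ("low cost airlines", "http://example.org/travel"),
    ("discount airfare", "http://example.org/travel"),
]


def build_index(pairs):
    """Map each anchor phrase to its target URLs, and each URL to its anchor phrases."""
    phrase_to_urls = defaultdict(set)
    url_to_phrases = defaultdict(set)
    for phrase, url in pairs:
        phrase_to_urls[phrase].add(url)
        url_to_phrases[url].add(phrase)
    return phrase_to_urls, url_to_phrases


def candidate_rewrites(query, phrase_to_urls, url_to_phrases, k=2):
    """Rank other anchor phrases by the number of target URLs they share with the query.

    Assumed heuristic, not the paper's algorithm. Returns at most k phrases.
    """
    scores = defaultdict(int)
    for url in phrase_to_urls.get(query, ()):
        for other in url_to_phrases[url]:
            if other != query:
                scores[other] += 1
    # Highest shared-target count first; ties broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [phrase for phrase, _ in ranked[:k]]


def expand(query, rewrites, original_weight=0.7):
    """Combine the original query with its rewrites as an Indri-style #weight query.

    The original query keeps original_weight; the remainder is split evenly
    across the rewrites (an assumed weighting scheme).
    """
    parts = [f"{original_weight} #combine({query})"]
    if rewrites:
        share = round((1 - original_weight) / len(rewrites), 4)
        parts += [f"{share} #combine({r})" for r in rewrites]
    return "#weight( " + " ".join(parts) + " )"


if __name__ == "__main__":
    p2u, u2p = build_index(ANCHORS)
    rewrites = candidate_rewrites("cheap flights", p2u, u2p)
    print(rewrites)  # ['discount airfare', 'low cost airlines']
    print(expand("cheap flights", rewrites))
    # #weight( 0.7 #combine(cheap flights) 0.15 #combine(discount airfare) 0.15 #combine(low cost airlines) )
```

Keeping the original query at the highest weight reflects the abstract's framing of rewriting as expansion: the rewrites add evidence alongside the user's own wording rather than replacing it, which is one plausible way to limit damage when a rewrite is poor.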
