Robust query rewriting using anchor data

Proceedings of the sixth ACM international conference on Web search and data mining. Pub Date: 2013-02-04. DOI: 10.1145/2433396.2433440

Nick Craswell, B. Billerbeck, Dennis Fetterly, Marc Najork


Citations: 12

Abstract

Query rewriting algorithms can be used as a form of query expansion, by combining the user's original query with automatically generated rewrites. Rewriting algorithms bring linguistic datasets to bear without the need for iterative relevance feedback, but most studies of rewriting have used proprietary datasets such as large-scale search logs. By contrast this paper uses readily available data, particularly ClueWeb09 link text with over 1.2 billion anchor phrases, to generate rewrites. To avoid overfitting, our initial analysis is performed using Million Query Track queries, leading us to identify three algorithms which perform well. We then test the algorithms on Web and newswire data. Results show good properties in terms of robustness and early precision.
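The abstract does not spell out the three algorithms, so the following is only a minimal sketch of the general idea it names: query expansion by combining the original query with generated rewrites. The function name `expand_query`, the `original_weight` parameter, the rewrite scores, and the example rewrite are all hypothetical and are not taken from the paper.

```python
from collections import Counter


def expand_query(original, rewrites, original_weight=2.0):
    """Merge a query and its scored rewrites into one weighted bag of terms.

    Illustrative assumption only: the paper's actual combination method
    is not described in the abstract.
    `rewrites` is a list of (rewrite_text, score) pairs.
    """
    weights = Counter()
    # Terms from the user's own query get a fixed, higher weight.
    for term in original.lower().split():
        weights[term] += original_weight
    # Terms from each rewrite contribute that rewrite's score.
    for rewrite_text, score in rewrites:
        for term in rewrite_text.lower().split():
            weights[term] += score
    return dict(weights)


# Hypothetical rewrite, e.g. one that anchor text might suggest.
expanded = expand_query("ny hotels", [("new york hotels", 0.5)])
```

Keeping the original terms at a higher weight than rewrite terms is one simple way to limit the damage a poor rewrite can do, which is in the spirit of the robustness the abstract emphasizes.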

