鲁棒伪相关反馈混合模型的正则化估计

Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval Pub Date : 2006-08-06 DOI:10.1145/1148170.1148201

Tao Tao, ChengXiang Zhai

{"title":"鲁棒伪相关反馈混合模型的正则化估计","authors":"Tao Tao, ChengXiang Zhai","doi":"10.1145/1148170.1148201","DOIUrl":null,"url":null,"abstract":"Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.","PeriodicalId":433366,"journal":{"name":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","volume":"42 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-08-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"230","resultStr":"{\"title\":\"Regularized estimation of mixture models for robust pseudo-relevance feedback\",\"authors\":\"Tao Tao, ChengXiang Zhai\",\"doi\":\"10.1145/1148170.1148201\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.\",\"PeriodicalId\":433366,\"journal\":{\"name\":\"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval\",\"volume\":\"42 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2006-08-06\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"230\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1148170.1148201\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1148170.1148201","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 230

摘要

在各种检索模型中，伪相关反馈已被证明是提高检索精度的有效策略。然而，现有的伪反馈方法的性能往往受到一些参数的显著影响，如要使用的反馈文档的数量和原始查询词的相对权重;这些参数通常必须在没有任何指导的情况下通过反复试验来设置。本文提出了一种基于统计语言模型的伪反馈鲁棒性更好的方法。我们的主要思想是将原始查询和反馈文档集成在一个单一的概率混合模型中，并对模型中语言模型参数的估计进行正则化，从而将反馈文档中的信息逐渐添加到原始查询中。与大多数现有的反馈方法不同，我们的新方法没有参数可调。在两个代表性数据集上的实验结果表明，新方法比最先进的基线语言建模方法具有更强的鲁棒性，并且具有相当或更好的检索精度。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Regularized estimation of mixture models for robust pseudo-relevance feedback

Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 29th annual international ACM SIGIR conference on Research and development in information retrieval

自引率

0.00%

发文量