Xinsheng Li, Si Li, Weiran Xu, Guang Chen, Jun Guo
{"title":"Weakly supervised relevance feedback based on an improved language model","authors":"Xinsheng Li, Si Li, Weiran Xu, Guang Chen, Jun Guo","doi":"10.1109/NLPKE.2010.5587859","DOIUrl":null,"url":null,"abstract":"Relevance feedback, which traditionally uses the terms in the relevant documents to enrich the user's initial query, is an effective method for improving retrieval performance. This approach has another problem is that Relevance feedback assumes that most frequent terms in the feedback documents are useful for the retrieval. In fact, the reports of some experiments show that it does not hold in reality many expansion terms identified in traditional approaches are indeed unrelated to the query and harmful to the retrieval. In this paper, we propose to select better and more relevant documents with a clustering algorithm. And then we present an improved Language Model to help us identify the good terms from those relevant documents. Ours experiments on the 2008 TREC collection show that retrieval effectiveness can be much improved when the improved Language Model is used.","PeriodicalId":259975,"journal":{"name":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","volume":"73 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-09-30","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 6th International Conference on Natural Language Processing and Knowledge Engineering(NLPKE-2010)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/NLPKE.2010.5587859","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Relevance feedback, which traditionally uses the terms in the relevant documents to enrich the user's initial query, is an effective method for improving retrieval performance. This approach has another problem is that Relevance feedback assumes that most frequent terms in the feedback documents are useful for the retrieval. In fact, the reports of some experiments show that it does not hold in reality many expansion terms identified in traditional approaches are indeed unrelated to the query and harmful to the retrieval. In this paper, we propose to select better and more relevant documents with a clustering algorithm. And then we present an improved Language Model to help us identify the good terms from those relevant documents. Ours experiments on the 2008 TREC collection show that retrieval effectiveness can be much improved when the improved Language Model is used.