Learning to rank relevant and novel documents through user feedback

Proceedings of the 19th ACM International Conference on Information and Knowledge Management (CIKM 2010). Published: 2010-10-26. DOI: 10.1145/1871437.1871499

A. Lad, Yiming Yang


Citations: 3

Abstract

We consider the problem of learning to rank relevant and novel documents so as to directly maximize a performance metric called Expected Global Utility (EGU), which has several desirable properties. It (i) measures retrieval performance in terms of both relevant and novel information; (ii) gives more importance to top ranks to reflect the common browsing behavior of users, unlike existing objective functions based on set coverage; (iii) accommodates different levels of tolerance towards redundancy, which existing evaluation measures do not take into account; and (iv) extends naturally to the evaluation of session-based retrieval comprising multiple ranked lists. Our ground truth is defined in terms of "information nuggets", which are not known to the retrieval system when it processes a new user query. Therefore, our approach uses observable query and document features (words and named entities) as surrogates for nuggets, whose weights are learned from user feedback in an iterative search session. The ranked list is produced to maximize the weighted coverage of these surrogate nuggets. Since optimizing such coverage-based metrics is known to be NP-hard, we use a greedy algorithm and show that it guarantees good performance due to the submodularity of the objective function. Our experiments on Topic Detection and Tracking (TDT) data show that the proposed approach is an efficient and effective retrieval strategy for maximizing EGU, compared with a purely relevance-based ranking approach using Indri and an MMR-based approach for non-redundant ranking.
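The abstract does not give the EGU formula, so the following Python is only a minimal sketch of the general idea under explicit assumptions, not the authors' method. It assumes: each document is represented by its set of surrogate nuggets (words or named entities) with nonnegative weights; a nugget's value is discounted by gamma raised to the number of earlier documents that already covered it (one possible model of redundancy tolerance); a user reaches rank i (0-based) with probability p^i (an assumed geometric browsing model for the top-heavy weighting); and weights change through a simple additive feedback rule. All function names (marginal_gain, greedy_rank, discounted_utility, update_weights) and parameters are hypothetical. After c occurrences a nugget contributes w(1 + gamma + ... + gamma^(c-1)), a nondecreasing concave function of c, so the total coverage is monotone submodular for 0 <= gamma <= 1. That is what makes greedy selection attractive: for such functions under a cardinality constraint, greedy reaches at least a (1 - 1/e) fraction of the optimal unranked coverage of the top k. This is the textbook bound, not necessarily the one proved in the paper.

```python
from typing import Dict, List, Set

# Hypothetical representation: a document is the set of surrogate nuggets
# (e.g. words and named entities) it contains; weights map nugget -> importance.
Doc = Set[str]


def marginal_gain(doc: Doc, weights: Dict[str, float],
                  seen: Dict[str, int], gamma: float) -> float:
    """Value of adding `doc` given how often each nugget was already covered.

    Each nugget's weight is discounted by gamma ** (prior occurrences):
    gamma = 0 rewards only first coverage, gamma = 1 ignores redundancy.
    """
    return sum(weights.get(n, 0.0) * gamma ** seen.get(n, 0) for n in doc)


def _cover(doc: Doc, seen: Dict[str, int]) -> None:
    """Record that every nugget in `doc` has been shown once more."""
    for n in doc:
        seen[n] = seen.get(n, 0) + 1


def greedy_rank(docs: Dict[str, Doc], weights: Dict[str, float],
                k: int, gamma: float = 0.5) -> List[str]:
    """Fill ranks 1..k, each time taking the document with the largest
    marginal gain (ties go to the earliest document in `docs`)."""
    seen: Dict[str, int] = {}
    remaining = dict(docs)
    ranking: List[str] = []
    while remaining and len(ranking) < k:
        best = max(remaining,
                   key=lambda d: marginal_gain(remaining[d], weights, seen, gamma))
        ranking.append(best)
        _cover(remaining.pop(best), seen)
    return ranking


def discounted_utility(ranking: List[str], docs: Dict[str, Doc],
                       weights: Dict[str, float], gamma: float = 0.5,
                       p: float = 0.8) -> float:
    """Assumed EGU-like score: the user reaches rank i (0-based) with
    probability p ** i and collects the redundancy-discounted gain there."""
    seen: Dict[str, int] = {}
    total = 0.0
    for i, d in enumerate(ranking):
        total += p ** i * marginal_gain(docs[d], weights, seen, gamma)
        _cover(docs[d], seen)
    return total


def update_weights(weights: Dict[str, float], docs: Dict[str, Doc],
                   feedback: Dict[str, int], lr: float = 0.5) -> Dict[str, float]:
    """Hypothetical additive feedback rule: +1 labels raise the weights of a
    document's nuggets, -1 labels lower them; weights are clipped at zero so
    the coverage objective stays monotone. Returns a new dict."""
    new = dict(weights)
    for d, label in feedback.items():
        for n in docs[d]:
            new[n] = max(0.0, new.get(n, 0.0) + lr * label)
    return new
```

As a toy usage example, take d1 = {quake, chile, tsunami}, d2 = {quake, chile}, d3 = {aid, relief} with weights quake = chile = 1.0, tsunami = 0.5, aid = relief = 0.8. With gamma = 0, greedy_rank returns d1, d3, d2, because once d1 is shown d2 adds nothing new; with gamma = 1 (no redundancy penalty) it falls back to the pure relevance order d1, d2, d3. Under p = 0.8 and gamma = 0, discounted_utility scores the diversified list at about 3.78 versus 3.524 for the relevance-only list.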

