Search Result Diversification with Guarantee of Topic Proportionality

Sheikh Muhammad Sarwar, Raghavendra Addanki, Ali Montazeralghaem, S. Pal, J. Allan
{"title":"Search Result Diversification with Guarantee of Topic Proportionality","authors":"Sheikh Muhammad Sarwar, Raghavendra Addanki, Ali Montazeralghaem, S. Pal, J. Allan","doi":"10.1145/3409256.3409839","DOIUrl":null,"url":null,"abstract":"Search result diversification based on topic proportionality considers a document as a bag of weighted topics and aims to reorder or down-sample a ranked list in a way that maintains topic proportionality. The goal is to show the topic distribution from an ambiguous query at all points in the revised list, hoping to satisfy all users in expectation. One effective approach, PM-2, greedily selects the best topic that maintains proportionality at each ranking position and then selects the document that best represents that topic. From a theoretical perspective, this approach does not provide any guarantee that topic proportionality holds in the small ranked list. Moreover, this approach does not take query-document relevance into account. We propose a Linear Programming (LP) formulation, LP-QL, that maintains topic proportionality and simultaneously maximizes relevance. We show that this approach satisfies topic proportionality constraints in expectation. Empirically, it achieves a 5.5% performance gain (significant) in terms of alpha-NDCG compared to PM-2 when we use LDA as the topic modelling approach. Furthermore, we propose LP-PM-2 that integrates the solution of LP-QL with PM-2. LP-PM-2 achieves 3.2% performance gain (significant) over PM-2 in terms of alpha-NDCG with term based topic modeling approach. All of our experiments are based on a popular web document collection, ClueWeb09 Category B, and the queries are taken from TREC Web Track's diversity task.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"142 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3409256.3409839","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

Abstract

Search result diversification based on topic proportionality considers a document as a bag of weighted topics and aims to reorder or down-sample a ranked list in a way that maintains topic proportionality. The goal is to reflect the topic distribution of an ambiguous query at every point in the revised list, hoping to satisfy all users in expectation. One effective approach, PM-2, greedily selects the topic that best maintains proportionality at each ranking position and then selects the document that best represents that topic. From a theoretical perspective, this approach provides no guarantee that topic proportionality holds in the resulting short ranked list. Moreover, it does not take query-document relevance into account. We propose a Linear Programming (LP) formulation, LP-QL, that maintains topic proportionality while simultaneously maximizing relevance. We show that this approach satisfies the topic proportionality constraints in expectation. Empirically, it achieves a statistically significant 5.5% gain in alpha-NDCG over PM-2 when LDA is used as the topic modeling approach. Furthermore, we propose LP-PM-2, which integrates the solution of LP-QL with PM-2; it achieves a statistically significant 3.2% gain in alpha-NDCG over PM-2 with a term-based topic modeling approach. All of our experiments use a popular web document collection, ClueWeb09 Category B, with queries taken from the TREC Web Track diversity task.
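To make the two ideas in the abstract concrete, the sketch below contrasts them. The `pm2` function follows the standard greedy PM-2 procedure (Sainte-Laguë quotients over topics, then picking the document that best covers the chosen topic), in which relevance plays no role. The `lp_diversify` function is only a hypothetical illustration of the kind of relaxed LP the abstract describes: maximize relevance subject to the selection's expected topic mass matching the query's topic proportions. The names (`doc_topic`, `relevance`, `topic_weights`, `slack`, `lam`) and the exact constraint form are assumptions for illustration, not the paper's actual LP-QL formulation.

```python
# Runnable sketch (Python + NumPy/SciPy). pm2() follows the published greedy
# PM-2 procedure; lp_diversify() is a HYPOTHETICAL relaxation in the spirit of
# LP-QL as described in the abstract, not the paper's exact formulation.
import numpy as np
from scipy.optimize import linprog


def pm2(doc_topic, topic_weights, k, lam=0.5):
    """Greedy PM-2 re-ranking: at each rank, award the next 'seat' to the topic
    with the largest Sainte-Lague quotient, then pick the document that best
    covers that topic (and, with weight 1-lam, the remaining topics).
    doc_topic[d, t] = P(topic t | document d); topic_weights[t] = query-level
    weight of topic t. Note that query-document relevance plays no role here."""
    n_docs, n_topics = doc_topic.shape
    seats = np.zeros(n_topics)                  # seats already awarded per topic
    selected, remaining = [], set(range(n_docs))
    for _ in range(min(k, n_docs)):
        quotient = topic_weights / (2.0 * seats + 1.0)
        t_star = int(np.argmax(quotient))
        best_doc, best_score = None, -np.inf
        for d in remaining:
            cover = doc_topic[d]
            score = (lam * quotient[t_star] * cover[t_star]
                     + (1.0 - lam) * np.dot(np.delete(quotient, t_star),
                                            np.delete(cover, t_star)))
            if score > best_score:
                best_doc, best_score = d, score
        selected.append(best_doc)
        remaining.remove(best_doc)
        mass = doc_topic[best_doc]
        seats += mass / (mass.sum() + 1e-12)    # fractional seat update
    return selected


def lp_diversify(doc_topic, relevance, topic_weights, k, slack=0.1):
    """Hypothetical LP sketch: maximize total relevance of a (fractional)
    size-k selection while keeping its expected topic mass within a slack
    band of the query's topic proportions -- proportionality in expectation."""
    n_docs = doc_topic.shape[0]
    c = -relevance                               # linprog minimizes, so negate
    A_eq = np.ones((1, n_docs))                  # expected selection size = k
    b_eq = np.array([float(k)])
    # k*(p_t - slack) <= sum_d P(t|d) * x_d <= k*(p_t + slack) for every topic t
    A_ub = np.vstack([doc_topic.T, -doc_topic.T])
    b_ub = np.concatenate([k * (topic_weights + slack),
                           -k * (topic_weights - slack)])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0.0, 1.0)] * n_docs, method="highs")
    return res.x if res.success else None        # round/sample x to get a ranking


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    D = rng.dirichlet(np.ones(3), size=8)        # 8 toy docs over 3 topics
    rel = rng.random(8)                          # toy relevance scores
    w = np.array([0.5, 0.3, 0.2])                # query topic proportions
    print("PM-2 order:", pm2(D, w, k=5))
    x = lp_diversify(D, rel, w, k=5, slack=0.2)
    print("LP fractional selection:", None if x is None else np.round(x, 2))
```

In this reading, the LP's variables are fractional selection probabilities, which is why proportionality is only guaranteed in expectation; the abstract's LP-PM-2 variant then feeds such an LP solution back into the greedy PM-2 re-ranker.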