{"title":"Exploiting Stopping Time to Evaluate Accumulated Relevance","authors":"M. Ferrante, N. Ferro","doi":"10.1145/3409256.3409832","DOIUrl":null,"url":null,"abstract":"Evaluation measures are more or less explicitly based on user models which abstract how users interact with a ranked result list and how they accumulate utility from it. However, traditional measures typically come with a hard-coded user model which can be, at best, parametrized. Moreover, they take a deterministic approach which leads to assign a precise score to a system run. In this paper, we take a different angle and, by relying on Markov chains and random walks, we propose a new family of evaluation measures which are able to accommodate for different and flexible user models, allow for simulating the interaction of different users, and turn the score into a random variable which more richly describes the performance of a system. We also show how the proposed framework allows for instantiating and better explaining some state-of-the-art measures, like AP, RBP, DCG, and ERR.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3409256.3409832","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Evaluation measures are more or less explicitly based on user models, which abstract how users interact with a ranked result list and how they accumulate utility from it. However, traditional measures typically come with a hard-coded user model that can be, at best, parametrized. Moreover, they take a deterministic approach, assigning a single precise score to a system run. In this paper, we take a different angle: relying on Markov chains and random walks, we propose a new family of evaluation measures able to accommodate different and flexible user models, to simulate the interaction of different users, and to turn the score into a random variable that more richly describes the performance of a system. We also show how the proposed framework allows for instantiating and better explaining some state-of-the-art measures, such as AP, RBP, DCG, and ERR.
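To make the core idea concrete, here is a minimal sketch of how a stochastic stopping time turns an evaluation score into a random variable. It is not the authors' actual framework: the function names are hypothetical, and it uses the simplest possible user model, a forward-only walk down the ranking with a geometric stopping time (an RBP-style persistence parameter), whereas the paper's Markov-chain formulation admits far more general movement patterns. The score distribution is estimated by Monte Carlo simulation over many synthetic users.

```python
import random

def simulate_session(rels, p_continue=0.8, rng=random):
    """One simulated user session: walk down the ranked list,
    accumulating relevance, and stop with probability 1 - p_continue
    after each examined document (a geometric stopping time)."""
    utility = 0.0
    for rel in rels:
        utility += rel
        if rng.random() > p_continue:  # user abandons the list here
            break
    return utility

def score_distribution(rels, p_continue=0.8, n_users=10_000):
    """Monte Carlo estimate of the score as a random variable:
    simulate many users and return their accumulated utilities."""
    return [simulate_session(rels, p_continue) for _ in range(n_users)]

# Example: binary relevance judgments for a 10-document run.
rels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
samples = score_distribution(rels)
mean = sum(samples) / len(samples)
print(f"expected accumulated relevance ~ {mean:.3f}")
```

Under this toy model, the expectation of the sample recovers a deterministic measure (here, one closely related to unnormalized RBP), while the full sample also exposes the variability across users, which is the distributional information the paper argues a single hard-coded score discards.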