{"title":"Exploiting Stopping Time to Evaluate Accumulated Relevance","authors":"M. Ferrante, N. Ferro","doi":"10.1145/3409256.3409832","DOIUrl":null,"url":null,"abstract":"Evaluation measures are more or less explicitly based on user models which abstract how users interact with a ranked result list and how they accumulate utility from it. However, traditional measures typically come with a hard-coded user model which can be, at best, parametrized. Moreover, they take a deterministic approach which leads to assign a precise score to a system run. In this paper, we take a different angle and, by relying on Markov chains and random walks, we propose a new family of evaluation measures which are able to accommodate for different and flexible user models, allow for simulating the interaction of different users, and turn the score into a random variable which more richly describes the performance of a system. We also show how the proposed framework allows for instantiating and better explaining some state-of-the-art measures, like AP, RBP, DCG, and ERR.","PeriodicalId":430907,"journal":{"name":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","volume":"55 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 ACM SIGIR on International Conference on Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3409256.3409832","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3
Abstract
Evaluation measures are more or less explicitly based on user models, which abstract how users interact with a ranked result list and how they accumulate utility from it. However, traditional measures typically come with a hard-coded user model that can be, at best, parametrized. Moreover, they take a deterministic approach, assigning a single precise score to a system run. In this paper, we take a different angle: relying on Markov chains and random walks, we propose a new family of evaluation measures able to accommodate different and flexible user models, to simulate the interaction of different users, and to turn the score into a random variable that more richly describes the performance of a system. We also show how the proposed framework allows for instantiating and better explaining some state-of-the-art measures, such as AP, RBP, DCG, and ERR.
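To make the core idea concrete, here is a minimal sketch of how a stochastic stopping time turns an evaluation score into a random variable. It is not the authors' actual framework: the function names are hypothetical, and it uses the simplest possible user model, a forward-only walk down the ranking with a geometric stopping time (an RBP-style persistence parameter), whereas the paper's Markov-chain formulation admits far more general movement patterns. The score distribution is estimated by Monte Carlo simulation over many synthetic users.

```python
import random

def simulate_session(rels, p_continue=0.8, rng=random):
    """One simulated user session: walk down the ranked list,
    accumulating relevance, and stop with probability 1 - p_continue
    after each examined document (a geometric stopping time)."""
    utility = 0.0
    for rel in rels:
        utility += rel
        if rng.random() > p_continue:  # user abandons the list here
            break
    return utility

def score_distribution(rels, p_continue=0.8, n_users=10_000):
    """Monte Carlo estimate of the score as a random variable:
    simulate many users and return their accumulated utilities."""
    return [simulate_session(rels, p_continue) for _ in range(n_users)]

# Example: binary relevance judgments for a 10-document run.
rels = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
samples = score_distribution(rels)
mean = sum(samples) / len(samples)
print(f"expected accumulated relevance ~ {mean:.3f}")
```

Under this toy model, the expectation of the sample recovers a deterministic measure (here, one closely related to unnormalized RBP), while the full sample also exposes the variability across users, which is the distributional information the paper argues a single hard-coded score discards.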