Summaries, ranked retrieval and sessions: a unified framework for information access evaluation

Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. Pub Date: 2013-07-28. DOI: 10.1145/2484028.2484031

T. Sakai, Zhicheng Dou


Citations: 90

Abstract

We introduce a general information access evaluation framework that can potentially handle summaries, ranked document lists and even multi-query sessions seamlessly. Our framework first builds a trailtext which represents a concatenation of all the texts read by the user during a search session, and then computes an evaluation metric called U-measure over the trailtext. Instead of discounting the value of a retrieved piece of information based on ranks, U-measure discounts it based on its position within the trailtext. U-measure takes the document length into account just like Time-Biased Gain (TBG), and has the diminishing return property. It is therefore more realistic than rank-based metrics. Furthermore, it is arguably more flexible than TBG, as it is free from the linear traversal assumption (i.e., that the user scans the ranked list from top to bottom), and can handle information access tasks other than ad hoc retrieval. This paper demonstrates the validity and versatility of the U-measure framework. Our main conclusions are: (a) For ad hoc retrieval, U-measure is at least as reliable as TBG in terms of rank correlations with traditional metrics and discriminative power; (b) For diversified search, our diversity versions of U-measure are highly correlated with state-of-the-art diversity metrics; (c) For multi-query sessions, U-measure is highly correlated with Session nDCG; and (d) Unlike rank-based metrics such as DCG, U-measure can quantify the differences between linear and nonlinear traversals in sessions. We argue that our new framework is useful for understanding the user's search behaviour and for comparison across different information access styles (e.g. examining a direct answer vs. examining a ranked list of web pages).
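The position-based discounting described in the abstract can be illustrated with a minimal sketch. The abstract does not state the formula, so everything below is an assumption made for illustration: the `Segment` type, the linear decay function, the `patience` cutoff of 1000 characters, and the normalisation constant are hypothetical choices, not the authors' definitions. The sketch only shows the general idea that gain is discounted by character offset within the trailtext rather than by rank.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Segment:
    """One piece of text the user is assumed to have read (hypothetical type)."""
    length: int   # length in characters
    gain: float   # gain credited once the segment is read (0.0 if non-relevant)


def linear_decay(pos: int, patience: int) -> float:
    """Assumed discount: falls linearly from 1 at offset 0 to 0 at `patience`."""
    return max(0.0, 1.0 - pos / patience)


def u_measure(trailtext: List[Segment], patience: int = 1000,
              norm: float = 1.0) -> float:
    """Sum each segment's gain, discounted at the segment's end offset."""
    pos = 0       # running character offset within the trailtext
    total = 0.0
    for seg in trailtext:
        pos += seg.length
        total += seg.gain * linear_decay(pos, patience)
    return total / norm


# The same relevant text, read before or after 600 characters of non-relevant text.
early = [Segment(length=200, gain=1.0), Segment(length=600, gain=0.0)]
late = [Segment(length=600, gain=0.0), Segment(length=200, gain=1.0)]
print(u_measure(early), u_measure(late))
```

Under these assumptions, the relevant segment earns more when it appears earlier in the trailtext, and a long non-relevant segment ahead of it lowers the score even though only one item precedes it in rank terms. Because the trailtext is just the sequence of texts in the order read, the same function applies to a single summary, a ranked list, or a multi-query session, including nonlinear reading orders.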
