Online Evaluation for Information Retrieval

IF 12.9 2区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Foundations and Trends in Information Retrieval Pub Date : 2016-06-07 DOI:10.1561/1500000051

Katja Hofmann, Lihong Li, Filip Radlinski

{"title":"Online Evaluation for Information Retrieval","authors":"Katja Hofmann, Lihong Li, Filip Radlinski","doi":"10.1561/1500000051","DOIUrl":null,"url":null,"abstract":"Online evaluation is one of the most common approaches to measure the effectiveness of an information retrieval system. It involves fielding the information retrieval system to real users, and observing these users' interactions in-situ while they engage with the system. This allows actual users with real world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements the common alternative offline evaluation approaches which may provide more easily interpretable outcomes, yet are often less realistic when measuring of quality and actual user experience.In this survey, we provide an overview of online evaluation techniques for information retrieval. We show how online evaluation is used for controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation based on different sized experimental units commonly of interest: documents, lists and sessions. Additionally, we include an extensive discussion of recent work on data re-use, and experiment estimation based on historical data.A substantial part of this work focuses on practical issues: How to run evaluations in practice, how to select experimental parameters, how to take into account ethical considerations inherent in online evaluations, and limitations that experimenters should be aware of. While most published work on online experimentation today is at large scale in systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we emphasize recent work that makes it easier to use at smaller scales and encourage studying real-world information seeking in a wide range of scenarios. Finally, we present a summary of the most recent work in the area, and describe open problems, as well as postulating future directions.","PeriodicalId":48829,"journal":{"name":"Foundations and Trends in Information Retrieval","volume":"58 1","pages":"1-117"},"PeriodicalIF":12.9000,"publicationDate":"2016-06-07","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"97","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Foundations and Trends in Information Retrieval","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1561/1500000051","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 97

Abstract

Online evaluation is one of the most common approaches to measure the effectiveness of an information retrieval system. It involves fielding the information retrieval system to real users, and observing these users' interactions in-situ while they engage with the system. This allows actual users with real world information needs to play an important part in assessing retrieval quality. As such, online evaluation complements the common alternative offline evaluation approaches which may provide more easily interpretable outcomes, yet are often less realistic when measuring of quality and actual user experience.In this survey, we provide an overview of online evaluation techniques for information retrieval. We show how online evaluation is used for controlled experiments, segmenting them into experiment designs that allow absolute or relative quality assessments. Our presentation of different metrics further partitions online evaluation based on different sized experimental units commonly of interest: documents, lists and sessions. Additionally, we include an extensive discussion of recent work on data re-use, and experiment estimation based on historical data.A substantial part of this work focuses on practical issues: How to run evaluations in practice, how to select experimental parameters, how to take into account ethical considerations inherent in online evaluations, and limitations that experimenters should be aware of. While most published work on online experimentation today is at large scale in systems with millions of users, we also emphasize that the same techniques can be applied at small scale. To this end, we emphasize recent work that makes it easier to use at smaller scales and encourage studying real-world information seeking in a wide range of scenarios. Finally, we present a summary of the most recent work in the area, and describe open problems, as well as postulating future directions.

查看原文本刊更多论文

信息检索在线评价

在线评估是衡量信息检索系统有效性的最常用方法之一。它涉及到将信息检索系统部署到真实的用户，并在这些用户与系统交互时现场观察他们的交互。这允许具有真实世界信息需求的实际用户在评估检索质量方面发挥重要作用。因此，在线评估补充了常见的替代离线评估方法，后者可能提供更容易解释的结果，但在衡量质量和实际用户体验时往往不太现实。在这项调查中，我们提供了一个概述在线评估技术的信息检索。我们展示了在线评估如何用于控制实验，将它们划分为实验设计，允许绝对或相对质量评估。我们对不同指标的介绍进一步划分了基于不同大小的实验单元(通常是文档、列表和会话)的在线评估。此外，我们还包括对数据重用的最新工作的广泛讨论，以及基于历史数据的实验估计。这项工作的很大一部分集中在实际问题上:如何在实践中进行评估，如何选择实验参数，如何考虑在线评估中固有的伦理考虑，以及实验者应该意识到的局限性。虽然今天发表的大多数关于在线实验的工作都是在拥有数百万用户的系统中大规模进行的，但我们也强调同样的技术可以在小规模中应用。为此，我们强调最近的工作，使其更容易在更小的范围内使用，并鼓励在广泛的场景中研究现实世界的信息搜索。最后，我们对该领域的最新工作进行了总结，并描述了尚未解决的问题，以及对未来方向的假设。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Foundations and Trends in Information Retrieval COMPUTER SCIENCE, INFORMATION SYSTEMS-

CiteScore

39.10

自引率

0.00%

发文量

期刊介绍： The surge in research across all domains in the past decade has resulted in a plethora of new publications, causing an exponential growth in published research. Navigating through this extensive literature and staying current has become a time-consuming challenge. While electronic publishing provides instant access to more articles than ever, discerning the essential ones for a comprehensive understanding of any topic remains an issue. To tackle this, Foundations and Trends® in Information Retrieval - FnTIR - addresses the problem by publishing high-quality survey and tutorial monographs in the field. Each issue of Foundations and Trends® in Information Retrieval - FnT IR features a 50-100 page monograph authored by research leaders, covering tutorial subjects, research retrospectives, and survey papers that provide state-of-the-art reviews within the scope of the journal.