{"title":"用于检索评估的动态测试集合","authors":"Ben Carterette, Ashraf Bah Rabiou, M. Zengin","doi":"10.1145/2808194.2809470","DOIUrl":null,"url":null,"abstract":"Batch evaluation with test collections of documents, search topics, and relevance judgments has been the bedrock of IR evaluation since its adoption by Salton for his experiments on vector space systems. Such test collections have limitations: they contain no user interaction data; there is typically only one query per topic; they have limited size due to the cost of constructing them. In the last 15-20 years, it has become evident that having a log of user interactions and a large space of queries is invaluable for building effective retrieval systems, but such data is generally only available to search engine companies. Thus there is a gap between what academics can study using static test collections and what industrial researchers can study using dynamic user data. In this work we propose dynamic test collections to help bridge this gap. Like traditional test collections, a dynamic test collection consists of a set of topics and relevance judgments. But instead of static one-time queries, dynamic test collections generate queries in response to the system. They can generate other actions such as clicks and time spent reading documents. Like static test collections, there is no human in the loop, but since the queries are dynamic they can generate much more data for evaluation than static test collections can. And since they can simulate user interactions across a session, they can be used for evaluating retrieval systems that make use of session history or other user information to try to improve results.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"33","resultStr":"{\"title\":\"Dynamic Test Collections for Retrieval Evaluation\",\"authors\":\"Ben Carterette, Ashraf Bah Rabiou, M. Zengin\",\"doi\":\"10.1145/2808194.2809470\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Batch evaluation with test collections of documents, search topics, and relevance judgments has been the bedrock of IR evaluation since its adoption by Salton for his experiments on vector space systems. Such test collections have limitations: they contain no user interaction data; there is typically only one query per topic; they have limited size due to the cost of constructing them. In the last 15-20 years, it has become evident that having a log of user interactions and a large space of queries is invaluable for building effective retrieval systems, but such data is generally only available to search engine companies. Thus there is a gap between what academics can study using static test collections and what industrial researchers can study using dynamic user data. In this work we propose dynamic test collections to help bridge this gap. Like traditional test collections, a dynamic test collection consists of a set of topics and relevance judgments. But instead of static one-time queries, dynamic test collections generate queries in response to the system. They can generate other actions such as clicks and time spent reading documents. 
Like static test collections, there is no human in the loop, but since the queries are dynamic they can generate much more data for evaluation than static test collections can. And since they can simulate user interactions across a session, they can be used for evaluating retrieval systems that make use of session history or other user information to try to improve results.\",\"PeriodicalId\":440325,\"journal\":{\"name\":\"Proceedings of the 2015 International Conference on The Theory of Information Retrieval\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"33\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2015 International Conference on The Theory of Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2808194.2809470\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2808194.2809470","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Batch evaluation with test collections of documents, search topics, and relevance judgments has been the bedrock of IR evaluation since Salton adopted it for his experiments on vector space systems. Such test collections have limitations: they contain no user interaction data; there is typically only one query per topic; and their size is limited by the cost of constructing them. Over the last 15-20 years it has become evident that having both a log of user interactions and a large space of queries is invaluable for building effective retrieval systems, but such data is generally available only to search engine companies. There is thus a gap between what academics can study using static test collections and what industrial researchers can study using dynamic user data. In this work we propose dynamic test collections to help bridge this gap. Like a traditional test collection, a dynamic test collection consists of a set of topics and relevance judgments. But instead of supplying static, one-time queries, a dynamic test collection generates queries in response to the system's results, and it can generate other interactions as well, such as clicks and time spent reading documents. As with static test collections, there is no human in the loop; but because the queries are dynamic, a dynamic test collection can produce far more data for evaluation than a static one can. And because it can simulate user interactions across a full session, it can be used to evaluate retrieval systems that use session history or other user information to try to improve results.
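To make the idea concrete, below is a minimal sketch of what such a simulated user might look like. It assumes a simple vocabulary-sampling query generator and a rule-based click-and-dwell model; the class names (`Topic`, `SimulatedUser`, `run_session`), the reformulation rule, and the dwell-time constants are all illustrative assumptions, not the authors' actual implementation.

```python
# A minimal sketch, not the paper's method: a simulated user draws
# queries from a topic vocabulary, examines the system's ranking,
# "clicks" judged-relevant documents, accumulates dwell time, and
# reformulates until the session ends.
import random
from dataclasses import dataclass


@dataclass
class Topic:
    topic_id: str
    terms: list[str]       # vocabulary the simulated user draws queries from
    qrels: dict[str, int]  # doc_id -> graded relevance judgment


@dataclass
class Interaction:
    query: list[str]
    clicked: list[str]
    read_time: float       # simulated seconds spent reading clicked docs


class SimulatedUser:
    """Generates session interactions in response to system output."""

    def __init__(self, topic: Topic, patience: int = 5, seed: int = 0):
        self.topic = topic
        self.patience = patience  # maximum number of queries per session
        self.rng = random.Random(seed)
        self.seen: set[str] = set()

    def next_query(self, history: list[Interaction]) -> list[str] | None:
        """Return the next query, or None to end the session."""
        if len(history) >= self.patience:
            return None
        # One plausible (assumed) reformulation rule: issue progressively
        # longer queries sampled from the topic vocabulary.
        k = min(2 + len(history), len(self.topic.terms))
        return self.rng.sample(self.topic.terms, k)

    def examine(self, ranking: list[str]) -> tuple[list[str], float]:
        """Click unseen judged-relevant docs in the top 10 and dwell
        longer on more relevant ones (a deliberately crude model)."""
        clicked, read_time = [], 0.0
        for doc_id in ranking[:10]:
            rel = self.topic.qrels.get(doc_id, 0)
            if rel > 0 and doc_id not in self.seen:
                clicked.append(doc_id)
                read_time += 30.0 * rel
            self.seen.add(doc_id)
        return clicked, read_time


def run_session(user: SimulatedUser, retrieve) -> list[Interaction]:
    """Drive one session. `retrieve` maps (query terms, interaction
    history) to a ranked list of doc ids, so session-aware systems can
    condition on everything the simulated user has done so far."""
    history: list[Interaction] = []
    while (query := user.next_query(history)) is not None:
        ranking = retrieve(query, history)
        clicked, read_time = user.examine(ranking)
        history.append(Interaction(query, clicked, read_time))
    return history


if __name__ == "__main__":
    topic = Topic("T1", ["dynamic", "test", "collections", "retrieval"],
                  {"d3": 2, "d7": 1})
    # Stand-in "retrieval system" that ignores the query and history.
    static_ranking = lambda query, history: [f"d{i}" for i in range(10)]
    for turn in run_session(SimulatedUser(topic), static_ranking):
        print(turn)
```

The key design point the sketch tries to capture is that `retrieve` receives the interaction history along with each new query: a session-aware system can exploit it while a static system ignores it, and the resulting interaction log can then be scored with session-level metrics. How queries and clicks should actually be generated is exactly what the paper studies; the rules above are placeholders.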