Dynamic Test Collections for Retrieval Evaluation

Ben Carterette, Ashraf Bah Rabiou, M. Zengin
Proceedings of the 2015 International Conference on The Theory of Information Retrieval, published 2015-09-27
DOI: 10.1145/2808194.2809470
Citations: 33

Abstract

Batch evaluation with test collections of documents, search topics, and relevance judgments has been the bedrock of IR evaluation since Salton adopted it for his experiments on vector space systems. Such test collections have limitations: they contain no user interaction data; there is typically only one query per topic; and their size is limited by the cost of constructing them. Over the last 15-20 years, it has become evident that a log of user interactions and a large space of queries are invaluable for building effective retrieval systems, but such data is generally available only to search engine companies. Thus there is a gap between what academics can study using static test collections and what industrial researchers can study using dynamic user data. In this work we propose dynamic test collections to help bridge this gap. Like a traditional test collection, a dynamic test collection consists of a set of topics and relevance judgments. But instead of static one-time queries, a dynamic test collection generates queries in response to the system's output. It can also generate other actions, such as clicks and time spent reading documents. As with static test collections, there is no human in the loop, but because the queries are dynamic, a dynamic test collection can generate much more evaluation data than a static one. And because it can simulate user interactions across a session, it can be used to evaluate retrieval systems that use session history or other user information to try to improve results.
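The abstract does not specify how queries, clicks, or reading times are generated. As a rough illustration only, the Python sketch below assumes a simulated user who draws queries from a fixed per-topic pool, clicks relevant results with probability `click_prob` (and non-relevant ones with `noise_prob`), reads longer when a document is more relevant, and ends the session once it has clicked `target_found` distinct relevant documents. `simulate_session`, `session_gain`, every parameter, and the toy data are hypothetical; this is not the authors' simulator.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Interaction:
    """One query in a simulated session and the user's reactions to it."""
    query: str
    ranking: List[str]
    clicks: List[str] = field(default_factory=list)
    dwell: Dict[str, float] = field(default_factory=dict)  # doc id -> seconds


# Hypothetical system interface: (query, session history so far) -> ranked doc ids.
System = Callable[[str, List[Interaction]], List[str]]


def simulate_session(system: System, qrels: Dict[str, int], query_pool: List[str],
                     depth: int = 5, click_prob: float = 0.8, noise_prob: float = 0.1,
                     target_found: int = 2, seed: int = 0) -> List[Interaction]:
    """Simulated user: issue queries from a per-topic pool, scan the top `depth`
    results, click (mostly) relevant ones, and stop once `target_found` distinct
    relevant documents have been clicked. All parameters are assumptions."""
    rng = random.Random(seed)
    history: List[Interaction] = []
    found = set()
    for query in query_pool:
        # The system sees the session so far, so it may adapt its ranking.
        step = Interaction(query, system(query, history))
        for doc in step.ranking[:depth]:
            grade = qrels.get(doc, 0)
            p = click_prob if grade > 0 else noise_prob
            if rng.random() < p:
                step.clicks.append(doc)
                # Assumed dwell model: longer reading time for higher grades.
                step.dwell[doc] = 10.0 + 30.0 * grade + rng.uniform(0.0, 5.0)
                if grade > 0:
                    found.add(doc)
        history.append(step)
        if len(found) >= target_found:
            break  # simulated user is satisfied and ends the session
    return history


def session_gain(history: List[Interaction], qrels: Dict[str, int]) -> int:
    """Sum of relevance grades of distinct relevant documents clicked in the session."""
    seen, gain = set(), 0
    for step in history:
        for doc in step.clicks:
            if doc not in seen and qrels.get(doc, 0) > 0:
                seen.add(doc)
                gain += qrels[doc]
    return gain


# Toy topic: graded judgments and a tiny query -> ranking index (made-up data).
QRELS = {"d2": 1, "d5": 2, "d6": 1}
INDEX = {"q1": ["d2", "d1", "d5"], "q2": ["d2", "d3", "d5"], "q3": ["d2", "d5", "d6"]}


def static_system(query: str, history: List[Interaction]) -> List[str]:
    return INDEX[query]  # ignores the session entirely


def session_aware_system(query: str, history: List[Interaction]) -> List[str]:
    clicked = {d for step in history for d in step.clicks}
    return [d for d in INDEX[query] if d not in clicked]  # drop already-read docs


# Deterministic settings: always click relevant docs, never click non-relevant ones.
params = dict(depth=2, click_prob=1.0, noise_prob=0.0, target_found=2)
static_log = simulate_session(static_system, QRELS, ["q1", "q2", "q3"], **params)
aware_log = simulate_session(session_aware_system, QRELS, ["q1", "q2", "q3"], **params)
print(len(static_log), session_gain(static_log, QRELS))  # → 3 3
print(len(aware_log), session_gain(aware_log, QRELS))    # → 2 3
```

Because the simulated user reacts to each ranking, the number of queries in a session depends on the system: in the toy data, the system that drops already-clicked documents reaches the stopping point after two queries instead of three, with the same gain. A static collection with one fixed query per topic cannot register that difference.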