Dynamic Cutoff Prediction in Multi-Stage Retrieval Systems

Proceedings of the 21st Australasian Document Computing Symposium. Pub Date: 2016-12-05. DOI: 10.1145/3015022.3015026

J. Culpepper, C. Clarke, Jimmy J. Lin


Citations: 46

Abstract



Modern multi-stage retrieval systems consist of a candidate generation stage followed by one or more reranking stages. In such an architecture, the quality of the final ranked list may not be sensitive to the quality of the initial candidate pool, especially in terms of early precision. This provides several opportunities to increase retrieval efficiency without significantly sacrificing effectiveness. In this paper, we explore a new approach to dynamically predicting the size of the initial result set in the candidate generation stage, which can directly affect the overall efficiency and effectiveness of the entire system. Previous work exploring this tradeoff has focused on global parameter settings that apply to all queries, even though optimal settings vary across queries. In contrast, we propose a technique that makes a parameter prediction to maximize efficiency within an effectiveness envelope on a per-query basis, using only static pre-retrieval features. Experimental results show that substantial efficiency gains are achievable. In addition, our framework provides a versatile tool that can be used to estimate the effectiveness-efficiency tradeoffs that are possible before selecting and tuning algorithms to make machine-learned predictions.
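The idea of choosing a per-query cutoff "within an effectiveness envelope" can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `smallest_cutoff_within_envelope`, the relative `tolerance` parameter, and the effectiveness scores below are all hypothetical, and the abstract does not specify how the envelope is defined. The sketch assumes that, for each query, the final effectiveness has already been measured at several candidate-pool sizes k, and it returns the smallest k whose score stays within the tolerance of the best observed score. A value chosen this way could plausibly serve as a training label for a predictor built on pre-retrieval features.

```python
from typing import Dict


def smallest_cutoff_within_envelope(
    effectiveness_by_k: Dict[int, float],
    tolerance: float = 0.05,
) -> int:
    """Return the smallest cutoff k whose effectiveness is within a
    relative `tolerance` of the best score observed for this query.

    `effectiveness_by_k` maps a candidate-pool size k to the (non-negative)
    effectiveness of the final ranked list obtained with that k.
    """
    if not effectiveness_by_k:
        raise ValueError("need at least one cutoff")
    best = max(effectiveness_by_k.values())
    threshold = best * (1.0 - tolerance)
    # Scan cutoffs from cheapest to most expensive; the first one that
    # clears the threshold is the most efficient acceptable choice.
    for k in sorted(effectiveness_by_k):
        if effectiveness_by_k[k] >= threshold:
            return k
    raise AssertionError("unreachable: the best cutoff always qualifies")


# Hypothetical scores for two queries (illustrative numbers only).
easy_query = {100: 0.48, 500: 0.50, 1000: 0.50, 5000: 0.50}
hard_query = {100: 0.20, 500: 0.31, 1000: 0.39, 5000: 0.40}

easy_k = smallest_cutoff_within_envelope(easy_query)
hard_k = smallest_cutoff_within_envelope(hard_query)
print(easy_k, hard_k)
```

In this made-up example the first query tolerates a much smaller candidate pool than the second, which is the kind of per-query variation a single global cutoff cannot exploit.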
