On Divergence Measures and Static Index Pruning

Ruey-Cheng Chen, Chia-Jung Lee, W. Bruce Croft
{"title":"散度测度与静态指数修剪","authors":"Ruey-Cheng Chen, Chia-Jung Lee, W. Bruce Croft","doi":"10.1145/2808194.2809472","DOIUrl":null,"url":null,"abstract":"We study the problem of static index pruning in a renowned divergence minimization framework, using a range of divergence measures such as f-divergence and Rényi divergence as the objective. We show that many well-known divergence measures are convex in pruning decisions, and therefore can be exactly minimized using an efficient algorithm. Our approach allows postings be prioritized according to the amount of information they contribute to the index, and through specifying a different divergence measure the contribution is modeled on a different returns curve. In our experiment on GOV2 data, Rényi divergence of order infinity appears the most effective. This divergence measure significantly outperforms many standard methods and achieves identical retrieval effectiveness as full data using only 50% of the postings. When top-k precision is of the only concern, 10% of the data is sufficient to achieve the accuracy that one would usually expect from a full index.","PeriodicalId":440325,"journal":{"name":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","volume":"10 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2015-09-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"6","resultStr":"{\"title\":\"On Divergence Measures and Static Index Pruning\",\"authors\":\"Ruey-Cheng Chen, Chia-Jung Lee, W. Bruce Croft\",\"doi\":\"10.1145/2808194.2809472\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We study the problem of static index pruning in a renowned divergence minimization framework, using a range of divergence measures such as f-divergence and Rényi divergence as the objective. We show that many well-known divergence measures are convex in pruning decisions, and therefore can be exactly minimized using an efficient algorithm. Our approach allows postings be prioritized according to the amount of information they contribute to the index, and through specifying a different divergence measure the contribution is modeled on a different returns curve. In our experiment on GOV2 data, Rényi divergence of order infinity appears the most effective. This divergence measure significantly outperforms many standard methods and achieves identical retrieval effectiveness as full data using only 50% of the postings. 
When top-k precision is of the only concern, 10% of the data is sufficient to achieve the accuracy that one would usually expect from a full index.\",\"PeriodicalId\":440325,\"journal\":{\"name\":\"Proceedings of the 2015 International Conference on The Theory of Information Retrieval\",\"volume\":\"10 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2015-09-27\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"6\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 2015 International Conference on The Theory of Information Retrieval\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2808194.2809472\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2015 International Conference on The Theory of Information Retrieval","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2808194.2809472","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 6

Abstract

We study the problem of static index pruning in a renowned divergence minimization framework, using a range of divergence measures such as f-divergence and Rényi divergence as the objective. We show that many well-known divergence measures are convex in the pruning decisions, and can therefore be exactly minimized by an efficient algorithm. Our approach allows postings to be prioritized according to the amount of information they contribute to the index, and by specifying a different divergence measure the contribution is modeled on a different returns curve. In our experiments on GOV2 data, Rényi divergence of order infinity appears to be the most effective. This divergence measure significantly outperforms many standard methods and achieves retrieval effectiveness identical to the full index using only 50% of the postings. When top-k precision is the only concern, 10% of the data is sufficient to achieve the accuracy one would usually expect from a full index.
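As background, the standard definition of the Rényi divergence of order α, and its order-infinity limit used as the best-performing objective above, are (textbook definitions, not quoted from the paper, which may parameterize them differently):

```latex
D_\alpha(P \,\|\, Q) = \frac{1}{\alpha - 1} \log \sum_i p_i^{\alpha} \, q_i^{1-\alpha},
\qquad
D_\infty(P \,\|\, Q) = \lim_{\alpha \to \infty} D_\alpha(P \,\|\, Q) = \log \max_i \frac{p_i}{q_i}.
```

In the order-infinity case the objective is governed entirely by the largest ratio p_i/q_i, which is what makes a greedy, ratio-ranked pruning order natural.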
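To make the posting-prioritization idea concrete, here is a minimal hypothetical sketch, not the authors' algorithm: all names such as `posting_score` and `prune_index` are invented for illustration. It scores each posting by the log-ratio of the document language model to the collection language model, echoing the max-ratio flavor of order-infinity Rényi divergence, and keeps the top-scoring fraction of postings.

```python
import math
from collections import defaultdict

# Toy inverted index: term -> list of (doc_id, term_frequency).
postings = {
    "divergence": [(0, 3), (1, 1)],
    "index":      [(0, 1), (1, 4), (2, 2)],
    "pruning":    [(2, 5)],
}

# Collect document lengths and collection-level term statistics.
doc_len = defaultdict(int)
coll_tf = defaultdict(int)
for term, plist in postings.items():
    for doc_id, tf in plist:
        doc_len[doc_id] += tf
        coll_tf[term] += tf
coll_len = sum(coll_tf.values())

def posting_score(term, doc_id, tf):
    """Log-ratio of document model to collection model for one posting.

    Since D_inf(P||Q) = log max_i p_i/q_i, ranking postings by this
    ratio mirrors the order-infinity objective; the paper's exact
    scoring may differ.
    """
    p_doc = tf / doc_len[doc_id]          # p(term | document)
    p_coll = coll_tf[term] / coll_len     # p(term | collection)
    return math.log(p_doc / p_coll)

def prune_index(postings, keep_ratio):
    """Keep the highest-scoring `keep_ratio` fraction of postings."""
    scored = sorted(
        ((posting_score(t, d, tf), t, d, tf)
         for t, plist in postings.items()
         for d, tf in plist),
        reverse=True,
    )
    kept = scored[: max(1, round(keep_ratio * len(scored)))]
    pruned = defaultdict(list)
    for _, t, d, tf in kept:
        pruned[t].append((d, tf))
    return dict(pruned)

# E.g. keep 50% of the postings, the budget the abstract reports as
# matching full-index retrieval effectiveness.
print(prune_index(postings, keep_ratio=0.5))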