挑剔:用集合定义的选择处理top-k查询

Proceedings of the 21st ACM international conference on Information and knowledge management Pub Date : 2012-10-29 DOI:10.1145/2396761.2396877

A. Stupar, S. Michel

{"title":"挑剔:用集合定义的选择处理top-k查询","authors":"A. Stupar, S. Michel","doi":"10.1145/2396761.2396877","DOIUrl":null,"url":null,"abstract":"Focusing on the top-K items according to a ranking criterion constitutes an important functionality in many different query answering scenarios. The idea is to read only the necessary information---mostly from secondary storage---with the ultimate goal to achieve low latency. In this work, we consider processing such top-K queries under the constraint that the result items are members of a specific set, which is provided at query time. We call this restriction a set-defined selection criterion. Set-defined selections drastically influence the pros and cons of an id-ordered index vs. a score-ordered index. We present a mathematical model that allows to decide at runtime which index to choose, leading to a combined index. To improve the latency around the break even point of the two indices, we show how to benefit from a partitioned score-ordered index and present an algorithm to create such partitions based on analyzing query logs. Further performance gains can be enjoyed using approximate top-K results, with tunable result quality. The presented approaches are evaluated using both real-world and synthetic data.","PeriodicalId":313414,"journal":{"name":"Proceedings of the 21st ACM international conference on Information and knowledge management","volume":"326 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2012-10-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":"{\"title\":\"Being picky: processing top-k queries with set-defined selections\",\"authors\":\"A. Stupar, S. Michel\",\"doi\":\"10.1145/2396761.2396877\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Focusing on the top-K items according to a ranking criterion constitutes an important functionality in many different query answering scenarios. The idea is to read only the necessary information---mostly from secondary storage---with the ultimate goal to achieve low latency. In this work, we consider processing such top-K queries under the constraint that the result items are members of a specific set, which is provided at query time. We call this restriction a set-defined selection criterion. Set-defined selections drastically influence the pros and cons of an id-ordered index vs. a score-ordered index. We present a mathematical model that allows to decide at runtime which index to choose, leading to a combined index. To improve the latency around the break even point of the two indices, we show how to benefit from a partitioned score-ordered index and present an algorithm to create such partitions based on analyzing query logs. Further performance gains can be enjoyed using approximate top-K results, with tunable result quality. The presented approaches are evaluated using both real-world and synthetic data.\",\"PeriodicalId\":313414,\"journal\":{\"name\":\"Proceedings of the 21st ACM international conference on Information and knowledge management\",\"volume\":\"326 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2012-10-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"1\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 21st ACM international conference on Information and knowledge management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/2396761.2396877\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 21st ACM international conference on Information and knowledge management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2396761.2396877","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 1

摘要

在许多不同的查询回答场景中，根据排序标准关注前k项是一项重要的功能。其思想是只读取必要的信息——主要来自辅助存储——最终目标是实现低延迟。在这项工作中，我们考虑在结果项是查询时提供的特定集合的成员的约束下处理此类top-K查询。我们称这种限制为集合定义的选择标准。集合定义的选择极大地影响id排序索引与分数排序索引的优缺点。我们提出了一个数学模型，允许在运行时决定选择哪个索引，从而生成一个组合索引。为了改善两个索引的盈亏平衡点附近的延迟，我们展示了如何从分区分数排序索引中获益，并介绍了一种基于分析查询日志创建此类分区的算法。使用近似的top-K结果可以获得进一步的性能提升，并具有可调的结果质量。使用真实世界和合成数据对所提出的方法进行了评估。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Being picky: processing top-k queries with set-defined selections

Focusing on the top-K items according to a ranking criterion constitutes an important functionality in many different query answering scenarios. The idea is to read only the necessary information---mostly from secondary storage---with the ultimate goal to achieve low latency. In this work, we consider processing such top-K queries under the constraint that the result items are members of a specific set, which is provided at query time. We call this restriction a set-defined selection criterion. Set-defined selections drastically influence the pros and cons of an id-ordered index vs. a score-ordered index. We present a mathematical model that allows to decide at runtime which index to choose, leading to a combined index. To improve the latency around the break even point of the two indices, we show how to benefit from a partitioned score-ordered index and present an algorithm to create such partitions based on analyzing query logs. Further performance gains can be enjoyed using approximate top-K results, with tunable result quality. The presented approaches are evaluated using both real-world and synthetic data.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 21st ACM international conference on Information and knowledge management

自引率

0.00%

发文量