Optimal top-k generation of attribute combinations based on ranked lists

Jiaheng Lu, P. Senellart, Chunbin Lin, Xiaoyong Du, Shan Wang, Xinxing Chen
Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. Published 2012-05-20. DOI: 10.1145/2213836.2213883 (https://doi.org/10.1145/2213836.2213883)
Citations: 3

Abstract

In this work, we study a novel query type, called top-k,m queries. Suppose we are given a set of groups, where each group contains a set of attributes and each attribute is associated with a ranked list of tuples, each tuple having an ID and a score. All lists are ranked in decreasing order of tuple score. We are interested in finding the best combinations of attributes, where each combination takes exactly one attribute from each group. More specifically, we want the top-k combinations of attributes, ranked according to their corresponding top-m tuples with matching IDs. This problem has a wide range of applications, from databases to search engines, on traditional and non-traditional types of data (relational data, XML, text, etc.). We show that a straightforward extension of an optimal top-k algorithm, the Threshold Algorithm (TA), has shortcomings in solving the top-k,m problem: it needs to compute a large number of intermediate results for each combination, and it reads more inputs than needed. To overcome this weakness, we provide, for the first time, a provably instance-optimal algorithm, and we further develop optimizations for efficient query evaluation that reduce computational cost, memory cost, and the number of accesses. We demonstrate experimentally the scalability and efficiency of our algorithms over three real applications.
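To make the query definition concrete, the following is a minimal brute-force sketch of a top-k,m query. It is not the instance-optimal algorithm described in the abstract, and it is not the TA extension either; it only illustrates what is being computed. The abstract does not state how scores are aggregated, so two assumptions are made here: a tuple's score within a combination is the sum of its scores across the chosen lists, and a combination's score is the sum of its m best matching tuples. The names `combination_score` and `top_k_m`, and the dictionary layout, are hypothetical.

```python
from itertools import product
import heapq


def combination_score(lists, m):
    """Score one attribute combination.

    `lists` holds one {tuple_id: score} mapping per chosen attribute.
    ASSUMPTION: only IDs present in every list count as matches, a tuple's
    score is the sum of its per-list scores, and the combination's score is
    the sum of its m largest tuple scores.
    """
    common_ids = set.intersection(*(set(lst) for lst in lists))
    totals = [sum(lst[i] for lst in lists) for i in common_ids]
    # If fewer than m IDs match, nlargest simply returns what is available.
    return sum(heapq.nlargest(m, totals))


def top_k_m(groups, k, m):
    """Brute force: try every choice of one attribute per group, keep the best k.

    `groups` is a list of {attribute_name: {tuple_id: score}} mappings.
    Ranked order is not needed here because every list is read in full.
    """
    scored = []
    for combo in product(*(sorted(g) for g in groups)):
        lists = [groups[i][attr] for i, attr in enumerate(combo)]
        scored.append((combination_score(lists, m), combo))
    # Highest score first; ties broken by attribute names for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return scored[:k]


# Made-up example data: two groups with two attributes each.
groups = [
    {"A1": {1: 9, 2: 8, 3: 1}, "A2": {1: 5, 2: 4, 4: 9}},
    {"B1": {1: 7, 2: 6, 3: 9}, "B2": {2: 3, 4: 8}},
]
print(top_k_m(groups, k=2, m=2))
# [(30, ('A1', 'B1')), (24, ('A2', 'B2'))]
```

This baseline reads every list completely and scores every combination, so its cost grows with the product of the group sizes. That exhaustive work is what a threshold-style algorithm over the ranked lists aims to avoid.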