Distributed top-k query processing on multi-dimensional data with keywords

Proceedings of the 27th International Conference on Scientific and Statistical Database Management. Pub Date: 2015-06-29. DOI: 10.1145/2791347.2791355

Daichi Amagata, T. Hara, S. Nishio


Citations: 11

Abstract

In the big data era, techniques for retrieving only user-desirable data objects from massive and diverse datasets are required. Ranking queries, e.g., top-k queries, which rank data objects based on a user-specified scoring function, enable users to find such interesting data and have received significant attention due to their wide range of applications. While many techniques for both centralized and distributed top-k query processing have been developed, they do not consider query keywords, i.e., they simply retrieve the k data objects with the best scores. Utilizing keywords, on the other hand, is a common approach in data (and information) retrieval. Despite this, there is no study on retrieving the top-k data objects that contain all query keywords. In this paper, we define a new query that enriches the conventional top-k query, and we propose algorithms that solve the novel problem of efficiently retrieving, from distributed databases, the k data objects that have the best scores and contain all query keywords. Extensive experiments on both real and synthetic data demonstrate the efficiency and scalability of our algorithms in terms of communication cost and running time.
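To make the query semantics concrete, here is a minimal sketch under stated assumptions; it is not the authors' algorithm. It assumes each object carries numeric attributes plus a keyword set, that the user-specified scoring function is a linear weighted sum where larger is better, and that each distributed database (site) is a plain list. The names `DataObject`, `score`, `local_top_k`, and `naive_distributed_top_k` are hypothetical. The distributed part is a naive baseline in which every site ships its local top-k keyword-matching objects to a coordinator that merges them; the paper's algorithms are instead evaluated on how efficiently they answer the query in terms of communication cost and running time.

```python
from __future__ import annotations

import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """Hypothetical record: an id, numeric attributes, and a keyword set."""
    oid: str
    attrs: tuple[float, ...]
    keywords: frozenset[str]


def score(obj: DataObject, weights: tuple[float, ...]) -> float:
    # Assumed scoring function: weighted sum of attributes, larger is better.
    return sum(w * a for w, a in zip(weights, obj.attrs))


def local_top_k(objects, query_keywords, weights, k):
    """Top-k objects at one site, restricted to objects containing ALL query keywords."""
    matches = [o for o in objects if query_keywords <= o.keywords]  # subset test
    return heapq.nlargest(k, matches, key=lambda o: score(o, weights))


def naive_distributed_top_k(sites, query_keywords, weights, k):
    """Baseline: every site ships its local top-k and a coordinator merges them.

    Correct (ties aside) because an object in the global top-k is outscored by
    at most k-1 matching objects overall, hence by at most k-1 at its own site.
    """
    candidates = []
    for site in sites:
        candidates.extend(local_top_k(site, query_keywords, weights, k))
    return heapq.nlargest(k, candidates, key=lambda o: score(o, weights))


# Usage: two sites, query keywords {"wifi", "parking"}, equal weights.
site_a = [
    DataObject("a1", (0.9, 0.8), frozenset({"wifi", "parking"})),
    DataObject("a2", (0.95, 0.9), frozenset({"wifi"})),  # best score, but lacks "parking"
]
site_b = [
    DataObject("b1", (0.7, 0.6), frozenset({"wifi", "parking", "pool"})),
    DataObject("b2", (0.4, 0.5), frozenset({"parking", "wifi"})),
]
query = frozenset({"wifi", "parking"})
weights = (0.5, 0.5)
result = naive_distributed_top_k([site_a, site_b], query, weights, k=2)
print([o.oid for o in result])  # ['a1', 'b1']
```

The example shows why the keyword constraint changes the answer: `a2` has the highest score but is excluded because it lacks `parking`, so a plain top-k query would return a different result. The baseline is simple, but every site transfers up to k objects regardless of whether they can reach the final answer, and communication cost is one of the metrics the abstract reports.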

