Distributed top-k query processing on multi-dimensional data with keywords
Daichi Amagata, T. Hara, S. Nishio
Proceedings of the 27th International Conference on Scientific and Statistical Database Management, 2015-06-29
DOI: 10.1145/2791347.2791355
Citation count: 11
Abstract
In the big data era, techniques that retrieve only user-desirable data objects from massive and diverse datasets are required. Ranking queries, e.g., top-k queries, which rank data objects by a user-specified scoring function, enable users to find such interesting data and have received significant attention because of their wide range of applications. While many techniques for both centralized and distributed top-k query processing have been developed, they do not consider query keywords, i.e., they simply retrieve the k data objects with the best scores. Utilizing keywords, on the other hand, is a common approach in data (and information) retrieval. Despite this, there has been no study on retrieving the top-k data objects that contain all query keywords. In this paper, we define a new query that enriches conventional top-k queries, and propose algorithms that solve the novel problem of efficiently retrieving, from distributed databases, the k data objects that have the best scores and contain all query keywords. Extensive experiments on both real and synthetic data demonstrate the efficiency and scalability of our algorithms in terms of communication cost and running time.
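To make the query semantics concrete, here is a minimal Python sketch under stated assumptions. It assumes each object carries a numeric attribute vector and a keyword set, and that the user-specified scoring function is a weighted sum where higher is better. The names `DataObject`, `score`, `local_topk` and `distributed_topk` are hypothetical. The distributed part is a naive baseline, not the algorithms proposed in the paper: each site keeps only objects containing all query keywords and sends its local top-k to a coordinator, which merges the candidates.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """Hypothetical data object: numeric attributes plus a keyword set."""
    oid: int
    attrs: tuple[float, ...]    # multi-dimensional attribute values
    keywords: frozenset[str]    # keywords attached to the object


def score(obj: DataObject, weights: tuple[float, ...]) -> float:
    # Assumption: the user-specified scoring function is a weighted sum
    # of the attributes, and a higher score is better.
    return sum(w * a for w, a in zip(weights, obj.attrs))


def local_topk(site, query_keywords, weights, k):
    # Keep only objects that contain ALL query keywords, then take the k best.
    matches = [o for o in site if query_keywords <= o.keywords]
    return heapq.nlargest(k, matches, key=lambda o: score(o, weights))


def distributed_topk(sites, query_keywords, weights, k):
    # Naive baseline (not the paper's algorithms): every site ships its
    # local top-k to a coordinator, which merges the candidates. If an object
    # is missing from its own site's local top-k, that site already holds k
    # matching objects scoring at least as high, so dropping it cannot
    # change the k best scores.
    candidates = []
    for site in sites:
        candidates.extend(local_topk(site, query_keywords, weights, k))
    return heapq.nlargest(k, candidates, key=lambda o: score(o, weights))


# Usage: two sites, query keywords {"hotel", "wifi"}, equal weights.
site_a = [
    DataObject(1, (0.9, 0.8), frozenset({"hotel", "wifi"})),
    DataObject(2, (1.0, 1.0), frozenset({"hotel"})),
]
site_b = [
    DataObject(3, (0.5, 0.7), frozenset({"hotel", "wifi", "pool"})),
    DataObject(4, (0.2, 0.1), frozenset({"hotel", "wifi"})),
]
result = distributed_topk([site_a, site_b], frozenset({"hotel", "wifi"}), (0.5, 0.5), k=2)
print([o.oid for o in result])  # [1, 3]
```

Object 2 has the highest score but lacks "wifi", so the keyword constraint excludes it. This baseline transmits up to k objects per site. The abstract reports the proposed algorithms' efficiency in terms of communication cost, which this sketch does not attempt to optimize.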