An Efficient Processing of k-Dominant Skyline Query in MapReduce

Data4U '14, Pub Date: 2014-09-01, DOI: 10.1145/2658840.2658846

Hao Tian, M. A. Siddique, Y. Morimoto

Citations: 9

Abstract

Filtering out uninteresting data is important for making use of "big data". The skyline query is one of the popular techniques for this purpose: it selects the set of points in a given large database that are not dominated by any other point. However, a skyline query often retrieves too many points to analyze closely, especially for high-dimensional datasets. To address this problem, k-dominant skyline queries have been introduced, which make it possible to control the number of retrieved points. However, conventional algorithms for computing k-dominant skyline queries are not well suited to parallel and distributed environments such as the MapReduce framework. In this paper, we consider an efficient parallel algorithm for processing k-dominant skyline queries in the MapReduce framework. Extensive experiments are conducted to evaluate the algorithm under different settings of data distribution, dimensionality, and cardinality.
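
To make the notion of k-dominance concrete, the following is a minimal single-machine sketch based on the commonly used definition: a point p k-dominates q if p is no worse than q in at least k dimensions and strictly better in at least one of them. It is not the parallel MapReduce algorithm of the paper, whose details the abstract does not give. The function names, the sample points, and the convention that smaller values are better are all assumptions made for illustration.

```python
from typing import List, Sequence

Point = Sequence[float]


def k_dominates(p: Point, q: Point, k: int) -> bool:
    """Return True if p k-dominates q (assumption: smaller values are better).

    p must be no worse than q in at least k dimensions and strictly
    better than q in at least one dimension.
    """
    no_worse = sum(1 for a, b in zip(p, q) if a <= b)
    strictly_better = any(a < b for a, b in zip(p, q))
    return no_worse >= k and strictly_better


def k_dominant_skyline(points: List[Point], k: int) -> List[Point]:
    """Naive O(n^2) scan: keep each point that no other point k-dominates.

    Every point is compared against every other point, because
    k-dominance is not transitive, so a point that has itself been
    eliminated can still eliminate others.
    """
    return [
        q
        for j, q in enumerate(points)
        if not any(k_dominates(p, q, k) for i, p in enumerate(points) if i != j)
    ]


# Hypothetical 3-dimensional sample data.
points = [(1, 5, 3), (2, 2, 2), (3, 1, 4), (4, 4, 1)]

full_skyline = k_dominant_skyline(points, k=3)  # k = d: conventional skyline
two_dominant = k_dominant_skyline(points, k=2)  # stricter filter
print(len(full_skyline), len(two_dominant))
```

With k equal to the number of dimensions, the query reduces to the conventional skyline, and all four sample points survive. Lowering k to 2 leaves only (2, 2, 2), which shows how k controls the number of retrieved points.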

