Efficient Processing of Skyline-Join Queries over Multiple Data Sources

IF 2.2 · Zone 2, Computer Science · Q3 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Database Systems Pub Date : 2015-06-30 DOI:10.1145/2699483

M. Nagendra, K. Candan

Pages : 10:1-10:46 Publication type : Journal Article Open access : no

Citations: 9

Abstract

Efficient processing of skyline queries has been an area of growing interest. Many of the earlier skyline techniques assumed that the skyline query is applied to a single data table. Naturally, these algorithms were not suitable for the many applications in which the skyline query involves attributes belonging to multiple data sources. In other words, if the data used in the skyline query are stored in multiple tables, then join operations are required before the skyline can be computed. The task of computing skylines over multiple data sources has been termed the skyline-join problem, and various skyline-join algorithms have been proposed. However, the current proposals suffer from several drawbacks: they often need to scan the input tables exhaustively in order to obtain the set of skyline-join results; moreover, the pruning techniques employed to eliminate tuples are largely based on expensive pairwise tuple-to-tuple comparisons. In this article, we aim to address these shortcomings by proposing two novel skyline-join algorithms, namely skyline-sensitive join (S2J) and symmetric skyline-sensitive join (S3J), to process skyline queries over two data sources. Our approaches compute the results using a novel layer/region pruning technique (LR-pruning) that prunes the join space in blocks as opposed to individual data points, thereby avoiding excessive pairwise point-to-point dominance checks. Furthermore, the S3J algorithm utilizes an early stopping condition in order to compute the skyline results by accessing only a subset of the input tables. In addition to S2J and S3J, we also propose the S2J-M and S3J-M algorithms. These algorithms extend the two-way skyline-join ability of S2J and S3J to efficiently process skyline-join queries over more than two data sources. S2J-M and S3J-M leverage an extended form of LR-pruning, called M-way LR-pruning, to compute multi-way skyline-joins in which more than two data sources are integrated during skyline processing. We report extensive experimental results that confirm the advantages of the proposed algorithms over state-of-the-art skyline-join techniques.
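
To make the problem statement concrete, the following is a minimal Python sketch of the naive baseline the abstract argues against: join the two tables first, then find the skyline with pairwise dominance checks. It is not S2J, S3J, or the authors' LR-pruning. The function names, the (join_key, attributes) tuple layout, and the "smaller is better" convention are all assumptions made for illustration. The last helper only illustrates the general idea of discarding a whole block of candidates with a single dominance check on its best corner.

```python
from itertools import product


def dominates(a, b):
    """Return True if a dominates b: a is no worse on every attribute
    and strictly better on at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points):
    """Quadratic skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def naive_skyline_join(left, right):
    """Baseline skyline-join over two sources.

    left and right are lists of (join_key, attrs) pairs, where attrs is a
    tuple of numbers (hypothetical layout). Both tables are scanned in full,
    matching rows are concatenated, and the skyline is taken afterwards.
    """
    joined = [la + ra for (lk, la), (rk, ra) in product(left, right) if lk == rk]
    return skyline(joined)


def block_is_prunable(block_min_corner, known_points):
    """Block-level check (illustrative only, not the paper's LR-pruning).

    block_min_corner holds the per-attribute minimum over a block of
    candidates. If a known point dominates that corner, it dominates every
    candidate in the block, so the block can be dropped in one comparison.
    """
    return any(dominates(s, block_min_corner) for s in known_points)


if __name__ == "__main__":
    # Hypothetical data: flights (price, stops) and hotels (price,), joined on city.
    flights = [("NYC", (300, 2)), ("NYC", (250, 5)), ("LA", (400, 1))]
    hotels = [("NYC", (120,)), ("NYC", (90,)), ("LA", (80,))]
    print(naive_skyline_join(flights, hotels))
    print(block_is_prunable((310, 3, 100), [(300, 2, 90)]))
```

The baseline touches every joined tuple and compares tuples pairwise, which are exactly the two costs the abstract highlights; the block check shows why pruning by region can replace many point-to-point comparisons with one.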


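Source journal

ACM Transactions on Database Systems · Engineering & Technology, Computer Science: Software Engineering

CiteScore : 5.60

Self-citation rate : 0.00%

Articles published : (not listed)

Review time : >12 weeks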

Journal description: Heavily used in both academic and corporate R&D settings, ACM Transactions on Database Systems (TODS) is a key publication for computer scientists working in data abstraction, data modeling, and designing data management systems. Topics include storage and retrieval, transaction management, distributed and federated databases, semantics of data, intelligent databases, and operations and algorithms relating to these areas. In this rapidly changing field, TODS provides insights into the thoughts of the best minds in database R&D.