On Efficient Processing of Subspace Skyline Queries on High Dimensional Data

19th International Conference on Scientific and Statistical Database Management (SSDBM 2007). Published: 2007-07-09. DOI: 10.1109/SSDBM.2007.20

Wen Jin, A. Tung, M. Ester, Jiawei Han


Citations: 34

Abstract

Recent studies on efficiently answering subspace skyline queries fall into two approaches. The first pre-materializes sets of skyline points in various subspaces, while the second answers queries dynamically, using a set of anchors to prune skyline points through spatial reasoning. Despite efforts to compress the pre-materialized subspace skylines by removing redundancy, the storage space of the first approach remains exponential in the number of dimensions. The query time of the second approach, on the other hand, grows substantially for higher-dimensional data, where the pruning power of anchors becomes much weaker. In this paper, we propose methods for answering subspace skyline queries on high-dimensional data that moderate both pre-materialization storage and query time. We propose the novel notions of maximal partial-dominating space, maximal partial-dominated space, and maximal equality space between pairs of skyline objects in the full space, and use these concepts as the foundation for answering subspace skyline queries on high-dimensional data. Query processing involves mostly simple pruning operations, while skyline computation is performed only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an online fashion. Extensive experiments demonstrate the efficiency and effectiveness of our methods.
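The abstract describes the pairwise-space idea only at a high level. The following Python sketch illustrates how per-pair spaces, computed once over the full-space skyline, can turn each subspace dominance test into set intersections. It rests on several assumptions that are not taken from the paper. Smaller values are treated as better. The three spaces for a pair (p, q) are taken to be simply the dimensions where p is strictly better than, strictly worse than, or equal to q, and the paper's formal definitions of the *maximal* spaces may differ. No two points share a value in any dimension, which guarantees that every subspace skyline point is also a full-space skyline point. All function names and the toy data are hypothetical. The sketch does not implement the paper's candidate pruning or its random sampling method.

```python
from itertools import combinations, permutations
from typing import Dict, FrozenSet, List, Sequence, Tuple

Point = Tuple[float, ...]
# (dims where p beats q, dims where p loses to q, dims where they tie)
Spaces = Tuple[FrozenSet[int], FrozenSet[int], FrozenSet[int]]


def dominates(p: Point, q: Point, dims: Sequence[int]) -> bool:
    """p dominates q on `dims` (smaller is better): never worse, strictly better at least once."""
    return all(p[d] <= q[d] for d in dims) and any(p[d] < q[d] for d in dims)


def skyline(points: List[Point], dims: Sequence[int]) -> List[Point]:
    """Naive O(n^2) skyline on the projection `dims`; used here only as a reference answer."""
    return [q for q in points if not any(dominates(p, q, dims) for p in points)]


def pair_spaces(p: Point, q: Point) -> Spaces:
    """Partition the full space into the dims where p is better than, worse than, and equal to q."""
    dims = range(len(p))
    better = frozenset(d for d in dims if p[d] < q[d])
    worse = frozenset(d for d in dims if p[d] > q[d])
    equal = frozenset(d for d in dims if p[d] == q[d])
    return better, worse, equal


def dominates_in(spaces: Spaces, subspace: FrozenSet[int]) -> bool:
    """p dominates q in `subspace` iff the subspace avoids every dim where p is worse
    and contains at least one dim where p is better (tied dims are neutral)."""
    better, worse, _equal = spaces
    return not (subspace & worse) and bool(subspace & better)


def precompute(sky: List[Point]) -> Dict[Tuple[int, int], Spaces]:
    """Materialize the spaces once for every ordered pair of full-space skyline points."""
    return {(i, j): pair_spaces(sky[i], sky[j]) for i, j in permutations(range(len(sky)), 2)}


def subspace_skyline(sky: List[Point], spaces: Dict[Tuple[int, int], Spaces],
                     subspace: Sequence[int]) -> List[Point]:
    """Answer a subspace query with set operations only. ASSUMES distinct values per
    dimension, so restricting candidates to the full-space skyline loses no answers."""
    u = frozenset(subspace)
    n = len(sky)
    return [sky[j] for j in range(n)
            if not any(dominates_in(spaces[(i, j)], u) for i in range(n) if i != j)]


# Hypothetical toy data: 3 dimensions, no ties within any dimension.
POINTS: List[Point] = [(1, 5, 3), (2, 2, 4), (3, 1, 1), (4, 4, 2), (5, 3, 5)]
FULL = (0, 1, 2)
SKY = skyline(POINTS, FULL)   # full-space skyline, computed once
SPACES = precompute(SKY)      # pairwise spaces, computed once

if __name__ == "__main__":
    print("full-space skyline:", SKY)
    for k in range(1, len(FULL) + 1):
        for sub in combinations(FULL, k):
            print(sub, subspace_skyline(SKY, SPACES, sub))
```

In this sketch only `pair_spaces` reads raw coordinates. After precomputation, each query touches only the stored sets, and storage is quadratic in the full-space skyline size rather than exponential in the number of dimensions. When ties are allowed, a point dominated in the full space can reappear in a subspace skyline if it ties its dominator on that subspace. This is where tracking an equality space becomes necessary, and the simple restriction used above no longer suffices.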




Source journal

19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)

Self-citation rate

0.00%

Publication count