On Efficient Processing of Subspace Skyline Queries on High Dimensional Data

Wen Jin, A. Tung, M. Ester, Jiawei Han
{"title":"On Efficient Processing of Subspace Skyline Queries on High Dimensional Data","authors":"Wen Jin, A. Tung, M. Ester, Jiawei Han","doi":"10.1109/SSDBM.2007.20","DOIUrl":null,"url":null,"abstract":"Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focused on pre-materializing a set of skylines points in various subspaces while the second focus on dynamically answering the queries by using a set of anchors to prune off skyline points through spatial reasoning. Despite effort to compress the pre-materialized subspace skylines through removal of redundancy, the storage space for the first approach remain exponential in the number of dimensions. The query time for the second approach on the other hand also grow substantially for data with higher dimensionality where the pruning power of anchors become much weaker. In this paper, we propose methods for answering subspace skyline query on high dimensional data such that both prematerialization storage and query time can be moderated. We propose novel notions of maximal partial-dominating space, maximal partial-dominated space and the maximal equality space between pairs of skyline objects in the full space and use these concepts as the foundation for answering subspace skyline queries for high dimensional data. Query processing involves mostly simple pruning operations while skyline computation is done only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an on-line fashion. Extensive experiments have been conducted and demonstrated the efficiency and effectiveness of our methods.","PeriodicalId":122925,"journal":{"name":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","volume":"49 2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2007-07-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"34","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"19th International Conference on Scientific and Statistical Database Management (SSDBM 2007)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/SSDBM.2007.20","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 34

Abstract

Recent studies on efficiently answering subspace skyline queries can be separated into two approaches. The first focused on pre-materializing a set of skylines points in various subspaces while the second focus on dynamically answering the queries by using a set of anchors to prune off skyline points through spatial reasoning. Despite effort to compress the pre-materialized subspace skylines through removal of redundancy, the storage space for the first approach remain exponential in the number of dimensions. The query time for the second approach on the other hand also grow substantially for data with higher dimensionality where the pruning power of anchors become much weaker. In this paper, we propose methods for answering subspace skyline query on high dimensional data such that both prematerialization storage and query time can be moderated. We propose novel notions of maximal partial-dominating space, maximal partial-dominated space and the maximal equality space between pairs of skyline objects in the full space and use these concepts as the foundation for answering subspace skyline queries for high dimensional data. Query processing involves mostly simple pruning operations while skyline computation is done only on a small subset of candidate skyline points in the subspace. We also develop a random sampling method to compute the subspace skyline in an on-line fashion. Extensive experiments have been conducted and demonstrated the efficiency and effectiveness of our methods.
高维数据上子空间Skyline查询的高效处理
最近对子空间天际线查询的有效应答研究可以分为两种方法。第一个侧重于在各个子空间中预实现一组天际线点,而第二个侧重于通过空间推理使用一组锚点来修剪天际线点,从而动态回答查询。尽管努力通过去除冗余来压缩预物化的子空间天际线,但第一种方法的存储空间在维数上仍然是指数级的。另一方面,对于具有更高维度的数据,第二种方法的查询时间也会大大增加,其中锚的修剪能力变得更弱。在本文中,我们提出了在高维数据上回答子空间天际线查询的方法,使得预物化存储和查询时间都可以被调节。我们提出了全空间中天际线对象对之间的最大部分支配空间、最大部分支配空间和最大相等空间的新概念,并将这些概念作为回答高维数据的子空间天际线查询的基础。查询处理主要涉及简单的剪枝操作,而天际线计算只在子空间中候选天际线点的一小部分上进行。我们还开发了一种随机抽样方法,以在线方式计算子空间天际线。大量的实验已经进行,并证明了我们的方法的效率和有效性。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信