Performance of KDB-trees with query-based splitting

Proceedings. International Conference on Information Technology: Coding and Computing. Pub Date: 2002-04-08. DOI: 10.1109/ITCC.2002.1000390

Yves Lépouchard, J. Pfaltz, R. Orlandic


Citations: 1

Abstract

While the persistent data of many advanced database applications, such as OLAP and scientific studies, are characterized by very high dimensionality, typical queries posed on these data appeal to a small number of relevant dimensions. Unfortunately, the multidimensional access methods designed for high-dimensional data perform rather poorly for these partially specified queries. A potentially very appealing idea, frequently suggested in the literature, is to adopt a node-splitting policy that takes into account the "importance" of individual dimensions, which could be determined either a priori or through a statistical sampling of actual queries. This paper presents the results of some carefully controlled experiments conducted to observe the effects of query-based splitting on the performance of KDB-trees. The strategy is compared to a splitting policy that selects the split dimensions in a "cyclic" fashion, which has been shown to be very effective, especially in high-dimensional situations. Based on the results, the query-based splitting does not appear to be a very appealing splitting strategy for KDB-trees.
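The two split-dimension policies contrasted in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual selection rule, so everything below is an assumption made for illustration: the function names, the representation of a partially specified query as the list of dimensions it constrains, and the heuristic of favoring the dimension with the highest query weight relative to how often it has already been split.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def cyclic_split_dim(depth: int, num_dims: int) -> int:
    """Cyclic policy: rotate through the dimensions according to tree depth."""
    return depth % num_dims


def query_weights(queries: Iterable[Sequence[int]], num_dims: int) -> List[float]:
    """Estimate the 'importance' of each dimension from a query sample.

    Each query is given as the dimensions it constrains (hypothetical encoding).
    The weight of a dimension is the fraction of sampled queries that use it.
    """
    sample = list(queries)
    counts = Counter(d for q in sample for d in set(q))
    total = len(sample)
    return [counts[d] / total if total else 0.0 for d in range(num_dims)]


def query_based_split_dim(weights: Sequence[float], splits_so_far: Sequence[int]) -> int:
    """Query-based policy (assumed heuristic, not the paper's rule).

    Pick the dimension whose query weight is largest relative to the number
    of splits already made on it along the current path. Ties go to the
    lowest-numbered dimension.
    """
    return max(range(len(weights)), key=lambda d: weights[d] / (splits_so_far[d] + 1))


if __name__ == "__main__":
    # Four sampled queries over a 4-dimensional space; only dimensions 0 and 2 are ever constrained.
    weights = query_weights([[0], [0, 2], [0], [2]], num_dims=4)
    print(weights)
    print(query_based_split_dim(weights, [0, 0, 0, 0]))
    print(cyclic_split_dim(depth=5, num_dims=4))
```

Under this heuristic, dimensions that no sampled query constrains are never chosen, whereas the cyclic policy splits every dimension equally often. The sketch shows only the difference in how a split dimension is chosen; it says nothing about the performance comparison reported in the paper.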
