现代硬件上的多维范围查询

Proceedings of the 30th International Conference on Scientific and Statistical Database Management Pub Date : 2018-01-11 DOI:10.1145/3221269.3223031

Stefan Sprenger, Patrick Schäfer, U. Leser

{"title":"现代硬件上的多维范围查询","authors":"Stefan Sprenger, Patrick Schäfer, U. Leser","doi":"10.1145/3221269.3223031","DOIUrl":null,"url":null,"abstract":"Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom told that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data being held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs and data-parallel instruction sets. In this paper, we study the question whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries.","PeriodicalId":365491,"journal":{"name":"Proceedings of the 30th International Conference on Scientific and Statistical Database Management","volume":"44 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-01-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"13","resultStr":"{\"title\":\"Multidimensional range queries on modern hardware\",\"authors\":\"Stefan Sprenger, Patrick Schäfer, U. Leser\",\"doi\":\"10.1145/3221269.3223031\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom told that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data being held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs and data-parallel instruction sets. In this paper, we study the question whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries.\",\"PeriodicalId\":365491,\"journal\":{\"name\":\"Proceedings of the 30th International Conference on Scientific and Statistical Database Management\",\"volume\":\"44 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2018-01-11\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"13\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 30th International Conference on Scientific and Statistical Database Management\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3221269.3223031\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 30th International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3221269.3223031","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 13

摘要

在许多应用程序中，多维数据的范围查询是数据库工作负载的重要组成部分。它们的执行可以通过使用多维索引结构(MDIS)来加速，例如kd-树或r -树。对于大多数索引结构，这种方法的有用性取决于查询的选择性，一般认为，对于访问数据集15%-20%以上的查询，简单的扫描优于MDIS。然而，这种智慧在很大程度上是基于近20年前的评估，这些评估是对保存在磁盘上的数据执行的，应用io优化的数据结构，并使用单核系统。问题是，当多维范围查询(MDRQ)在大型主存、多核cpu和数据并行指令集的现代架构上执行时，这条经验法则是否仍然成立。在本文中，我们研究了现代硬件是否以及在多大程度上影响索引结构和MDRQ扫描之间的性能比率。为此，我们保守地采用了三种流行的MDIS，即R*树、kd树和va文件，以利用现代服务器的特性，并将它们的性能与使用多个(合成的和真实的)分析工作负载对多个(合成的和真实的)不同大小、维度和倾斜的数据集进行的不同类型的并行扫描进行比较。我们发现所有的方法都从使用主存和并行化中获益，只是程度不同。我们的评估表明，在当前的机器上，即使对于非常有选择性的查询，扫描也应该比并行版本的经典MDIS更受青睐。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Multidimensional range queries on modern hardware

Range queries over multidimensional data are an important part of database workloads in many applications. Their execution may be accelerated by using multidimensional index structures (MDIS), such as kd-trees or R-trees. As for most index structures, the usefulness of this approach depends on the selectivity of the queries, and common wisdom told that a simple scan beats MDIS for queries accessing more than 15%-20% of a dataset. However, this wisdom is largely based on evaluations that are almost two decades old, performed on data being held on disks, applying IO-optimized data structures, and using single-core systems. The question is whether this rule of thumb still holds when multidimensional range queries (MDRQ) are performed on modern architectures with large main memories holding all data, multi-core CPUs and data-parallel instruction sets. In this paper, we study the question whether and how much modern hardware influences the performance ratio between index structures and scans for MDRQ. To this end, we conservatively adapted three popular MDIS, namely the R*-tree, the kd-tree, and the VA-file, to exploit features of modern servers and compared their performance to different flavors of parallel scans using multiple (synthetic and real-world) analytical workloads over multiple (synthetic and real-world) datasets of varying size, dimensionality, and skew. We find that all approaches benefit considerably from using main memory and parallelization, yet to varying degrees. Our evaluation indicates that, on current machines, scanning should be favored over parallel versions of classical MDIS even for very selective queries.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the 30th International Conference on Scientific and Statistical Database Management

自引率

0.00%

发文量