A Learned Query Optimizer for Spatial Join

Tin Vu, A. Belussi, S. Migliorini, A. Eldawy
Proceedings of the 29th International Conference on Advances in Geographic Information Systems
Published: 2021-11-02
DOI: 10.1145/3474717.3484217 (https://doi.org/10.1145/3474717.3484217)
Citations: 6

Abstract

The importance and complexity of spatial join have resulted in many join algorithms, some of which run on big-data platforms such as Hadoop and Spark. This paper proposes the first machine-learning-based query optimizer for the spatial join operation that can accommodate the skewness of spatial datasets and the complexity of the different algorithms. The main challenge is developing portable cost models that take into account important input characteristics such as data distribution, spatial partitioning, the logic of the spatial join algorithms, and the relationship between the two datasets. The proposed system defines a set of features, all of which can be computed efficiently from the data, that capture the intricate aspects of spatial join. It then uses these features to train three machine learning models that capture several metrics for estimating the cost of four spatial join algorithms according to user requirements. The first model estimates the cardinality of the spatial join result. The second model predicts the number of rough comparisons for a specific join algorithm. The third is a classification model that chooses the best join algorithm to run. Experiments on large-scale synthetic and real data show the advantage of the proposed models over baseline methods.
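
The feature-then-classify flow described in the abstract can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the three features (log dataset sizes and the Jaccard overlap of the two minimum bounding rectangles), the 1-nearest-neighbour rule standing in for the classification model, the placeholder algorithm labels `algo_A` and `algo_B`, and the made-up training rows.

```python
import math
from typing import List, Sequence, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def mbr(rects: Sequence[Rect]) -> Rect:
    """Minimum bounding rectangle of a non-empty collection of rectangles."""
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def area(r: Rect) -> float:
    """Area of a rectangle; 0.0 when the rectangle is empty (inverted bounds)."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def join_features(left: Sequence[Rect], right: Sequence[Rect]) -> List[float]:
    """Hypothetical feature vector: log10 sizes plus Jaccard overlap of the MBRs."""
    ml, mr = mbr(left), mbr(right)
    inter = area((max(ml[0], mr[0]), max(ml[1], mr[1]),
                  min(ml[2], mr[2]), min(ml[3], mr[3])))
    union = area(ml) + area(mr) - inter
    overlap = inter / union if union > 0 else 0.0
    return [math.log10(len(left)), math.log10(len(right)), overlap]


def choose_algorithm(features: Sequence[float],
                     training: Sequence[Tuple[Sequence[float], str]]) -> str:
    """1-nearest-neighbour stand-in for the classification model (an assumption)."""
    return min(training, key=lambda ex: math.dist(features, ex[0]))[1]


# Made-up placeholder training rows: (feature vector, label of the fastest algorithm).
TRAINING = [([0.3, 0.0, 0.1], "algo_A"), ([6.0, 6.0, 0.9], "algo_B")]

left = [(0.0, 0.0, 1.0, 1.0), (1.0, 1.0, 2.0, 2.0)]
right = [(1.0, 1.0, 3.0, 3.0)]
print(choose_algorithm(join_features(left, right), TRAINING))
```

A real optimizer would replace the nearest-neighbour rule with a trained model and would use richer features that reflect skew and partitioning; the sketch only shows how cheap dataset-level features can feed an algorithm-selection step.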