Selectivity Estimation for Relation-Tree Joins
Chao Zhang, Jiaheng Lu
32nd International Conference on Scientific and Statistical Database Management, published 2020-05-09
DOI: 10.1145/3400903.3400921
Estimating join selectivity is a crucial problem in many aspects of query processing, such as query optimization and query refinement. Selectivity estimation has been studied extensively for relational joins in SQL queries and for structural joins in path-oriented queries. However, now that leading databases support multi-model data management over relational and tree-structured data together, a new problem has arisen: existing estimation techniques work mainly for a single data model and do not handle the heterogeneous setting created by cross-model joins. A straightforward combination of existing estimators does not provide satisfactory estimation quality. This paper studies selectivity estimation for cross-model joins over relational and tree-structured data. Our estimator is based on Kernel Density Estimation (KDE), a statistical approach that uses a data sample to approximate a multivariate probability distribution. KDE has been applied successfully in relational databases to estimate the selectivity of range and join queries. In this work, we propose a KDE-based estimation method, the location-value estimation (LVE) model, which accounts for both value joins and structural joins across relational and tree-structured data. To improve estimation efficiency on large data samples, we further propose the max-min approximation (MMA) and grid-based approximation (GBA) models, which approximate the KDE contribution. Extensive experiments on four real and synthetic datasets demonstrate the effectiveness, efficiency, and scalability of our techniques.
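The abstract builds on sample-based KDE selectivity estimation for range and join predicates. The Python sketch below illustrates only that generic building block. It is not the authors' LVE, MMA, or GBA model, because the abstract does not describe those in detail. The sketch assumes one-dimensional numeric attributes, a Gaussian kernel, fixed user-supplied bandwidths, and uniform random samples. The function names are hypothetical.

For a range predicate, each sample point contributes the probability mass that its kernel places inside the range. For a band join |a - b| <= eps between two sampled attributes, the difference of two independent Gaussian kernels is again Gaussian, with variance h_left^2 + h_right^2, so every kernel pair has a closed-form contribution.

```python
import math
from typing import Sequence


def _std_normal_cdf(z: float) -> float:
    """CDF of the standard normal distribution, computed with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_range_selectivity(sample: Sequence[float], lo: float, hi: float,
                          bandwidth: float) -> float:
    """Hypothetical helper: estimate the fraction of rows with lo <= x <= hi.

    Each sample value x_i defines a Gaussian kernel N(x_i, bandwidth^2).
    The estimate is the average mass these kernels place in [lo, hi].
    """
    if not sample or bandwidth <= 0 or lo > hi:
        raise ValueError("need a non-empty sample, bandwidth > 0, and lo <= hi")
    total = 0.0
    for x in sample:
        total += (_std_normal_cdf((hi - x) / bandwidth)
                  - _std_normal_cdf((lo - x) / bandwidth))
    return total / len(sample)


def kde_band_join_selectivity(left: Sequence[float], right: Sequence[float],
                              eps: float, h_left: float, h_right: float) -> float:
    """Hypothetical helper: estimate the fraction of pairs with |a - b| <= eps.

    With a ~ N(a_i, h_left^2) and b ~ N(b_j, h_right^2) independent,
    a - b ~ N(a_i - b_j, h_left^2 + h_right^2), so each kernel pair
    contributes P(-eps <= a - b <= eps) in closed form.
    """
    if not left or not right or eps < 0 or h_left <= 0 or h_right <= 0:
        raise ValueError("need non-empty samples, eps >= 0, and positive bandwidths")
    s = math.sqrt(h_left ** 2 + h_right ** 2)
    total = 0.0
    # Every pair of sample points is visited: O(len(left) * len(right)) work.
    for a in left:
        for b in right:
            d = a - b
            total += _std_normal_cdf((eps - d) / s) - _std_normal_cdf((-eps - d) / s)
    return total / (len(left) * len(right))


if __name__ == "__main__":
    r_sample = [1.0, 2.5, 3.0, 7.2]
    s_sample = [2.0, 3.1, 6.8]
    print(kde_range_selectivity(r_sample, 2.0, 4.0, bandwidth=0.5))
    print(kde_band_join_selectivity(r_sample, s_sample, eps=0.5, h_left=0.5, h_right=0.5))
```

Multiplying a join selectivity by |R| x |S| gives an estimated result size. The nested loop evaluates every pair of sample points. This shows why estimation cost becomes a concern for large samples, which is the problem the abstract says the MMA and GBA approximations address.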