Selectivity estimation for string predicates: overcoming the underestimation problem
S. Chaudhuri, Venkatesh Ganti, L. Gravano
Proceedings. 20th International Conference on Data Engineering, 2004-03-30
DOI: 10.1109/ICDE.2004.1319999
Citation count: 68
Abstract
Queries with equality or LIKE selection predicates over string attributes are widely used in relational databases. However, state-of-the-art techniques for estimating the selectivities of string predicates are often biased towards severely underestimating them. We develop accurate selectivity estimators for string predicates that adapt to data and query characteristics and that can exploit and build on a variety of existing estimators. A thorough experimental evaluation over real data sets demonstrates the resilience of our estimators to variations in both data and query characteristics.
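The abstract does not say where the underestimation comes from. One commonly cited source of this kind of bias, stated here as an assumption rather than as a description of the specific estimators the paper evaluates or proposes, is combining per-substring selectivities as if the substrings occurred independently. The minimal Python sketch below uses hypothetical helpers (`substring_selectivity`, `independence_estimate`) and a made-up column of surnames to show how that assumption can undershoot the true selectivity of a `LIKE '%smith%'` predicate.

```python
from typing import List


def substring_selectivity(rows: List[str], pattern: str) -> float:
    """Exact fraction of rows that contain `pattern`, i.e. col LIKE '%pattern%'."""
    return sum(pattern in row for row in rows) / len(rows)


def independence_estimate(rows: List[str], pattern: str, q: int = 2) -> float:
    """Hypothetical naive estimator (NOT the paper's method).

    Splits `pattern` into consecutive chunks of at most `q` characters and
    multiplies their exact selectivities, assuming the chunks occur
    independently of one another.
    """
    chunks = [pattern[i:i + q] for i in range(0, len(pattern), q)]
    estimate = 1.0
    for chunk in chunks:
        estimate *= substring_selectivity(rows, chunk)
    return estimate


# Toy string column (made-up data, for illustration only).
rows = ["smith", "smithers", "smithson", "jones", "johnson", "brown"]

true_sel = substring_selectivity(rows, "smith")
est_sel = independence_estimate(rows, "smith")
print(round(true_sel, 3))  # -> 0.5
print(round(est_sel, 3))   # -> 0.167
```

On this toy column the chunks "sm", "it" and "h" almost always co-occur, so multiplying their selectivities (0.5 × 0.5 × 4/6) yields one third of the true value. The sketch only illustrates the bias the abstract describes; it does not reproduce the paper's estimators.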