On analyzing the errors in a selectivity estimation method using a multidimensional file structure
Authors: Sang-Wook Kim, Whan-Kyu Whang, K. Whang
Published in: Proceedings. The Twenty-Second Annual International Computer Software and Applications Conference (Compsac '98) (Cat. No.98CB 36241), 1998-08-19
DOI: https://doi.org/10.1109/CMPSAC.1998.716635
Citations: 0
Abstract
In this paper, we discuss the errors in selectivity estimation using the multilevel grid file (MLGF), a multidimensional file structure. We first analyze the cause of the estimation errors and then investigate five factors that affect estimation accuracy: (1) the data distribution in a region, (2) the number of records stored in the MLGF, (3) the page size, (4) the query region size, and (5) the level of the MLGF directory. Next, through extensive experiments, we show how the estimation errors change as the value of each factor changes. The results show that the errors decrease when (1) the distribution of records in a region becomes closer to uniform, (2) the number of records in the MLGF increases, (3) the page size decreases, (4) the query region size increases, and (5) the level of the MLGF directory containing the data distribution information becomes lower. We define the granule ratio, the core formula representing the basic relationship between the estimation error and the five factors above, and finally examine experimentally how the estimation errors change with the granule ratio. The results indicate that, for a given value of the granule ratio, the errors tend to be similar regardless of the individual values of the five factors.
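To make the source of such errors concrete, the following is a minimal sketch of grid-based selectivity estimation. It is not the MLGF and not the authors' method: it assumes a flat, fixed 2-D grid over the unit square and assumes that records are uniformly distributed inside each cell. The names `build_grid`, `estimate_selectivity`, and `true_selectivity` are hypothetical and chosen only for illustration.

```python
def build_grid(points, cells_per_dim):
    """Count the points in each cell of a flat grid over [0, 1) x [0, 1)."""
    counts = [[0] * cells_per_dim for _ in range(cells_per_dim)]
    for x, y in points:
        i = min(int(x * cells_per_dim), cells_per_dim - 1)
        j = min(int(y * cells_per_dim), cells_per_dim - 1)
        counts[i][j] += 1
    return counts


def overlap(lo, hi, cell_lo, cell_hi):
    """Length of the intersection of [lo, hi) and [cell_lo, cell_hi)."""
    return max(0.0, min(hi, cell_hi) - max(lo, cell_lo))


def estimate_selectivity(counts, query, total):
    """Estimate the fraction of records in the query rectangle.

    Assumption: records are uniform within each cell, so a cell
    contributes its count scaled by the fraction of its area covered.
    """
    (x_lo, y_lo), (x_hi, y_hi) = query
    k = len(counts)
    w = 1.0 / k  # cell width
    est = 0.0
    for i in range(k):
        ox = overlap(x_lo, x_hi, i * w, (i + 1) * w)
        if ox == 0.0:
            continue
        for j in range(k):
            oy = overlap(y_lo, y_hi, j * w, (j + 1) * w)
            est += counts[i][j] * (ox * oy) / (w * w)
    return est / total


def true_selectivity(points, query):
    """Exact fraction of points inside the half-open query rectangle."""
    (x_lo, y_lo), (x_hi, y_hi) = query
    hits = sum(1 for x, y in points if x_lo <= x < x_hi and y_lo <= y < y_hi)
    return hits / len(points)


# Skewed data: every point lies near the lower-left corner.
points = [(0.05 + 0.001 * i, 0.05 + 0.001 * i) for i in range(100)]
query = ((0.0, 0.0), (0.5, 0.5))

for k in (1, 2):
    est = estimate_selectivity(build_grid(points, k), query, len(points))
    print(k, est, true_selectivity(points, query))
```

With a single coarse cell, the uniformity assumption spreads the clustered records over the whole space and the estimate is far from the true value; with cells that match the query boundary, the estimate becomes exact. This loosely mirrors the reported effect of the data distribution in a region and of using finer-grained distribution information, under the simplifying assumptions stated above.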