On analyzing the errors in a selectivity estimation method using a multidimensional file structure

Sang-Wook Kim, Whan-Kyu Whang, K. Whang
Proceedings. The Twenty-Second Annual International Computer Software and Applications Conference (Compsac '98) (Cat. No.98CB 36241), 1998-08-19
DOI: 10.1109/CMPSAC.1998.716635 (https://doi.org/10.1109/CMPSAC.1998.716635)

Abstract

In this paper, we discuss the errors in selectivity estimation using the multilevel grid file (MLGF), a multidimensional file structure. We first analyze the cause of the estimation errors and then investigate five factors affecting estimation accuracy: (1) the data distribution in a region, (2) the number of records stored in the MLGF, (3) the page size, (4) the query region size, and (5) the level of the MLGF directory. Next, we show through extensive experiments how the estimation errors change as the value of each factor changes. The results show that the errors decrease when (1) the distribution of records in a region becomes closer to uniform, (2) the number of records in the MLGF increases, (3) the page size decreases, (4) the query region size increases, and (5) the level of the MLGF directory containing data distribution information becomes lower. We define the granule ratio, the core formula representing the basic relationship between the estimation error and the above five factors, and finally examine through experiments how the estimation errors change with the granule ratio. The results indicate that, for a given value of the granule ratio, errors tend to be similar regardless of the values of the five factors.
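
To make factor (1) concrete, the following is a minimal sketch of region-based selectivity estimation. It assumes that records are spread uniformly inside each region, so a region contributes its record count scaled by the fraction of its volume that the query covers. This is an illustration only and not the MLGF itself: the flat list of regions, the `Region` class, and all function names are hypothetical stand-ins for the directory information, and the granule ratio is not modeled because its formula is not given here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# A box is one (low, high) interval per dimension.
Box = Tuple[Tuple[float, float], ...]


@dataclass
class Region:
    """Hypothetical stand-in for one directory entry: a box and its record count."""
    box: Box
    count: int


def overlap_fraction(region: Box, query: Box) -> float:
    """Fraction of the region's volume that lies inside the query box."""
    fraction = 1.0
    for (r_lo, r_hi), (q_lo, q_hi) in zip(region, query):
        width = r_hi - r_lo
        shared = min(r_hi, q_hi) - max(r_lo, q_lo)
        if width <= 0 or shared <= 0:
            return 0.0  # degenerate region or no overlap in this dimension
        fraction *= shared / width
    return fraction


def estimate_selectivity(regions: List[Region], query: Box) -> float:
    """Estimated fraction of all records in the query box (uniformity assumed)."""
    total = sum(r.count for r in regions)
    if total == 0:
        return 0.0
    expected = sum(r.count * overlap_fraction(r.box, query) for r in regions)
    return expected / total


def true_selectivity(points: Sequence[Sequence[float]], query: Box) -> float:
    """Exact fraction of the given points that fall inside the query box."""
    if not points:
        return 0.0
    inside = sum(
        all(lo <= x <= hi for x, (lo, hi) in zip(p, query)) for p in points
    )
    return inside / len(points)


if __name__ == "__main__":
    # One region holding 100 records that all sit in its lower-left corner.
    skewed_regions = [Region(((0.0, 1.0), (0.0, 1.0)), 100)]
    skewed_points = [(0.1, 0.1)] * 100
    query = ((0.5, 1.0), (0.0, 1.0))  # right half of the region
    print(estimate_selectivity(skewed_regions, query))  # 0.5
    print(true_selectivity(skewed_points, query))       # 0.0
```

In the skewed example the estimator reports 0.5 while the true selectivity is 0.0, which is the kind of error the abstract attributes to a non-uniform distribution of records within a region.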