Large-sample and deterministic confidence intervals for online aggregation
Author: P. Haas
Venue: Proceedings. Ninth International Conference on Scientific and Statistical Database Management (Cat. No.97TB100150)
Published: 1997-08-11
DOI: 10.1109/SSDM.1997.621151 (https://doi.org/10.1109/SSDM.1997.621151)
The online aggregation system recently proposed by J.M. Hellerstein et al. (1997) permits interactive exploration of large, complex datasets stored in relational database management systems. Running confidence intervals are an important component of an online aggregation system: they indicate to the user the estimated proximity of each running aggregate to the corresponding final result. Large-sample confidence intervals contain the final result with a prespecified probability and rest on central limit theorems, while deterministic confidence intervals contain the final query result with probability 1. We show how new and existing central limit theorems, simple bounding arguments, and the delta method can be used to derive formulas for both large-sample and deterministic confidence intervals. To illustrate these techniques, we obtain formulas for running confidence intervals in the case of single-table and multi-table AVG, COUNT, SUM, VARIANCE, and STDEV queries with join and selection predicates. Duplicate elimination and GROUP-BY operations are also considered. We then provide numerically stable algorithms for computing the confidence intervals and analyze the complexity of these algorithms.
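
To make the distinction between the two kinds of interval concrete, the following is a minimal sketch for the simplest case only: a single-table AVG with no predicates. It is not taken from the paper. It uses the textbook central-limit-theorem interval (sample mean plus or minus a normal quantile times the standard error) and a simple bounding argument, and it assumes that rows arrive in random order, that the table size is known, and that every value lies between known bounds. The class name `RunningAvg` and its parameters `total_rows`, `lower`, and `upper` are hypothetical names chosen for illustration. The mean and variance are maintained with Welford's update, a standard numerically stable recurrence, which may differ from the algorithms the paper gives.

```python
import math
from statistics import NormalDist


class RunningAvg:
    """Illustrative running AVG with two kinds of interval (not the paper's formulas)."""

    def __init__(self, total_rows, lower, upper):
        self.total_rows = total_rows  # table size N, assumed known in advance
        self.lower = lower            # assumed a priori lower bound on every value
        self.upper = upper            # assumed a priori upper bound on every value
        self.n = 0                    # rows seen so far
        self.mean = 0.0               # running mean of the rows seen
        self.m2 = 0.0                 # running sum of squared deviations from the mean

    def add(self, x):
        # Welford's update: avoids the cancellation of the sum-of-squares formula.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def large_sample_interval(self, confidence=0.95):
        # CLT-based interval: covers the final AVG with roughly the given
        # probability once n is large. The finite-population correction is
        # omitted, which makes the interval somewhat wider than necessary.
        if self.n < 2:
            raise ValueError("need at least two rows to estimate the variance")
        z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
        std_err = math.sqrt(self.m2 / (self.n - 1) / self.n)
        return self.mean - z * std_err, self.mean + z * std_err

    def deterministic_interval(self):
        # Bounding argument: each unseen row contributes at least `lower`
        # and at most `upper`, so the final AVG must lie in this range.
        seen_sum = self.mean * self.n
        unseen = self.total_rows - self.n
        low = (seen_sum + unseen * self.lower) / self.total_rows
        high = (seen_sum + unseen * self.upper) / self.total_rows
        return low, high
```

Under these assumptions the deterministic interval shrinks to a single point once every row has been read, while the large-sample interval is usually much narrower early in the scan but can miss the final result with the stated probability.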