{"title":"Parallel online aggregation in action","authors":"Chengjie Qin, Florin Rusu","doi":"10.1145/2484838.2484874","DOIUrl":null,"url":null,"abstract":"Online aggregation provides continuous estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution, or can let the processing terminate and obtain the exact result. In this demonstration, we introduce a general framework for parallel online aggregation in which estimation does not incur overhead on top of the actual processing. We define a generic interface to express any estimation model that abstracts completely the execution details. We design multiple sampling-based estimators suited for parallel online aggregation and implement them inside the framework. Demonstration participants are shown how estimates to general SQL aggregation queries over terabytes of TPC-H data are generated during the entire processing. Due to parallel execution, the estimate converges to the correct result in a matter of seconds even for the most difficult queries. The behavior of the estimators is evaluated under different operating regimes of the distributed cluster used in the demonstration.","PeriodicalId":74773,"journal":{"name":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","volume":"27 1","pages":"46:1-46:4"},"PeriodicalIF":0.0000,"publicationDate":"2013-07-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"19","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Scientific and statistical database management : International Conference, SSDBM ... : proceedings. International Conference on Scientific and Statistical Database Management","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/2484838.2484874","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 19
Abstract
Online aggregation provides continuous estimates to the final result of a computation during the actual processing. The user can stop the computation as soon as the estimate is accurate enough, typically early in the execution, or can let the processing terminate and obtain the exact result. In this demonstration, we introduce a general framework for parallel online aggregation in which estimation does not incur overhead on top of the actual processing. We define a generic interface to express any estimation model that abstracts completely the execution details. We design multiple sampling-based estimators suited for parallel online aggregation and implement them inside the framework. Demonstration participants are shown how estimates to general SQL aggregation queries over terabytes of TPC-H data are generated during the entire processing. Due to parallel execution, the estimate converges to the correct result in a matter of seconds even for the most difficult queries. The behavior of the estimators is evaluated under different operating regimes of the distributed cluster used in the demonstration.