Effective stratification for low selectivity queries on deep web data sources
Fan Wang, G. Agrawal
Proceedings of the ACM International Conference on Information and Knowledge Management (CIKM), 2011-10-24, pp. 1455-1464
DOI: 10.1145/2063576.2063786
We study the problem of estimating the result of a low-selectivity aggregation query when the data source supports only limited data access. Existing stratified sampling techniques cannot be applied to this problem for two reasons. First, certain critical statistics are very hard, if not impossible, to gather from such a data source. Second, and more importantly, the query's selective attribute may not be queryable on the data source. In such cases, we need an effective mechanism for stratifying the data into strata that are homogeneous with respect to the query's selective attribute, even though the data source cannot be queried on that attribute.
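To make the estimation setting concrete, here is a minimal sketch of the textbook stratified estimator for a COUNT query with a low-selectivity predicate. It rests on several hypothetical assumptions:

* Strata are defined by breakpoints on a queryable auxiliary attribute `x`.
* Each stratum's size N_h is known.
* Records are plain `(x, y)` tuples, where `y` is the selective attribute.

The function names `assign_strata` and `estimate_count` are illustrative and do not come from the paper.

```python
import bisect
import random


def assign_strata(records, breakpoints):
    """Group (x, y) records into strata by their auxiliary value x.

    With sorted `breakpoints` b[0] < ... < b[L-2], stratum 0 holds x < b[0],
    stratum h holds b[h-1] <= x < b[h], and the last stratum holds x >= b[-1].
    """
    strata = [[] for _ in range(len(breakpoints) + 1)]
    for rec in records:
        strata[bisect.bisect_right(breakpoints, rec[0])].append(rec)
    return strata


def estimate_count(strata_sizes, samples, predicate):
    """Stratified estimate of how many records satisfy `predicate`.

    Stratum h contributes N_h * (hits_h / n_h), where N_h is its (assumed
    known) size and n_h its sample size; unsampled strata contribute nothing.
    """
    total = 0.0
    for n_pop, sample in zip(strata_sizes, samples):
        if sample:
            hits = sum(1 for rec in sample if predicate(rec))
            total += n_pop * hits / len(sample)
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic population of (x, y): y tracks x plus noise, so the few
    # records matching the predicate sit in the high-x stratum.
    population = [(x, x + rng.gauss(0, 50)) for x in range(10_000)]
    strata = assign_strata(population, breakpoints=[5_000, 9_000])
    samples = [rng.sample(s, 200) for s in strata]

    def is_match(rec):
        return rec[1] > 9_900  # low-selectivity predicate on y

    est = estimate_count([len(s) for s in strata], samples, is_match)
    true = sum(1 for rec in population if is_match(rec))
    print(f"estimate={est:.1f} true={true}")
```

In this synthetic setup almost all matches fall in the top stratum. The strata with no matches then contribute no sampling variance, which illustrates why strata that are homogeneous with respect to `y` help.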
This paper presents and evaluates a stratification method for this problem that uses a queryable auxiliary attribute. The breakpoints for the stratification are computed with a novel Bayesian Adaptive Harmony Search algorithm. This algorithm derives from the existing Harmony Search method but uses a novel objective function and introduces a technique for dynamically adapting the method's key parameters. Our experiments show that the estimation accuracy of our method is consistently higher than 95%, even for queries with 0.01% selectivity and even when the auxiliary attribute is only weakly correlated with the selective attribute. Furthermore, for the same sampling cost, our method reduces estimation error at least fivefold compared with three other methods.
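The abstract does not spell out the Bayesian Adaptive Harmony Search or its objective function. The following is therefore only a sketch of the underlying idea, under stated assumptions:

* It runs plain Harmony Search, which is neither Bayesian nor adaptive and keeps its parameters fixed, to choose breakpoints on the auxiliary attribute `x`.
* It scores each candidate with a stand-in objective: the Neyman-allocation criterion, the sum over strata of W_h times S_h.
* The criterion is computed on a hypothetical pilot sample of `(x, y)` pairs.

The parameter names `hms`, `hmcr`, `par`, and `bw` follow common Harmony Search usage. The function names and default values are illustrative.

```python
import bisect
import math
import random


def neyman_objective(breakpoints, pilot, predicate):
    """Stand-in objective: sum over strata of W_h * S_h.

    W_h is the pilot share of stratum h and S_h = sqrt(p_h * (1 - p_h)),
    with p_h the pilot match rate. Under Neyman allocation, and ignoring
    finite-population corrections, the variance of the stratified proportion
    estimate is about (sum_h W_h * S_h)**2 / n, so smaller is better.
    """
    cuts = sorted(breakpoints)
    counts = [0] * (len(cuts) + 1)
    hits = [0] * (len(cuts) + 1)
    for x, y in pilot:
        h = bisect.bisect_right(cuts, x)
        counts[h] += 1
        hits[h] += predicate(y)  # bool counts as 0 or 1
    score = 0.0
    for c, k in zip(counts, hits):
        if c:
            p = k / c
            score += (c / len(pilot)) * math.sqrt(p * (1 - p))
    return score


def harmony_search(objective, lo, hi, dim, hms=10, hmcr=0.9, par=0.3,
                   bw=0.05, iters=1000, seed=0):
    """Plain Harmony Search minimising `objective` over [lo, hi]**dim."""
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(hms)]
    scores = [objective(h) for h in memory]
    width = bw * (hi - lo)
    for _ in range(iters):
        new = []
        for j in range(dim):
            if rng.random() < hmcr:
                value = memory[rng.randrange(hms)][j]  # memory consideration
                if rng.random() < par:
                    value += rng.uniform(-width, width)  # pitch adjustment
            else:
                value = rng.uniform(lo, hi)  # random selection
            new.append(min(hi, max(lo, value)))
        score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:  # replace the worst harmony if improved
            memory[worst], scores[worst] = new, score
    best = min(range(hms), key=scores.__getitem__)
    return sorted(memory[best]), scores[best]


if __name__ == "__main__":
    data_rng = random.Random(1)
    xs = [data_rng.uniform(0, 10_000) for _ in range(1_000)]
    pilot = [(x, x + data_rng.gauss(0, 50)) for x in xs]  # hypothetical pilot

    def is_match(y):
        return y > 9_900

    cuts, score = harmony_search(
        lambda b: neyman_objective(b, pilot, is_match), 0.0, 10_000.0, dim=2)
    print("breakpoints:", [round(c) for c in cuts], "objective:", round(score, 4))
```

The search only needs `y` values observed in records returned by queries on `x`, so it never queries the source on `y` directly. The paper's algorithm differs in two stated ways: it uses its own objective function, and it dynamically adapts key parameters. Neither is reproduced here.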