On scale independence for querying big data

Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems. Published: 2014-06-18. DOI: 10.1145/2594538.2594551

W. Fan, Floris Geerts, L. Libkin


Citations: 54

Abstract




To make query answering feasible in big datasets, practitioners have been looking into the notion of scale independence of queries. Intuitively, such queries require only a relatively small subset of the data, whose size is determined by the query and access methods rather than the size of the dataset itself. This paper aims to formalize this notion and study its properties. We start by defining what it means to be scale-independent, and provide matching upper and lower bounds for checking scale independence, for queries in various languages, and for combined and data complexity. Since the complexity turns out to be rather high, and since scale-independent queries cannot be captured syntactically, we develop sufficient conditions for scale independence. We formulate them based on access schemas, which combine indexing and constraints together with bounds on the sizes of retrieved data sets. We then study two variations of scale-independent query answering, inspired by existing practical systems. One concerns incremental query answering: we check when query answers can be maintained in response to updates scale-independently. The other explores scale-independent query rewriting using views.
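The intuition above, that a scale-independent query reads a subset whose size is fixed by the query and the access methods rather than by the dataset, can be illustrated with a small sketch. This is not the paper's formalism. It assumes a single hypothetical access constraint on a `friend(pid, fid)` relation: each `pid` value matches at most N tuples, and an index on `pid` returns exactly those tuples. The names `IndexedRelation` and `friends_of_friends`, the bound N = 3, and the ring-shaped data are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class IndexedRelation:
    """Hypothetical relation guarded by one access constraint key -> (*, bound):
    every value of `key` matches at most `bound` tuples, and an index
    returns exactly those tuples without scanning the rest of the relation."""
    name: str
    key: str    # attribute the index is built on
    bound: int  # cardinality bound N for each key value
    index: dict = field(default_factory=lambda: defaultdict(list))

    def insert(self, row: dict) -> None:
        # Reject inserts that would break the assumed cardinality bound.
        bucket = self.index[row[self.key]]
        if len(bucket) >= self.bound:
            raise ValueError(
                f"{self.name}: access constraint violated for {self.key}={row[self.key]}"
            )
        bucket.append(row)

    def fetch(self, value, touched: list) -> list:
        """Index lookup; adds the number of tuples returned to touched[0]."""
        rows = self.index.get(value, [])
        touched[0] += len(rows)
        return rows


def friends_of_friends(friend: IndexedRelation, person) -> tuple:
    """Evaluate Q(y) = exists x. friend(person, x) and friend(x, y)
    using only index lookups, so at most N + N*N tuples are read."""
    touched = [0]
    answer = set()
    for r in friend.fetch(person, touched):          # at most N tuples
        for r2 in friend.fetch(r["fid"], touched):   # at most N tuples per friend
            answer.add(r2["fid"])
    return answer, touched[0]


# Illustrative data: n people on a ring, each befriending the next two.
friend = IndexedRelation("friend", key="pid", bound=3)
n = 10_000
for p in range(n):
    friend.insert({"pid": p, "fid": (p + 1) % n})
    friend.insert({"pid": p, "fid": (p + 2) % n})

answer, touched = friends_of_friends(friend, 0)
print(answer, touched)  # {2, 3, 4} 6
```

Under these assumptions the query reads at most N + N² tuples (here 6 of the 20,000 stored), and increasing `n` leaves that count unchanged. Without the bound on `pid`, the same plan could read an arbitrarily large part of the relation, which is why the abstract pairs indexing with cardinality bounds in its access schemas.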

