{"title":"Subspace exploration: Bounds on Projected Frequency Estimation.","authors":"Graham Cormode, Charlie Dickens, David P Woodruff","doi":"10.1145/3452021.3458312","journal":{"name":"Proceedings of the ... ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems. ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems","volume":"2021","pages":"273-284"},"publicationDate":"2021-06-01","publicationTypes":"Journal Article","citationCount":"1","platform":"Semanticscholar","ListUrlMain":"https://doi.org/10.1145/3452021.3458312","EPubDate":"2021/6/20 0:00:00","PubModel":"Epub"}
Subspace exploration: Bounds on Projected Frequency Estimation.
Given an $n \times d$ dimensional dataset $A$, a projection query specifies a subset $C \subseteq [d]$ of columns which yields a new $n \times |C|$ array. We study the space complexity of computing data analysis functions over such subspaces, including heavy hitters and norms, when the subspaces are revealed only after observing the data. We show that this important class of problems is typically hard: for many problems, we show $2^{\Omega(d)}$ lower bounds. However, we present upper bounds which demonstrate space dependency better than $2^d$. That is, for $c, c' \in (0, 1)$ and a parameter $N = 2^d$, an $N^c$-approximation can be obtained in space $\min(N^{c'}, n)$, showing that it is possible to improve on the naïve approach of keeping information for all $2^d$ subsets of $d$ columns. Our results are based on careful constructions of instances using coding theory and novel combinatorial reductions that exhibit such space-approximation tradeoffs.
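To make the query model concrete, the following is a minimal sketch of a projection query answered exactly by storing the whole dataset. It is not the paper's algorithm: it is the full-storage baseline that the space bounds are measured against. The function names (`project`, `projected_frequencies`, `heavy_hitters`, `distinct_count`), the 0-based column indices, and the heavy-hitter threshold of `phi * n` occurrences are all assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Hashable, List, Sequence, Tuple

Row = Sequence[Hashable]
Pattern = Tuple[Hashable, ...]


def project(A: List[Row], C: Sequence[int]) -> List[Pattern]:
    """Keep only the columns in C, giving an n x |C| array (0-based indices)."""
    return [tuple(row[j] for j in C) for row in A]


def projected_frequencies(A: List[Row], C: Sequence[int]) -> Counter:
    """Count how often each projected row pattern occurs."""
    return Counter(project(A, C))


def heavy_hitters(A: List[Row], C: Sequence[int], phi: float) -> Dict[Pattern, int]:
    """Patterns occurring in at least phi * n projected rows (assumed definition)."""
    n = len(A)
    freqs = projected_frequencies(A, C)
    return {p: f for p, f in freqs.items() if f >= phi * n}


def distinct_count(A: List[Row], C: Sequence[int]) -> int:
    """Number of distinct projected row patterns."""
    return len(projected_frequencies(A, C))


if __name__ == "__main__":
    # Hypothetical 4 x 3 binary dataset; the query C arrives after the data.
    A = [(0, 1, 0), (0, 1, 1), (1, 1, 0), (0, 0, 1)]
    C = [0, 1]
    print(project(A, C))
    print(heavy_hitters(A, C, phi=0.5))
    print(distinct_count(A, C))
```

Because this sketch keeps all of $A$, it can answer any $C$ exactly, but its space grows with $n \cdot d$. The alternative of precomputing an answer for every possible $C$ needs information for all $2^d$ column subsets, which is the naïve approach the abstract refers to.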