Kuldeep S. Meel, N. V. Vinodchandran, Sourav Chakraborty
{"title":"流模型中集合并集大小的估计","authors":"Kuldeep S. Meel, N. V. Vinodchandran, Sourav Chakraborty","doi":"10.1145/3452021.3458333","DOIUrl":null,"url":null,"abstract":"In this paper we study the problem of estimating the size of the union of sets $S_1, \\dots, S_M$ where each set $S_i \\subseteq Ømega$ (for some discrete universe $Ømega$) is implicitly presented and comes in a streaming fashion. We define the notion of Delphic sets to capture class of streaming problems where membership, sampling, and counting calls to the sets are efficient. In particular, we show our notion of Delphic sets capture three well known problems: Klee's measure problem (discrete version), test coverage estimation, and model counting of DNF formulas. The Klee's measure problem corresponds to computation of volume of multi-dimension axis aligned rectangles, i.e., every d-dimension axis-aligned rectangle can be defined as $[a_1,b_1] \\times [a_2,b_2] \\times łdots \\times [a_d, b_d]$. The problem of test coverage estimation focuses on the computation of coverage measure for a given testing array in the context of combinatorial testing, which is a fundamental technique in the context of hardware and software testing. Finally, given a DNF formula $\\varphi = T_1 \\vee T_2 \\vee łdots \\vee T_M$, the problem of model counting seeks to compute the number of satisfying assignments of $\\varphi$. The primary contribution of our work is a simple and efficient sampling-based algorithm, called \\hybrid, for estimating the of union of sets in streaming setting. Our algorithm has the space complexity of $O(Rłog |Ømega|)$ and update time is $O(Rłog R \\cdot łog(M/δ) \\cdot łog|Ømega|)$ where, $R = Ołeft(łog (M/δ)\\cdot \\varepsilon^2 \\right).$ Consequently, our algorithm provides the first algorithm with linear dependence on d for Klee's measure problem in streaming setting for $d>1$, thereby settling the open problem of Tirthpura and Woodruff (PODS-12). Furthermore, a straightforward application of our algorithm lends to an efficient algorithm for coverage estimation problem in streaming setting. We then investigate whether the space complexity for coverage estimation can be further improved, and in this context, we present another streaming algorithm that uses near-optimal $O(tłog n/\\varepsilon^2)$ space complexity but uses an update algorithm that is in $\\rm P ^\\rm NP $, thereby showcasing an interesting time vs space trade-off in the streaming setting. Finally, we demonstrate the generality of our Delphic sets by obtaining a streaming algorithm for model counting of DNF formulas. It is worth remarking that we view a key strength of our work is the simplicity of both the algorithm and its theoretical analysis, which makes it amenable to practical implementation and easy adoption.","PeriodicalId":405398,"journal":{"name":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-06-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"7","resultStr":"{\"title\":\"Estimating the Size of Union of Sets in Streaming Models\",\"authors\":\"Kuldeep S. Meel, N. V. Vinodchandran, Sourav Chakraborty\",\"doi\":\"10.1145/3452021.3458333\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"In this paper we study the problem of estimating the size of the union of sets $S_1, \\\\dots, S_M$ where each set $S_i \\\\subseteq Ømega$ (for some discrete universe $Ømega$) is implicitly presented and comes in a streaming fashion. We define the notion of Delphic sets to capture class of streaming problems where membership, sampling, and counting calls to the sets are efficient. In particular, we show our notion of Delphic sets capture three well known problems: Klee's measure problem (discrete version), test coverage estimation, and model counting of DNF formulas. The Klee's measure problem corresponds to computation of volume of multi-dimension axis aligned rectangles, i.e., every d-dimension axis-aligned rectangle can be defined as $[a_1,b_1] \\\\times [a_2,b_2] \\\\times łdots \\\\times [a_d, b_d]$. The problem of test coverage estimation focuses on the computation of coverage measure for a given testing array in the context of combinatorial testing, which is a fundamental technique in the context of hardware and software testing. Finally, given a DNF formula $\\\\varphi = T_1 \\\\vee T_2 \\\\vee łdots \\\\vee T_M$, the problem of model counting seeks to compute the number of satisfying assignments of $\\\\varphi$. The primary contribution of our work is a simple and efficient sampling-based algorithm, called \\\\hybrid, for estimating the of union of sets in streaming setting. Our algorithm has the space complexity of $O(Rłog |Ømega|)$ and update time is $O(Rłog R \\\\cdot łog(M/δ) \\\\cdot łog|Ømega|)$ where, $R = Ołeft(łog (M/δ)\\\\cdot \\\\varepsilon^2 \\\\right).$ Consequently, our algorithm provides the first algorithm with linear dependence on d for Klee's measure problem in streaming setting for $d>1$, thereby settling the open problem of Tirthpura and Woodruff (PODS-12). Furthermore, a straightforward application of our algorithm lends to an efficient algorithm for coverage estimation problem in streaming setting. We then investigate whether the space complexity for coverage estimation can be further improved, and in this context, we present another streaming algorithm that uses near-optimal $O(tłog n/\\\\varepsilon^2)$ space complexity but uses an update algorithm that is in $\\\\rm P ^\\\\rm NP $, thereby showcasing an interesting time vs space trade-off in the streaming setting. Finally, we demonstrate the generality of our Delphic sets by obtaining a streaming algorithm for model counting of DNF formulas. It is worth remarking that we view a key strength of our work is the simplicity of both the algorithm and its theoretical analysis, which makes it amenable to practical implementation and easy adoption.\",\"PeriodicalId\":405398,\"journal\":{\"name\":\"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-06-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"7\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3452021.3458333\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 40th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3452021.3458333","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Estimating the Size of Union of Sets in Streaming Models
In this paper we study the problem of estimating the size of the union of sets $S_1, \dots, S_M$ where each set $S_i \subseteq Ømega$ (for some discrete universe $Ømega$) is implicitly presented and comes in a streaming fashion. We define the notion of Delphic sets to capture class of streaming problems where membership, sampling, and counting calls to the sets are efficient. In particular, we show our notion of Delphic sets capture three well known problems: Klee's measure problem (discrete version), test coverage estimation, and model counting of DNF formulas. The Klee's measure problem corresponds to computation of volume of multi-dimension axis aligned rectangles, i.e., every d-dimension axis-aligned rectangle can be defined as $[a_1,b_1] \times [a_2,b_2] \times łdots \times [a_d, b_d]$. The problem of test coverage estimation focuses on the computation of coverage measure for a given testing array in the context of combinatorial testing, which is a fundamental technique in the context of hardware and software testing. Finally, given a DNF formula $\varphi = T_1 \vee T_2 \vee łdots \vee T_M$, the problem of model counting seeks to compute the number of satisfying assignments of $\varphi$. The primary contribution of our work is a simple and efficient sampling-based algorithm, called \hybrid, for estimating the of union of sets in streaming setting. Our algorithm has the space complexity of $O(Rłog |Ømega|)$ and update time is $O(Rłog R \cdot łog(M/δ) \cdot łog|Ømega|)$ where, $R = Ołeft(łog (M/δ)\cdot \varepsilon^2 \right).$ Consequently, our algorithm provides the first algorithm with linear dependence on d for Klee's measure problem in streaming setting for $d>1$, thereby settling the open problem of Tirthpura and Woodruff (PODS-12). Furthermore, a straightforward application of our algorithm lends to an efficient algorithm for coverage estimation problem in streaming setting. We then investigate whether the space complexity for coverage estimation can be further improved, and in this context, we present another streaming algorithm that uses near-optimal $O(tłog n/\varepsilon^2)$ space complexity but uses an update algorithm that is in $\rm P ^\rm NP $, thereby showcasing an interesting time vs space trade-off in the streaming setting. Finally, we demonstrate the generality of our Delphic sets by obtaining a streaming algorithm for model counting of DNF formulas. It is worth remarking that we view a key strength of our work is the simplicity of both the algorithm and its theoretical analysis, which makes it amenable to practical implementation and easy adoption.