H. Ashtiani, S. Ben-David, Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian, Y. Plan
{"title":"Near-optimal Sample Complexity Bounds for Robust Learning of Gaussian Mixtures via Compression Schemes","authors":"H. Ashtiani, S. Ben-David, Nicholas J. A. Harvey, Christopher Liaw, Abbas Mehrabian, Y. Plan","doi":"10.1145/3417994","DOIUrl":"https://doi.org/10.1145/3417994","url":null,"abstract":"We introduce a novel technique for distribution learning based on a notion of sample compression. Any class of distributions that allows such a compression scheme can be learned with few samples. Moreover, if a class of distributions has such a compression scheme, then so do the classes of products and mixtures of those distributions. As an application of this technique, we prove that ˜Θ(kd2/ε2) samples are necessary and sufficient for learning a mixture of k Gaussians in Rd, up to error ε in total variation distance. This improves both the known upper bounds and lower bounds for this problem. For mixtures of axis-aligned Gaussians, we show that Õ(kd/ε2) samples suffice, matching a known lower bound. Moreover, these results hold in an agnostic learning (or robust estimation) setting, in which the target distribution is only approximately a mixture of Gaussians. Our main upper bound is proven by showing that the class of Gaussians in Rd admits a small compression scheme.","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"9 1","pages":"1 - 42"},"PeriodicalIF":0.0,"publicationDate":"2017-10-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"76471591","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Invited Articles Foreword","authors":"É. Tardos","doi":"10.1145/3140539","DOIUrl":"https://doi.org/10.1145/3140539","url":null,"abstract":"The Invited Article section of this issue consists of two papers. The article “Parallel-Correctness and Transferability for Conjunctive Queries” by Tom J. Ameloot, Gaerano Geck, Bas Ketsman, Frank Neven, and Thomas Schwentick was invited from the 34th Annual ACM Symposium on Principles of Distributed Computing (PODC’15). The article “An Average-case Depth Hierarchy Theorem for Boolean Circuits” by Johan Håstad, Benjamin Rossman, Rocco A. Servedio, and LiYang Tan won a best paper award at the 56th Annual Symposium on Foundations of Computer Science (FOCS’15). We thank the PODC’15 and FOCS’15 Program Committees for their help in selecting these invited articles, and we thank editors Georg Gottlob and Avi Widgerson for handling the articles.","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"162 1","pages":"1 - 1"},"PeriodicalIF":0.0,"publicationDate":"2017-10-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73270815","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Estimating the Unseen","authors":"Paul Valiant, G. Valiant","doi":"10.1145/3125643","DOIUrl":"https://doi.org/10.1145/3125643","url":null,"abstract":"We show that a class of statistical properties of distributions, which includes such practically relevant properties as entropy, the number of distinct elements, and distance metrics between pairs of distributions, can be estimated given a sublinear sized sample. Specifically, given a sample consisting of independent draws from any distribution over at most k distinct elements, these properties can be estimated accurately using a sample of size O(k log k). For these estimation tasks, this performance is optimal, to constant factors. Complementing these theoretical results, we also demonstrate that our estimators perform exceptionally well, in practice, for a variety of estimation tasks, on a variety of natural distributions, for a wide range of parameters. The key step in our approach is to first use the sample to characterize the “unseen” portion of the distribution—effectively reconstructing this portion of the distribution as accurately as if one had a logarithmic factor larger sample. This goes beyond such tools as the Good-Turing frequency estimation scheme, which estimates the total probability mass of the unobserved portion of the distribution: We seek to estimate the shape of the unobserved portion of the distribution. This work can be seen as introducing a robust, general, and theoretically principled framework that, for many practical applications, essentially amplifies the sample size by a logarithmic factor; we expect that it may be fruitfully used as a component within larger machine learning and statistical analysis systems.","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"28 1","pages":"1 - 41"},"PeriodicalIF":0.0,"publicationDate":"2017-10-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86639629","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Matching Polytope has Exponential Extension Complexity","authors":"T. Rothvoss","doi":"10.1145/3127497","DOIUrl":"https://doi.org/10.1145/3127497","url":null,"abstract":"A popular method in combinatorial optimization is to express polytopes P, which may potentially have exponentially many facets, as solutions of linear programs that use few extra variables to reduce the number of constraints down to a polynomial. After two decades of standstill, recent years have brought amazing progress in showing lower bounds for the so-called extension complexity, which for a polytope P denotes the smallest number of inequalities necessary to describe a higher-dimensional polytope Q that can be linearly projected on P. However, the central question in this field remained wide open: can the perfect matching polytope be written as an LP with polynomially many constraints? We answer this question negatively. In fact, the extension complexity of the perfect matching polytope in a complete n-node graph is 2Ω (n). By a known reduction, this also improves the lower bound on the extension complexity for the TSP polytope from 2Ω (√ n) to 2Ω (n).","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"144 1","pages":"1 - 19"},"PeriodicalIF":0.0,"publicationDate":"2017-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"73044770","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Complexity of Mean-Payoff Pushdown Games","authors":"K. Chatterjee, Yaron Velner","doi":"10.1145/3121408","DOIUrl":"https://doi.org/10.1145/3121408","url":null,"abstract":"Two-player games on graphs are central in many problems in formal verification and program analysis, such as synthesis and verification of open systems. In this work, we consider solving recursive game graphs (or pushdown game graphs) that model the control flow of sequential programs with recursion. While pushdown games have been studied before with qualitative objectives—such as reachability and ω-regular objectives—in this work, we study for the first time such games with the most well-studied quantitative objective, the mean-payoff objective. In pushdown games, two types of strategies are relevant: (1) global strategies, which depend on the entire global history; and (2) modular strategies, which have only local memory and thus do not depend on the context of invocation but rather only on the history of the current invocation of the module. Our main results are as follows: (1) One-player pushdown games with mean-payoff objectives under global strategies are decidable in polynomial time. (2) Two-player pushdown games with mean-payoff objectives under global strategies are undecidable. (3) One-player pushdown games with mean-payoff objectives under modular strategies are NP-hard. (4) Two-player pushdown games with mean-payoff objectives under modular strategies can be solved in NP (i.e., both one-player and two-player pushdown games with mean-payoff objectives under modular strategies are NP-complete). We also establish the optimal strategy complexity by showing that global strategies for mean-payoff objectives require infinite memory even in one-player pushdown games and memoryless modular strategies are sufficient in two-player pushdown games. Finally, we also show that all the problems have the same complexity if the stack boundedness condition is added, where along with the mean-payoff objective the player must also ensure that the stack height is bounded.","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"67 1","pages":"1 - 49"},"PeriodicalIF":0.0,"publicationDate":"2017-09-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"83940350","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Near-Optimal Regret Bounds for Thompson Sampling","authors":"Shipra Agrawal, Navin Goyal","doi":"10.1145/3088510","DOIUrl":"https://doi.org/10.1145/3088510","url":null,"abstract":"Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to the state-of-the-art methods. In this article, a novel and almost tight martingale-based regret analysis for Thompson Sampling is presented. Our technique simultaneously yields both problem-dependent and problem-independent bounds: (1) the first near-optimal problem-independent bound of O(√ NT ln T) on the expected regret and (2) the optimal problem-dependent bound of (1 + ϵ)Σi ln T / d(μi,μ1) + O(N/ϵ2) on the expected regret (this bound was first proven by Kaufmann et al. (2012b)). Our technique is conceptually simple and easily extends to distributions other than the Beta distribution used in the original TS algorithm. For the version of TS that uses Gaussian priors, we prove a problem-independent bound of O(√ NT ln N) on the expected regret and show the optimality of this bound by providing a matching lower bound. This is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of Ω (√ NT) for the multiarmed bandit problem.","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"70 1","pages":"1 - 24"},"PeriodicalIF":0.0,"publicationDate":"2017-09-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"85566016","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Embeddability in R3 is NP-hard","authors":"A. D. Mesmay, Y. Rieck, E. Sedgwick, M. Tancer","doi":"10.1145/3396593","DOIUrl":"https://doi.org/10.1145/3396593","url":null,"abstract":"We prove that the problem of deciding whether a two- or three-dimensional simplicial complex embeds into R3 is NP-hard. Our construction also shows that deciding whether a 3-manifold with boundary tori admits an S3 filling is NP-hard. The former stands in contrast with the lower-dimensional cases, which can be solved in linear time, and the latter with a variety of computational problems in 3-manifold topology, for example, unknot or 3-sphere recognition, which are in NP ∩ co- NP. (Membership of the latter problem in co-NP assumes the Generalized Riemann Hypotheses.) Our reduction encodes a satisfiability instance into the embeddability problem of a 3-manifold with boundary tori, and relies extensively on techniques from low-dimensional topology, most importantly Dehn fillings of manifolds with boundary tori.","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"2 1","pages":"1 - 29"},"PeriodicalIF":0.0,"publicationDate":"2017-08-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"90954740","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
P. Abdulla, Stavros Aronis, B. Jonsson, Konstantinos Sagonas
{"title":"Source Sets","authors":"P. Abdulla, Stavros Aronis, B. Jonsson, Konstantinos Sagonas","doi":"10.1145/3073408","DOIUrl":"https://doi.org/10.1145/3073408","url":null,"abstract":"Stateless model checking is a powerful method for program verification that, however, suffers from an exponential growth in the number of explored executions. A successful technique for reducing this number, while still maintaining complete coverage, is Dynamic Partial Order Reduction (DPOR), an algorithm originally introduced by Flanagan and Godefroid in 2005 and since then not only used as a point of reference but also extended by various researchers. In this article, we present a new DPOR algorithm, which is the first to be provably optimal in that it always explores the minimal number of executions. It is based on a novel class of sets, called source sets, that replace the role of persistent sets in previous algorithms. We begin by showing how to modify the original DPOR algorithm to work with source sets, resulting in an efficient and simple-to-implement algorithm, called source-DPOR. Subsequently, we enhance this algorithm with a novel mechanism, called wakeup trees, that allows the resulting algorithm, called optimal-DPOR, to achieve optimality. Both algorithms are then extended to computational models where processes may disable each other, for example, via locks. Finally, we discuss tradeoffs of the source- and optimal-DPOR algorithm and present programs that illustrate significant time and space performance differences between them. We have implemented both algorithms in a publicly available stateless model checking tool for Erlang programs, while the source-DPOR algorithm is at the core of a publicly available stateless model checking tool for C/pthread programs running on machines with relaxed memory models. Experiments show that source sets significantly increase the performance of stateless model checking compared to using the original DPOR algorithm and that wakeup trees incur only a small overhead in both time and space in practice.","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"22 1","pages":"1 - 49"},"PeriodicalIF":0.0,"publicationDate":"2017-08-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86183353","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Online Bipartite Matching with Amortized O(log 2 n) Replacements","authors":"A. Bernstein, J. Holm, E. Rotenberg","doi":"10.1145/3344999","DOIUrl":"https://doi.org/10.1145/3344999","url":null,"abstract":"In the online bipartite matching problem with replacements, all the vertices on one side of the bipartition are given, and the vertices on the other side arrive one-by-one with all their incident edges. The goal is to maintain a maximum matching while minimizing the number of changes (replacements) to the matching. We show that the greedy algorithm that always takes the shortest augmenting path from the newly inserted vertex (denoted the SAP protocol) uses at most amortized O(log 2 n) replacements per insertion, where n is the total number of vertices inserted. This is the first analysis to achieve a polylogarithmic number of replacements for any replacement strategy, almost matching the Ω (log n) lower bound. The previous best strategy known achieved amortized O(√ n) replacements [Bosek, Leniowski, Sankowski, Zych, FOCS 2014]. For the SAP protocol in particular, nothing better than the trivial O(n) bound was known except in special cases. Our analysis immediately implies the same upper bound of O(log 2 n) reassignments for the capacitated assignment problem, where each vertex on the static side of the bipartition is initialized with the capacity to serve a number of vertices. We also analyze the problem of minimizing the maximum server load. We show that if the final graph has maximum server load L, then the SAP protocol makes amortized O(min { L log2 n , √ nlog n}) reassignments. We also show that this is close to tight, because Ω (min { L, √ n}) reassignments can be necessary.","PeriodicalId":17199,"journal":{"name":"Journal of the ACM (JACM)","volume":"2005 1","pages":"1 - 23"},"PeriodicalIF":0.0,"publicationDate":"2017-07-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"86951707","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}