{"title":"Optimal Differentially Private Algorithms for k-Means Clustering","authors":"Zhiyi Huang, Jinyan Liu","doi":"10.1145/3196959.3196977","DOIUrl":null,"url":null,"abstract":"We consider privacy-preserving k-means clustering. For the objective of minimizing the Wasserstein distance between the output and the optimal solution, we show that there is a polynomial-time (ε,δ)-differentially private algorithm which, for any sufficiently large Φ2 well-separated datasets, outputs k centers that are within Wasserstein distance Ø(Φ2) from the optimal. This result improves the previous bounds by removing the dependence on ε, number of centers k, and dimension d. Further, we prove a matching lower bound that no (ε, δ)-differentially private algorithm can guarantee Wasserstein distance less than Ømega (Φ2) and, thus, our positive result is optimal up to a constant factor. For minimizing the k-means objective when the dimension d is bounded, we propose a polynomial-time private local search algorithm that outputs an αn-additive approximation when the size of the dataset is at least ~Ø (k3/2 · d · ε-1 · poly(α-1)).","PeriodicalId":344370,"journal":{"name":"Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","volume":"103 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2018-05-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"32","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 37th ACM SIGMOD-SIGACT-SIGAI Symposium on Principles of Database Systems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3196959.3196977","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 32
Abstract
We consider privacy-preserving k-means clustering. For the objective of minimizing the Wasserstein distance between the output and the optimal solution, we show that there is a polynomial-time (ε,δ)-differentially private algorithm which, for any sufficiently large Φ2 well-separated datasets, outputs k centers that are within Wasserstein distance Ø(Φ2) from the optimal. This result improves the previous bounds by removing the dependence on ε, number of centers k, and dimension d. Further, we prove a matching lower bound that no (ε, δ)-differentially private algorithm can guarantee Wasserstein distance less than Ømega (Φ2) and, thus, our positive result is optimal up to a constant factor. For minimizing the k-means objective when the dimension d is bounded, we propose a polynomial-time private local search algorithm that outputs an αn-additive approximation when the size of the dataset is at least ~Ø (k3/2 · d · ε-1 · poly(α-1)).