Sahel Torkamani;Javad B. Ebrahimi;Parastoo Sadeghi;Rafael G. L. D’Oliveira;Muriel Médard
{"title":"Optimal Binary Differential Privacy via Graphs","authors":"Sahel Torkamani;Javad B. Ebrahimi;Parastoo Sadeghi;Rafael G. L. D’Oliveira;Muriel Médard","doi":"10.1109/JSAIT.2024.3384183","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3384183","url":null,"abstract":"We present the notion of reasonable utility for binary mechanisms, which applies to all utility functions in the literature. This notion induces a partial ordering on the performance of all binary differentially private (DP) mechanisms. DP mechanisms that are maximal elements of this ordering are optimal DP mechanisms for every reasonable utility. By looking at differential privacy as a randomized graph coloring, we characterize these optimal DP mechanisms in terms of their behavior on a certain subset of the boundary datasets we call a boundary hitting set. In the process of establishing our results, we also introduce a useful notion that generalizes DP conditions for binary-valued queries, which we coin as suitable pairs. Suitable pairs abstract away the algebraic roles of $\\varepsilon, \\delta$ in the DP framework, making the derivations and understanding of our proofs simpler. Additionally, the notion of a suitable pair can potentially capture privacy conditions in frameworks other than DP and may be of independent interest.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"162-174"},"PeriodicalIF":0.0,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140818803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Neophytos Charalambides;Hessam Mahdavifar;Mert Pilanci;Alfred O. Hero
{"title":"Iterative Sketching for Secure Coded Regression","authors":"Neophytos Charalambides;Hessam Mahdavifar;Mert Pilanci;Alfred O. Hero","doi":"10.1109/JSAIT.2024.3384395","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3384395","url":null,"abstract":"Linear regression is a fundamental and primitive problem in supervised machine learning, with applications ranging from epidemiology to finance. In this work, we propose methods for speeding up distributed linear regression. We do so by leveraging randomized techniques, while also ensuring security and straggler resiliency in asynchronous distributed computing systems. Specifically, we randomly rotate the basis of the system of equations and then subsample blocks, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the basis rotation corresponds to an encoded encryption in an approximate gradient coding scheme, and the subsampling corresponds to the responses of the non-straggling servers in the centralized coded computing framework. This results in a distributive iterative stochastic approach for matrix compression and steepest descent.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"148-161"},"PeriodicalIF":0.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140924701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Total Variation Meets Differential Privacy","authors":"Elena Ghazi;Ibrahim Issa","doi":"10.1109/JSAIT.2024.3384083","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3384083","url":null,"abstract":"The framework of approximate differential privacy is considered, and augmented by leveraging the notion of “the total variation of a (privacy-preserving) mechanism” (denoted by $\\eta$-TV). With this refinement, an exact composition result is derived, and shown to be significantly tighter than the optimal bounds for differential privacy (which do not consider the total variation). Furthermore, it is shown that $(\\varepsilon, \\delta)$-DP with $\\eta$-TV is closed under subsampling. The induced total variation of commonly used mechanisms is computed. Moreover, the notion of total variation of a mechanism is studied in the local privacy setting and privacy-utility tradeoffs are investigated. In particular, total variation distance and KL divergence are considered as utility functions and studied through the lens of contraction coefficients. Finally, the results are compared and connected to the locally differentially private setting.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"207-220"},"PeriodicalIF":0.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140820416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xinying Zou;Samir M. Perlaza;Iñaki Esnaola;Eitan Altman;H. Vincent Poor
{"title":"The Worst-Case Data-Generating Probability Measure in Statistical Learning","authors":"Xinying Zou;Samir M. Perlaza;Iñaki Esnaola;Eitan Altman;H. Vincent Poor","doi":"10.1109/JSAIT.2024.3383281","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3383281","url":null,"abstract":"The worst-case data-generating (WCDG) probability measure is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. Such a WCDG probability measure is shown to be the unique solution to two different optimization problems: $(a)$ The maximization of the expected loss over the set of probability measures on the datasets whose relative entropy with respect to a reference measure is not larger than a given threshold; and $(b)$ The maximization of the expected loss with regularization by relative entropy with respect to the reference measure. Such a reference measure can be interpreted as a prior on the datasets. The WCDG cumulants are finite and bounded in terms of the cumulants of the reference measure. To analyze the concentration of the expected empirical risk induced by the WCDG probability measure, the notion of $(\\epsilon, \\delta)$-robustness of models is introduced. Closed-form expressions are presented for the sensitivity of the expected loss for a fixed model. These results lead to a novel expression for the generalization error of arbitrary machine learning algorithms. This exact expression is provided in terms of the WCDG probability measure and leads to an upper bound that is equal to the sum of the mutual information and the lautum information between the models and the datasets, up to a constant factor. This upper bound is achieved by a Gibbs algorithm. This finding reveals that an exploration into the generalization error of the Gibbs algorithm facilitates the derivation of overarching insights applicable to any machine learning algorithm.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"175-189"},"PeriodicalIF":0.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140844521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Giuseppe Serra;Photios A. Stavrou;Marios Kountouris
{"title":"On the Computation of the Gaussian Rate–Distortion–Perception Function","authors":"Giuseppe Serra;Photios A. Stavrou;Marios Kountouris","doi":"10.1109/JSAIT.2024.3381230","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3381230","url":null,"abstract":"In this paper, we study the computation of the rate-distortion-perception function (RDPF) for a multivariate Gaussian source assuming jointly Gaussian reconstruction under mean squared error (MSE) distortion and, respectively, Kullback–Leibler divergence, geometric Jensen-Shannon divergence, squared Hellinger distance, and squared Wasserstein-2 distance perception metrics. To this end, we first characterize the analytical bounds of the scalar Gaussian RDPF for the aforementioned divergence functions, also providing the RDPF-achieving forward “test-channel” realization. Focusing on the multivariate case, assuming jointly Gaussian reconstruction and tensorizable distortion and perception metrics, we establish that the optimal solution resides on the vector space spanned by the eigenvector of the source covariance matrix. Consequently, the multivariate optimization problem can be expressed as a function of the scalar Gaussian RDPFs of the source marginals, constrained by global distortion and perception levels. Leveraging this characterization, we design an alternating minimization scheme based on the block nonlinear Gauss–Seidel method, which optimally solves the problem while identifying the Gaussian RDPF-achieving realization. Furthermore, the associated algorithmic embodiment is provided, as well as the convergence and the rate of convergence characterization. Lastly, for the “perfect realism” regime, the analytical solution for the multivariate Gaussian RDPF is obtained. We corroborate our results with numerical simulations and draw connections to existing results.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"314-330"},"PeriodicalIF":0.0,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141084836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Toward General Function Approximation in Nonstationary Reinforcement Learning","authors":"Songtao Feng;Ming Yin;Ruiquan Huang;Yu-Xiang Wang;Jing Yang;Yingbin Liang","doi":"10.1109/JSAIT.2024.3381818","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3381818","url":null,"abstract":"Function approximation has experienced significant success in the field of reinforcement learning (RL). Despite some progress on developing theory for nonstationary RL with function approximation under structural assumptions, existing work for nonstationary RL with general function approximation is still limited. In this work, we investigate two different approaches for nonstationary RL with general function approximation: confidence-set based algorithm and UCB-type algorithm. For the first approach, we introduce a new complexity measure called dynamic Bellman Eluder (DBE) for nonstationary MDPs, and then propose a confidence-set based algorithm SW-OPEA based on the complexity metric. SW-OPEA features the sliding window mechanism and a novel confidence set design for nonstationary MDPs. For the second approach, we propose a UCB-type algorithm LSVI-Nonstationary following the popular least-square-value-iteration (LSVI) framework, and mitigate the computational efficiency challenge of the confidence-set based approach. LSVI-Nonstationary features the restart mechanism and a new design of the bonus term to handle nonstationarity. The two proposed algorithms outperform the existing algorithms for nonstationary linear and tabular MDPs in the small variation budget setting. To the best of our knowledge, the two approaches are the first confidence-set based algorithm and UCB-type algorithm in the context of nonstationary MDPs.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"190-206"},"PeriodicalIF":0.0,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140820417","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exactly Optimal and Communication-Efficient Private Estimation via Block Designs","authors":"Hyun-Young Park;Seung-Hyun Nam;Si-Hyeon Lee","doi":"10.1109/JSAIT.2024.3381195","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3381195","url":null,"abstract":"In this paper, we propose a new class of local differential privacy (LDP) schemes based on combinatorial block designs for discrete distribution estimation. This class not only recovers many known LDP schemes in a unified framework of combinatorial block design, but also suggests a novel way of finding new schemes achieving the exactly optimal (or near-optimal) privacy-utility trade-off with lower communication costs. Indeed, we find many new LDP schemes that achieve the exactly optimal privacy-utility trade-off, with the minimum communication cost among all the unbiased or consistent schemes, for a certain set of input data size and LDP constraint. Furthermore, to partially solve the sparse existence issue of block design schemes, we consider a broader class of LDP schemes based on regular and pairwise-balanced designs, called RPBD schemes, which relax one of the symmetry requirements on block designs. By considering this broader class of RPBD schemes, we can find LDP schemes achieving near-optimal privacy-utility trade-off with reasonably low communication costs for a much larger set of input data size and LDP constraint.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"123-134"},"PeriodicalIF":0.0,"publicationDate":"2024-03-27","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140621268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Learning Robust to Distributional Uncertainties and Adversarial Data","authors":"Alireza Sadeghi;Gang Wang;Georgios B. Giannakis","doi":"10.1109/JSAIT.2024.3381869","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3381869","url":null,"abstract":"Successful training of data-intensive deep neural networks critically relies on vast, clean, and high-quality datasets. In practice, however, their reliability diminishes, particularly with noisy, outlier-corrupted data samples encountered in testing. This challenge intensifies when dealing with anonymized, heterogeneous data sets stored across geographically distinct locations due to, e.g., privacy concerns. The present paper introduces robust learning frameworks tailored for centralized and federated learning scenarios. Our goal is to fortify model resilience with a focus that lies in (i) addressing distribution shifts from training to inference time; and (ii) ensuring test-time robustness, when a trained model may encounter outliers or adversarially contaminated test data samples. To this aim, we start with a centralized setting where the true data distribution is considered unknown, but residing within a Wasserstein ball centered at the empirical distribution. We obtain robust models by minimizing the worst-case expected loss within this ball, yielding an intractable infinite-dimensional optimization problem. Upon leveraging the strong duality condition, we arrive at a tractable surrogate learning problem. We develop two stochastic primal-dual algorithms to solve the resultant problem: one for $\\epsilon$-accurate convex sub-problems and another for a single gradient ascent step. We further develop a distributionally robust federated learning framework to learn robust models using heterogeneous data sets stored at distinct locations by solving per-learner’s sub-problems locally, offering robustness with modest computational overhead and considering data distribution. Numerical tests corroborate merits of our training algorithms against distributional uncertainties and adversarially corrupted test data samples.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"105-122"},"PeriodicalIF":0.0,"publicationDate":"2024-03-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140619580","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Exactly Tight Information-Theoretic Generalization Error Bound for the Quadratic Gaussian Problem","authors":"Ruida Zhou;Chao Tian;Tie Liu","doi":"10.1109/JSAIT.2024.3380598","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3380598","url":null,"abstract":"We provide a new information-theoretic generalization error bound that is exactly tight (i.e., matching even the constant) for the canonical quadratic Gaussian (location) problem. Most existing bounds are order-wise loose in this setting, which has raised concerns about the fundamental capability of information-theoretic bounds in reasoning about the generalization behavior for machine learning. The proposed new bound adopts the individual-sample-based approach proposed by Bu et al., but also has several key new ingredients. Firstly, instead of applying the change of measure inequality on the loss function, we apply it to the generalization error function itself; secondly, the bound is derived in a conditional manner; lastly, a reference distribution is introduced. The combination of these components produces a KL-divergence-based generalization error bound. We show that although the latter two new ingredients can help make the bound exactly tight, removing them does not significantly degrade the bound, leading to an asymptotically tight mutual-information-based bound. We further consider the vector Gaussian setting, where a direct application of the proposed bound again does not lead to tight bounds except in special cases. A refined bound is then proposed by a decomposition of loss functions, leading to a tight bound for the vector setting.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"94-104"},"PeriodicalIF":0.0,"publicationDate":"2024-03-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140606029","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Summary Statistic Privacy in Data Sharing","authors":"Zinan Lin;Shuaiqi Wang;Vyas Sekar;Giulia Fanti","doi":"10.1109/JSAIT.2024.3403811","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3403811","url":null,"abstract":"We study a setting where a data holder wishes to share data with a receiver, without revealing certain summary statistics of the data distribution (e.g., mean, standard deviation). It achieves this by passing the data through a randomization mechanism. We propose summary statistic privacy, a metric for quantifying the privacy risk of such a mechanism based on the worst-case probability of an adversary guessing the distributional secret within some threshold. Defining distortion as a worst-case Wasserstein-1 distance between the real and released data, we prove lower bounds on the tradeoff between privacy and distortion. We then propose a class of quantization mechanisms that can be adapted to different data distributions. We show that the quantization mechanism’s privacy-distortion tradeoff matches our lower bounds under certain regimes, up to small constant factors. Finally, we demonstrate on real-world datasets that the proposed quantization mechanisms achieve better privacy-distortion tradeoffs than alternative privacy mechanisms.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"369-384"},"PeriodicalIF":0.0,"publicationDate":"2024-03-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141422569","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}