{"title":"Learning Algorithm Generalization Error Bounds via Auxiliary Distributions","authors":"Gholamali Aminian;Saeed Masiha;Laura Toni;Miguel R. D. Rodrigues","doi":"10.1109/JSAIT.2024.3391900","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3391900","url":null,"abstract":"Generalization error bounds are essential for comprehending how well machine learning models work. In this work, we suggest a novel method, i.e., the Auxiliary Distribution Method, that leads to new upper bounds on expected generalization errors that are appropriate for supervised learning scenarios. We show that our general upper bounds can be specialized under some conditions to new bounds involving the \u0000<inline-formula> <tex-math>$alpha $ </tex-math></inline-formula>\u0000-Jensen-Shannon, \u0000<inline-formula> <tex-math>$alpha $ </tex-math></inline-formula>\u0000-Rényi \u0000<inline-formula> <tex-math>$(0lt alpha lt 1)$ </tex-math></inline-formula>\u0000 information between a random variable modeling the set of training samples and another random variable modeling the set of hypotheses. Our upper bounds based on \u0000<inline-formula> <tex-math>$alpha $ </tex-math></inline-formula>\u0000-Jensen-Shannon information are also finite. Additionally, we demonstrate how our auxiliary distribution method can be used to derive the upper bounds on excess risk of some learning algorithms in the supervised learning context and the generalization error under the distribution mismatch scenario in supervised learning algorithms, where the distribution mismatch is modeled as \u0000<inline-formula> <tex-math>$alpha $ </tex-math></inline-formula>\u0000-Jensen-Shannon or \u0000<inline-formula> <tex-math>$alpha $ </tex-math></inline-formula>\u0000-Rényi divergence between the distribution of test and training data samples distributions. We also outline the conditions for which our proposed upper bounds might be tighter than other earlier upper bounds.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"273-284"},"PeriodicalIF":0.0,"publicationDate":"2024-04-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141096301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Neural Distributed Compressor Discovers Binning","authors":"Ezgi Ozyilkan;Johannes Ballé;Elza Erkip","doi":"10.1109/JSAIT.2024.3393429","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3393429","url":null,"abstract":"We consider lossy compression of an information source when the decoder has lossless access to a correlated one. This setup, also known as the Wyner-Ziv problem, is a special case of distributed source coding. To this day, practical approaches for the Wyner-Ziv problem have neither been fully developed nor heavily investigated. We propose a data-driven method based on machine learning that leverages the universal function approximation capability of artificial neural networks. We find that our neural network-based compression scheme, based on variational vector quantization, recovers some principles of the optimum theoretical solution of the Wyner-Ziv setup, such as binning in the source space as well as optimal combination of the quantization index and side information, for exemplary sources. These behaviors emerge although no structure exploiting knowledge of the source distributions was imposed. Binning is a widely used tool in information theoretic proofs and methods, and to our knowledge, this is the first time it has been explicitly observed to emerge from data-driven learning.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"246-260"},"PeriodicalIF":0.0,"publicationDate":"2024-04-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140949055","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Training Generative Models From Privatized Data via Entropic Optimal Transport","authors":"Daria Reshetova;Wei-Ning Chen;Ayfer Özgür","doi":"10.1109/JSAIT.2024.3387463","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3387463","url":null,"abstract":"Local differential privacy is a powerful method for privacy-preserving data collection. In this paper, we develop a framework for training Generative Adversarial Networks (GANs) on differentially privatized data. We show that entropic regularization of optimal transport – a popular regularization method in the literature that has often been leveraged for its computational benefits – enables the generator to learn the raw (unprivatized) data distribution even though it only has access to privatized samples. We prove that at the same time this leads to fast statistical convergence at the parametric rate. This shows that entropic regularization of optimal transport uniquely enables the mitigation of both the effects of privatization noise and the curse of dimensionality in statistical convergence. We provide experimental evidence to support the efficacy of our framework in practice.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"221-235"},"PeriodicalIF":0.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140820376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Differentially Private Stochastic Linear Bandits: (Almost) for Free","authors":"Osama Hanna;Antonious M. Girgis;Christina Fragouli;Suhas Diggavi","doi":"10.1109/JSAIT.2024.3389954","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3389954","url":null,"abstract":"In this paper, we propose differentially private algorithms for the problem of stochastic linear bandits in the central, local and shuffled models. In the central model, we achieve almost the same regret as the optimal non-private algorithms, which means we get privacy for free. In particular, we achieve a regret of \u0000<inline-formula> <tex-math>$tilde {O}left({sqrt {T}+{}frac {1}{varepsilon }}right)$ </tex-math></inline-formula>\u0000 matching the known lower bound for private linear bandits, while the best previously known algorithm achieves \u0000<inline-formula> <tex-math>$tilde {O}left({{}frac {1}{varepsilon }sqrt {T}}right)$ </tex-math></inline-formula>\u0000. In the local case, we achieve a regret of \u0000<inline-formula> <tex-math>$tilde {O}left({{}frac {1}{varepsilon }{sqrt {T}}}right)$ </tex-math></inline-formula>\u0000 which matches the non-private regret for constant \u0000<inline-formula> <tex-math>$varepsilon $ </tex-math></inline-formula>\u0000, but suffers a regret penalty when \u0000<inline-formula> <tex-math>$varepsilon $ </tex-math></inline-formula>\u0000 is small. In the shuffled model, we also achieve regret of \u0000<inline-formula> <tex-math>$tilde {O}left({sqrt {T}+{}frac {1}{varepsilon }}right)$ </tex-math></inline-formula>\u0000 while the best previously known algorithm suffers a regret of \u0000<inline-formula> <tex-math>$tilde {O}left({{}frac {1}{varepsilon }{T^{3/5}}}right)$ </tex-math></inline-formula>\u0000. Our numerical evaluation validates our theoretical results. Our results generalize for contextual linear bandits with known context distributions.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"135-147"},"PeriodicalIF":0.0,"publicationDate":"2024-04-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140818731","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Improving Group Testing via Gradient Descent","authors":"Sundara Rajan Srinivasavaradhan;Pavlos Nikolopoulos;Christina Fragouli;Suhas Diggavi","doi":"10.1109/JSAIT.2024.3386182","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3386182","url":null,"abstract":"We study the problem of group testing with non-identical, independent priors. So far, the pooling strategies that have been proposed in the literature take the following approach: a hand-crafted test design along with a decoding strategy is proposed, and guarantees are provided on how many tests are sufficient in order to identify all infections in a population. In this paper, we take a different, yet perhaps more practical, approach: we fix the decoder and the number of tests, and we ask, given these, what is the best test design one could use? We explore this question for the Definite Non-Defectives (DND) decoder. We formulate a (non-convex) optimization problem, where the objective function is the expected number of errors for a particular design. We find approximate solutions via gradient descent, which we further optimize with informed initialization. We illustrate through simulations that our method can achieve significant performance improvement over traditional approaches.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"236-245"},"PeriodicalIF":0.0,"publicationDate":"2024-04-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140844479","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal Binary Differential Privacy via Graphs","authors":"Sahel Torkamani;Javad B. Ebrahimi;Parastoo Sadeghi;Rafael G. L. D’Oliveira;Muriel Médard","doi":"10.1109/JSAIT.2024.3384183","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3384183","url":null,"abstract":"We present the notion of \u0000<italic>reasonable utility</i>\u0000 for binary mechanisms, which applies to all utility functions in the literature. This notion induces a partial ordering on the performance of all binary differentially private (DP) mechanisms. DP mechanisms that are maximal elements of this ordering are optimal DP mechanisms for every reasonable utility. By looking at differential privacy as a randomized graph coloring, we characterize these optimal DP in terms of their behavior on a certain subset of the boundary datasets we call a boundary hitting set. In the process of establishing our results, we also introduce a useful notion that generalizes DP conditions for binary-valued queries, which we coin as suitable pairs. Suitable pairs abstract away the algebraic roles of \u0000<inline-formula> <tex-math>$varepsilon ,delta $ </tex-math></inline-formula>\u0000 in the DP framework, making the derivations and understanding of our proofs simpler. Additionally, the notion of a suitable pair can potentially capture privacy conditions in frameworks other than DP and may be of independent interest.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"162-174"},"PeriodicalIF":0.0,"publicationDate":"2024-04-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140818803","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Iterative Sketching for Secure Coded Regression","authors":"Neophytos Charalambides;Hessam Mahdavifar;Mert Pilanci;Alfred O. Hero","doi":"10.1109/JSAIT.2024.3384395","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3384395","url":null,"abstract":"Linear regression is a fundamental and primitive problem in supervised machine learning, with applications ranging from epidemiology to finance. In this work, we propose methods for speeding up distributed linear regression. We do so by leveraging randomized techniques, while also ensuring security and straggler resiliency in asynchronous distributed computing systems. Specifically, we randomly rotate the basis of the system of equations and then subsample \u0000<italic>blocks</i>\u0000, to simultaneously secure the information and reduce the dimension of the regression problem. In our setup, the basis rotation corresponds to an encoded encryption in an \u0000<italic>approximate gradient coding scheme</i>\u0000, and the subsampling corresponds to the responses of the non-straggling servers in the centralized coded computing framework. This results in a distributive \u0000<italic>iterative</i>\u0000 stochastic approach for matrix compression and steepest descent.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"148-161"},"PeriodicalIF":0.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140924701","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Total Variation Meets Differential Privacy","authors":"Elena Ghazi;Ibrahim Issa","doi":"10.1109/JSAIT.2024.3384083","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3384083","url":null,"abstract":"The framework of approximate differential privacy is considered, and augmented by leveraging the notion of “the total variation of a (privacy-preserving) mechanism” (denoted by \u0000<inline-formula> <tex-math>$eta $ </tex-math></inline-formula>\u0000-TV). With this refinement, an exact composition result is derived, and shown to be significantly tighter than the optimal bounds for differential privacy (which do not consider the total variation). Furthermore, it is shown that \u0000<inline-formula> <tex-math>$(varepsilon ,delta )$ </tex-math></inline-formula>\u0000-DP with \u0000<inline-formula> <tex-math>$eta $ </tex-math></inline-formula>\u0000-TV is closed under subsampling. The induced total variation of commonly used mechanisms are computed. Moreover, the notion of total variation of a mechanism is studied in the local privacy setting and privacy-utility tradeoffs are investigated. In particular, total variation distance and KL divergence are considered as utility functions and studied through the lens of contraction coefficients. Finally, the results are compared and connected to the locally differentially private setting.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"207-220"},"PeriodicalIF":0.0,"publicationDate":"2024-04-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140820416","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The Worst-Case Data-Generating Probability Measure in Statistical Learning","authors":"Xinying Zou;Samir M. Perlaza;Iñaki Esnaola;Eitan Altman;H. Vincent Poor","doi":"10.1109/JSAIT.2024.3383281","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3383281","url":null,"abstract":"The worst-case data-generating (WCDG) probability measure is introduced as a tool for characterizing the generalization capabilities of machine learning algorithms. Such a WCDG probability measure is shown to be the unique solution to two different optimization problems: \u0000<inline-formula> <tex-math>$(a)$ </tex-math></inline-formula>\u0000 The maximization of the expected loss over the set of probability measures on the datasets whose relative entropy with respect to a \u0000<italic>reference measure</i>\u0000 is not larger than a given threshold; and \u0000<inline-formula> <tex-math>$(b)$ </tex-math></inline-formula>\u0000 The maximization of the expected loss with regularization by relative entropy with respect to the reference measure. Such a reference measure can be interpreted as a prior on the datasets. The WCDG cumulants are finite and bounded in terms of the cumulants of the reference measure. To analyze the concentration of the expected empirical risk induced by the WCDG probability measure, the notion of \u0000<inline-formula> <tex-math>$(epsilon, delta )$ </tex-math></inline-formula>\u0000-robustness of models is introduced. Closed-form expressions are presented for the sensitivity of the expected loss for a fixed model. These results lead to a novel expression for the generalization error of arbitrary machine learning algorithms. This exact expression is provided in terms of the WCDG probability measure and leads to an upper bound that is equal to the sum of the mutual information and the lautum information between the models and the datasets, up to a constant factor. This upper bound is achieved by a Gibbs algorithm. This finding reveals that an exploration into the generalization error of the Gibbs algorithm facilitates the derivation of overarching insights applicable to any machine learning algorithm.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"175-189"},"PeriodicalIF":0.0,"publicationDate":"2024-04-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"140844521","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"On the Computation of the Gaussian Rate–Distortion–Perception Function","authors":"Giuseppe Serra;Photios A. Stavrou;Marios Kountouris","doi":"10.1109/JSAIT.2024.3381230","DOIUrl":"https://doi.org/10.1109/JSAIT.2024.3381230","url":null,"abstract":"In this paper, we study the computation of the rate-distortion-perception function (RDPF) for a multivariate Gaussian source assuming jointly Gaussian reconstruction under mean squared error (MSE) distortion and, respectively, Kullback–Leibler divergence, geometric Jensen-Shannon divergence, squared Hellinger distance, and squared Wasserstein-2 distance perception metrics. To this end, we first characterize the analytical bounds of the scalar Gaussian RDPF for the aforementioned divergence functions, also providing the RDPF-achieving forward “test-channel” realization. Focusing on the multivariate case, assuming jointly Gaussian reconstruction and tensorizable distortion and perception metrics, we establish that the optimal solution resides on the vector space spanned by the eigenvector of the source covariance matrix. Consequently, the multivariate optimization problem can be expressed as a function of the scalar Gaussian RDPFs of the source marginals, constrained by global distortion and perception levels. Leveraging this characterization, we design an alternating minimization scheme based on the block nonlinear Gauss–Seidel method, which optimally solves the problem while identifying the Gaussian RDPF-achieving realization. Furthermore, the associated algorithmic embodiment is provided, as well as the convergence and the rate of convergence characterization. Lastly, for the “perfect realism” regime, the analytical solution for the multivariate Gaussian RDPF is obtained. We corroborate our results with numerical simulations and draw connections to existing results.","PeriodicalId":73295,"journal":{"name":"IEEE journal on selected areas in information theory","volume":"5 ","pages":"314-330"},"PeriodicalIF":0.0,"publicationDate":"2024-03-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"141084836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}