{"title":"Exit Time Analysis for Approximations of Gradient Descent Trajectories Around Saddle Points","authors":"Rishabh Dixit;Mert Gürbüzbalaban;Waheed U Bajwa","doi":"10.1093/imaiai/iaac025","DOIUrl":"https://doi.org/10.1093/imaiai/iaac025","url":null,"abstract":"This paper considers the problem of understanding the exit time of trajectories of gradient-related first-order methods from saddle neighborhoods under certain initial boundary conditions. Given the ‘flat’ geometry around saddle points, first-order methods can struggle to escape these regions quickly owing to the small magnitudes of the gradients encountered. In particular, while it is known that gradient-related first-order methods escape strict-saddle neighborhoods, existing analytic techniques do not explicitly leverage the local geometry around saddle points to control the behavior of gradient trajectories. It is in this context that this paper puts forth a rigorous geometric analysis of the gradient-descent method around strict-saddle neighborhoods using matrix perturbation theory. In doing so, it provides a key result that can be used to generate an approximate gradient trajectory for any given initial conditions. In addition, the analysis leads to a linear exit-time solution for the gradient-descent method under certain necessary initial conditions, which explicitly brings out the dependence on the problem dimension, the conditioning of the saddle neighborhood, and more, for a class of strict-saddle functions.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50297617","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
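The slow-escape phenomenon this abstract describes is easy to see on a toy strict-saddle function. The sketch below (an illustration under assumed parameters, not the paper's approximate-trajectory construction) runs gradient descent on f(x, y) = (x² − y²)/2 and measures the exit time from a ball around the saddle at the origin:

```python
# Toy illustration of escape from a strict saddle (assumed toy objective,
# not the paper's construction): gradient descent on
# f(x, y) = (x**2 - y**2) / 2, whose gradient is (x, -y). The unstable
# y-component grows by a factor (1 + eta) per step, so the exit time from
# a ball of radius r scales like log(r / |y0|) / log(1 + eta).
def gd_exit_time(x0, y0, eta=0.1, r=1.0, max_iters=10_000):
    x, y = x0, y0
    for k in range(max_iters):
        if x * x + y * y >= r * r:
            return k
        x, y = x - eta * x, y + eta * y  # one gradient-descent step
    return max_iters

# Starting 1000x closer to the saddle adds only ~log(1000)/log(1.1) steps.
t1 = gd_exit_time(0.0, 1e-3)
t2 = gd_exit_time(0.0, 1e-6)
```

On this quadratic the exit time depends only logarithmically on the initial unstable component, which is the kind of explicit initial-condition dependence the paper's perturbation analysis extracts for a general class of strict-saddle functions.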
{"title":"Uncertainty quantification in the Bradley–Terry–Luce model","authors":"Chao Gao;Yandi Shen;Anderson Y Zhang","doi":"10.1093/imaiai/iaac032","DOIUrl":"https://doi.org/10.1093/imaiai/iaac032","url":null,"abstract":"The Bradley–Terry–Luce (BTL) model is a benchmark model for pairwise comparisons between individuals. Despite recent progress on the first-order asymptotics of several popular procedures, the understanding of uncertainty quantification in the BTL model remains largely incomplete, especially when the underlying comparison graph is sparse. In this paper, we fill this gap by focusing on two estimators that have received much recent attention: the maximum likelihood estimator (MLE) and the spectral estimator. Using a unified proof strategy, we derive sharp and uniform non-asymptotic expansions for both estimators in the sparsest possible regime (up to some poly-logarithmic factors) of the underlying comparison graph. These expansions allow us to obtain: (i) finite-dimensional central limit theorems for both estimators; (ii) construction of confidence intervals for individual ranks; (iii) optimal constant of <tex>$\ell_2$</tex> estimation, which is achieved by the MLE but not by the spectral estimator. Our proof is based on a self-consistent equation of the second-order remainder vector and a novel leave-two-out analysis.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50297918","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Optimal orthogonal group synchronization and rotation group synchronization","authors":"Chao Gao;Anderson Y Zhang","doi":"10.1093/imaiai/iaac022","DOIUrl":"https://doi.org/10.1093/imaiai/iaac022","url":null,"abstract":"We study the statistical estimation problem of orthogonal group synchronization and rotation group synchronization. The model is <tex>$Y_{ij} = Z_i^* Z_j^{*T} + \sigma W_{ij}\in\mathbb{R}^{d\times d}$</tex> where <tex>$W_{ij}$</tex> is a Gaussian random matrix and <tex>$Z_i^*$</tex> is either an orthogonal matrix or a rotation matrix, and each <tex>$Y_{ij}$</tex> is observed independently with probability <tex>$p$</tex>. We analyze an iterative polar decomposition algorithm for the estimation of <tex>$Z^*$</tex> and show it has an error of <tex>$(1+o(1))\frac{\sigma^2 d(d-1)}{2np}$</tex> when initialized by spectral methods. A matching minimax lower bound is further established that leads to the optimality of the proposed algorithm as it achieves the exact minimax risk.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50297653","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
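The polar-decomposition step at the heart of the abstract above maps a matrix to its nearest orthogonal matrix. A minimal sketch (the pooling rule and initialization in `sync_update` are illustrative assumptions, not the authors' exact iteration):

```python
import numpy as np

# The polar factor of an invertible M is the orthogonal matrix closest
# to M in Frobenius norm; from the SVD M = U S V^T it equals U V^T.
def polar_factor(M):
    U, _, Vt = np.linalg.svd(M)
    return U @ Vt

def sync_update(Y, Z):
    # One synchronization-style update: re-estimate each Z_i as the polar
    # factor of the sum of observed blocks Y_ij applied to the current
    # estimates Z_j (illustrative pooling over all j != i).
    n = len(Z)
    return [polar_factor(sum(Y[i][j] @ Z[j] for j in range(n) if j != i))
            for i in range(n)]
```

By construction every update returns orthogonal matrices; iterating such updates from a spectral initialization is the kind of procedure analyzed above.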
{"title":"Fast splitting algorithms for sparsity-constrained and noisy group testing","authors":"Eric Price;Jonathan Scarlett;Nelvin Tan","doi":"10.1093/imaiai/iaac031","DOIUrl":"https://doi.org/10.1093/imaiai/iaac031","url":null,"abstract":"In group testing, the goal is to identify a subset of defective items within a larger set of items based on tests whose outcomes indicate whether at least one defective item is present. This problem is relevant in areas such as medical testing, DNA sequencing, communication protocols, and more. In this paper, we study (i) a sparsity-constrained version of the problem, in which the testing procedure is subject to one of the following two constraints: items are finitely divisible and thus may participate in at most <tex>$\gamma$</tex> tests; or tests are size-constrained to pool no more than <tex>$\rho$</tex> items per test; and (ii) a noisy version of the problem, where each test outcome is independently flipped with some constant probability. Under each of these settings, considering the for-each recovery guarantee with asymptotically vanishing error probability, we introduce a fast splitting algorithm and establish its near-optimality not only in terms of the number of tests, but also in terms of the decoding time. While the most basic formulations of our algorithms require <tex>$\varOmega(n)$</tex> storage, we also provide low-storage variants based on hashing, with similar recovery guarantees.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50297919","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
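In the noiseless, unconstrained case, the test-and-split idea behind such algorithms can be sketched in a few lines (a simplification for illustration; the paper's algorithms additionally handle noise and the divisibility/pool-size constraints): a pool is tested, and positive pools are halved recursively until single defective items are isolated.

```python
# Noiseless test-and-split sketch: a pooled test is positive iff the
# pool contains at least one defective item; positive pools are split
# in half and recursed. num_tests is a one-element list used as a
# mutable counter of pooled tests performed.
def split_and_test(items, defective, num_tests):
    num_tests[0] += 1                  # one pooled test on this group
    if not (set(items) & defective):
        return []                      # negative outcome: prune the pool
    if len(items) == 1:
        return list(items)             # isolated a defective item
    mid = len(items) // 2
    return (split_and_test(items[:mid], defective, num_tests)
            + split_and_test(items[mid:], defective, num_tests))
```

With a single defective among n items this uses on the order of 2 log2(n) tests rather than n individual tests, which is the tests-versus-decoding-time trade-off the splitting approach exploits.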
{"title":"On the robustness to adversarial corruption and to heavy-tailed data of the Stahel–Donoho median of means","authors":"Jules Depersin;Guillaume Lecué","doi":"10.1093/imaiai/iaac026","DOIUrl":"https://doi.org/10.1093/imaiai/iaac026","url":null,"abstract":"We consider median of means (MOM) versions of the Stahel–Donoho outlyingness (SDO) [23, 66] and of the Median Absolute Deviation (MAD) [30] functions to construct subgaussian estimators of a mean vector under adversarial contamination and heavy-tailed data. We develop a single analysis of the MOM version of the SDO which covers all cases ranging from the Gaussian case to the <tex>$L_2$</tex> case. It is based on isomorphic and almost isometric properties of the MOM versions of SDO and MAD. This analysis also covers cases where the mean does not even exist but a location parameter does; in those cases we still recover the same subgaussian rates and the same price for adversarial contamination even though there is not even a first moment. These properties are achieved by the classical SDO median and are therefore the first non-asymptotic statistical bounds on the Stahel–Donoho median, complementing the <tex>$\sqrt{n}$</tex>-consistency [58] and asymptotic normality [74] of the Stahel–Donoho estimators. We also show that the MOM version of the MAD can be used to construct an estimator of the covariance matrix assuming only the existence of a second moment or, when a second moment does not exist, an estimator of a scatter matrix.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50298050","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
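The median-of-means construction underlying these estimators is simple to state. A minimal one-dimensional sketch (illustrative only; the paper studies multivariate MOM versions of SDO and MAD): split the sample into blocks, average each block, and return the median of the block means, so that outliers corrupt only the blocks containing them.

```python
import statistics

# Minimal one-dimensional median-of-means (MOM) mean estimator: split
# the sample into num_blocks consecutive blocks, average each block,
# and take the median of the block means.
def median_of_means(xs, num_blocks):
    size = len(xs) // num_blocks
    block_means = [sum(xs[b * size:(b + 1) * size]) / size
                   for b in range(num_blocks)]
    return statistics.median(block_means)

# One gross outlier ruins the empirical mean but not the MOM estimate.
data = [1.0] * 9 + [1000.0]
```

Here `median_of_means(data, 5)` returns 1.0 while the empirical mean is 100.9: the outlier corrupts a single block mean, which the median then discards.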
{"title":"Sparse recovery by reduced variance stochastic approximation","authors":"Anatoli Juditsky;Andrei Kulunchakov;Hlib Tsyntseus","doi":"10.1093/imaiai/iaac028","DOIUrl":"https://doi.org/10.1093/imaiai/iaac028","url":null,"abstract":"In this paper, we discuss the application of iterative Stochastic Optimization routines to the problem of sparse signal recovery from noisy observation. Using the Stochastic Mirror Descent algorithm as a building block, we develop a multistage procedure for recovery of sparse solutions to Stochastic Optimization problems under assumptions of smoothness and quadratic minoration on the expected objective. An interesting feature of the proposed algorithm is linear convergence of the approximate solution during the preliminary phase of the routine, when the component of stochastic error in the gradient observation, which is due to a bad initial approximation of the optimal solution, is larger than the ‘ideal’ asymptotic error component owing to observation noise ‘at the optimal solution’. We also show how one can straightforwardly enhance reliability of the corresponding solution using Median-of-Means-like techniques. We illustrate the performance of the proposed algorithms in application to classical problems of recovery of sparse and low-rank signals in the generalized linear regression framework. We show, under rather weak assumptions on the regressor and noise distributions, how they lead to parameter estimates which obey (up to factors which are logarithmic in problem dimension and confidence level) the best known accuracy bounds.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50298051","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"The geometry of adversarial training in binary classification","authors":"Leon Bungert;Nicolás García Trillos;Ryan Murray","doi":"10.1093/imaiai/iaac029","DOIUrl":"https://doi.org/10.1093/imaiai/iaac029","url":null,"abstract":"We establish an equivalence between a family of adversarial training problems for non-parametric binary classification and a family of regularized risk minimization problems where the regularizer is a nonlocal perimeter functional. The resulting regularized risk minimization problems admit exact convex relaxations of the type <tex>$L^1+\text{(nonlocal)}\operatorname{TV}$</tex>, a form frequently studied in image analysis and graph-based learning. A rich geometric structure is revealed by this reformulation which in turn allows us to establish a series of properties of optimal solutions of the original problem, including the existence of minimal and maximal solutions (interpreted in a suitable sense) and the existence of regular solutions (also interpreted in a suitable sense). In addition, we highlight how the connection between adversarial training and perimeter minimization problems provides a novel, directly interpretable, statistical motivation for a family of regularized risk minimization problems involving perimeter/total variation. The majority of our theoretical results are independent of the distance used to define adversarial attacks.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50298053","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Nearly minimax-optimal rates for noisy sparse phase retrieval via early-stopped mirror descent","authors":"Fan Wu;Patrick Rebeschini","doi":"10.1093/imaiai/iaac024","DOIUrl":"https://doi.org/10.1093/imaiai/iaac024","url":null,"abstract":"This paper studies early-stopped mirror descent applied to noisy sparse phase retrieval, which is the problem of recovering a <tex>$k$</tex>-sparse signal <tex>$\textbf{x}^\star\in\mathbb{R}^n$</tex> from a set of quadratic Gaussian measurements corrupted by sub-exponential noise. We consider the (non-convex) unregularized empirical risk minimization problem and show that early-stopped mirror descent, when equipped with the hypentropy mirror map and proper initialization, achieves a nearly minimax-optimal rate of convergence, provided the sample size is at least of order <tex>$k^2$</tex> (modulo a logarithmic term) and the minimum (in modulus) non-zero entry of the signal is on the order of <tex>$\|\textbf{x}^\star\|_2/\sqrt{k}$</tex>. Our theory leads to a simple algorithm that does not rely on explicit regularization or thresholding steps to promote sparsity. More generally, our results establish a connection between mirror descent and sparsity in the non-convex problem of noisy sparse phase retrieval, adding to the literature on early stopping that has mostly focused on non-sparse, Euclidean and convex settings via gradient descent. Our proof combines a potential-based analysis of mirror descent with a quantitative control on a variational coherence property that we establish along the path of mirror descent, up to a prescribed stopping time.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8016800/10058586/10058608.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50297616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
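A hypentropy mirror-descent step can be written in closed form, since the gradient of the hypentropy mirror map is coordinate-wise asinh(x_i/β). The sketch below (step size, β, and the least-squares toy objective are assumptions for illustration; the paper's setting is quadratic phase-retrieval measurements) shows the characteristic implicit-sparsity behavior: coordinates started at zero, with vanishing gradient there, stay exactly at zero with no explicit regularizer or thresholding.

```python
import math

# Mirror descent with the hypentropy mirror map: the map's gradient is
# asinh(x_i / beta), so the update is applied in dual coordinates
# theta_i = asinh(x_i / beta) and mapped back via x_i = beta*sinh(theta_i).
def hypentropy_md(grad, x0, beta=1e-8, eta=0.2, iters=500):
    theta = [math.asinh(v / beta) for v in x0]
    x = list(x0)
    for _ in range(iters):
        g = grad(x)
        theta = [t - eta * gi for t, gi in zip(theta, g)]
        x = [beta * math.sinh(t) for t in theta]
    return x

# Toy objective 0.5 * ||x - target||^2, started from the origin: the
# second coordinate of the target is zero, and its iterate never moves.
target = [1.0, 0.0]
x = hypentropy_md(lambda v: [vi - ti for vi, ti in zip(v, target)],
                  [0.0, 0.0])
```

A small β makes the dynamics strongly multiplicative near zero, which is the mechanism connecting mirror descent to sparsity referred to above; early stopping then controls how far the nonzero coordinates are grown.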
{"title":"Perturbation bounds for (nearly) orthogonally decomposable tensors with statistical applications","authors":"Arnab Auddy;Ming Yuan","doi":"10.1093/imaiai/iaac033","DOIUrl":"https://doi.org/10.1093/imaiai/iaac033","url":null,"abstract":"We develop deterministic perturbation bounds for singular values and vectors of orthogonally decomposable tensors, in a spirit similar to classical results for matrices such as those due to Weyl, Davis, Kahan and Wedin. Our bounds demonstrate intriguing differences between matrices and higher order tensors. Most notably, they indicate that for higher order tensors perturbation affects each essential singular value/vector in isolation, and its effect on an essential singular vector does not depend on the multiplicity of its corresponding singular value or its distance from other singular values. Our results can be readily applied and provide a unified treatment to many different problems involving higher order orthogonally decomposable tensors. In particular, we illustrate the implications of our bounds through connected yet seemingly different high-dimensional data analysis tasks: the unsupervised learning scenario of tensor SVD and the supervised task of tensor regression, leading to new insights in both of these settings.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50297917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning","authors":"Gen Li;Laixi Shi;Yuxin Chen;Yuejie Chi","doi":"10.1093/imaiai/iaac034","DOIUrl":"https://doi.org/10.1093/imaiai/iaac034","url":null,"abstract":"Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with <tex>$S$</tex> states, <tex>$A$</tex> actions and horizon length <tex>$H$</tex>, substantial progress has been achieved toward characterizing the minimax-optimal regret, which scales on the order of <tex>$\sqrt{H^2SAT}$</tex> (modulo log factors) with <tex>$T$</tex> the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. <tex>$S^6A^4\,\mathrm{poly}(H)$</tex> for existing model-free methods). To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity <tex>$O(SAH)$</tex>, that achieves near-optimal regret as soon as the sample size exceeds the order of <tex>$SA\,\mathrm{poly}(H)$</tex>. In terms of this sample size requirement (also referred to as the initial burn-in cost), our method improves upon any prior memory-efficient algorithm that is asymptotically regret-optimal by at least a factor of <tex>$S^5A^3$</tex>. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration–exploitation trade-offs.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":null,"pages":null},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8016800/10058586/10058618.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50298054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
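The optimism-plus-step-size mechanics behind Q-learning with confidence bonuses can be illustrated on a deterministic toy (this is a generic sketch, not the paper's early-settled reference-advantage algorithm; with a single state and horizon 1 the update degenerates to a bandit):

```python
import math

# Generic optimistic Q-learning sketch with a UCB-style bonus on a
# deterministic 1-state, 2-action, horizon-1 toy problem: action 0 pays
# reward 0.0 and action 1 pays 1.0. Optimistic initialization plus a
# shrinking bonus makes the greedy rule explore both actions and then
# settle on the better one.
def ucb_q_learning(episodes=200, c=1.0):
    Q = [1.0, 1.0]           # optimistic initialization at the max reward
    N = [0, 0]               # visit counts
    rewards = [0.0, 1.0]
    for _ in range(episodes):
        a = max((0, 1), key=lambda i: Q[i])  # greedy w.r.t. optimistic Q
        N[a] += 1
        alpha = 1.0 / N[a]                   # 1/n learning rate
        bonus = c * math.sqrt(1.0 / N[a])    # exploration bonus
        Q[a] = (1 - alpha) * Q[a] + alpha * (rewards[a] + bonus)
        Q[a] = min(Q[a], 1.0 + c)            # keep optimism bounded
    return Q
```

The bad action is tried, its optimistic value decays below the true value of the good action, and the greedy rule switches permanently; the algorithm described above refines this template with reference-advantage variance reduction and paired upper/lower confidence sequences.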