{"title":"Nearly minimax-optimal rates for noisy sparse phase retrieval via early-stopped mirror descent","authors":"Fan Wu;Patrick Rebeschini","doi":"10.1093/imaiai/iaac024","DOIUrl":"https://doi.org/10.1093/imaiai/iaac024","url":null,"abstract":"This paper studies early-stopped mirror descent applied to noisy sparse phase retrieval, which is the problem of recovering a \u0000<tex>$k$</tex>\u0000-sparse signal \u0000<tex>$textbf{x}^star in{mathbb{R}}^n$</tex>\u0000 from a set of quadratic Gaussian measurements corrupted by sub-exponential noise. We consider the (non-convex) unregularized empirical risk minimization problem and show that early-stopped mirror descent, when equipped with the hypentropy mirror map and proper initialization, achieves a nearly minimax-optimal rate of convergence, provided the sample size is at least of order \u0000<tex>$k^2$</tex>\u0000 (modulo logarithmic term) and the minimum (in modulus) non-zero entry of the signal is on the order of \u0000<tex>$|textbf{x}^star |_2/sqrt{k}$</tex>\u0000. Our theory leads to a simple algorithm that does not rely on explicit regularization or thresholding steps to promote sparsity. More generally, our results establish a connection between mirror descent and sparsity in the non-convex problem of noisy sparse phase retrieval, adding to the literature on early stopping that has mostly focused on non-sparse, Euclidean and convex settings via gradient descent. Our proof combines a potential-based analysis of mirror descent with a quantitative control on a variational coherence property that we establish along the path of mirror descent, up to a prescribed stopping time.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"12 2","pages":"633-713"},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8016800/10058586/10058608.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50297616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Perturbation bounds for (nearly) orthogonally decomposable tensors with statistical applications","authors":"Arnab Auddy;Ming Yuan","doi":"10.1093/imaiai/iaac033","DOIUrl":"https://doi.org/10.1093/imaiai/iaac033","url":null,"abstract":"We develop deterministic perturbation bounds for singular values and vectors of orthogonally decomposable tensors, in a spirit similar to classical results for matrices such as those due to Weyl, Davis, Kahan and Wedin. Our bounds demonstrate intriguing differences between matrices and higher order tensors. Most notably, they indicate that for higher order tensors perturbation affects each essential singular value/vector in isolation, and its effect on an essential singular vector does not depend on the multiplicity of its corresponding singular value or its distance from other singular values. Our results can be readily applied and provide a unified treatment to many different problems involving higher order orthogonally decomposable tensors. In particular, we illustrate the implications of our bounds through connected yet seemingly different high-dimensional data analysis tasks: the unsupervised learning scenario of tensor SVD and the supervised task of tensor regression, leading to new insights in both of these settings.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"12 2","pages":"1044-1072"},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50297917","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Breaking the sample complexity barrier to regret-optimal model-free reinforcement learning","authors":"Gen Li;Laixi Shi;Yuxin Chen;Yuejie Chi","doi":"10.1093/imaiai/iaac034","DOIUrl":"https://doi.org/10.1093/imaiai/iaac034","url":null,"abstract":"Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic Markov decision process with \u0000<tex>$S$</tex>\u0000 states, \u0000<tex>$A$</tex>\u0000 actions and horizon length \u0000<tex>$H$</tex>\u0000, substantial progress has been achieved toward characterizing the minimax-optimal regret, which scales on the order of \u0000<tex>$sqrt{H^2SAT}$</tex>\u0000 (modulo log factors) with \u0000<tex>$T$</tex>\u0000 the total number of samples. While several competing solution paradigms have been proposed to minimize regret, they are either memory-inefficient, or fall short of optimality unless the sample size exceeds an enormous threshold (e.g. \u0000<tex>$S^6A^4 ,mathrm{poly}(H)$</tex>\u0000 for existing model-free methods).To overcome such a large sample size barrier to efficient RL, we design a novel model-free algorithm, with space complexity \u0000<tex>$O(SAH)$</tex>\u0000, that achieves near-optimal regret as soon as the sample size exceeds the order of \u0000<tex>$SA,mathrm{poly}(H)$</tex>\u0000. In terms of this sample size requirement (also referred to the initial burn-in cost), our method improves—by at least a factor of \u0000<tex>$S^5A^3$</tex>\u0000—upon any prior memory-efficient algorithm that is asymptotically regret-optimal. Leveraging the recently introduced variance reduction strategy (also called reference-advantage decomposition), the proposed algorithm employs an early-settled reference update rule, with the aid of two Q-learning sequences with upper and lower confidence bounds. The design principle of our early-settled variance reduction method might be of independent interest to other RL settings that involve intricate exploration–exploitation trade-offs.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"12 2","pages":"969-1043"},"PeriodicalIF":1.6,"publicationDate":"2022-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://ieeexplore.ieee.org/iel7/8016800/10058586/10058618.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"50298054","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Dynamic ranking and translation synchronization","authors":"E. Araya, Eglantine Karl'e, Hemant Tyagi","doi":"10.1093/imaiai/iaad029","DOIUrl":"https://doi.org/10.1093/imaiai/iaad029","url":null,"abstract":"\u0000 In many applications, such as sport tournaments or recommendation systems, we have at our disposal data consisting of pairwise comparisons between a set of $n$ items (or players). The objective is to use these data to infer the latent strength of each item and/or their ranking. Existing results for this problem predominantly focus on the setting consisting of a single comparison graph $G$. However, there exist scenarios (e.g. sports tournaments) where the pairwise comparison data evolve with time. Theoretical results for this dynamic setting are relatively limited, and are the focus of this paper. We study an extension of the translation synchronization problem, to the dynamic setting. In this set-up, we are given a sequence of comparison graphs $(G_t)_{tin{{mathscr{T}}}}$, where $ {{mathscr{T}}} subset [0,1]$ is a grid representing the time domain, and for each item $i$ and time $tin{{mathscr{T}}}$ there is an associated unknown strength parameter $z^*_{t,i}in{{mathbb{R}}}$. We aim to recover, for $tin{{mathscr{T}}}$, the strength vector $z^*_t=(z^*_{t,1},dots ,z^*_{t,n})$ from noisy measurements of $z^*_{t,i}-z^*_{t,j}$, where $left {{i,j}right }$ is an edge in $G_t$. Assuming that $z^*_t$ evolves smoothly in $t$, we propose two estimators—one based on a smoothness-penalized least squares approach and the other based on projection onto the low-frequency eigenspace of a suitable smoothness operator. For both estimators, we provide finite sample bounds for the $ell _2$ estimation error under the assumption that $G_t$ is connected for all $tin{{mathscr{T}}}$, thus proving the consistency of the proposed methods in terms of the grid size $|mathscr{T}|$. We complement our theoretical findings with experiments on synthetic and real data.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"8 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-07-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"82565710","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Super-resolution multi-reference alignment.","authors":"Tamir Bendory, Ariel Jaffe, William Leeb, Nir Sharon, Amit Singer","doi":"10.1093/imaiai/iaab003","DOIUrl":"10.1093/imaiai/iaab003","url":null,"abstract":"<p><p>We study super-resolution multi-reference alignment, the problem of estimating a signal from many circularly shifted, down-sampled and noisy observations. We focus on the low SNR regime, and show that a signal in <math> <mrow><msup><mi>ℝ</mi> <mi>M</mi></msup> </mrow> </math> is uniquely determined when the number <i>L</i> of samples per observation is of the order of the square root of the signal's length ( <math><mrow><mi>L</mi> <mo>=</mo> <mi>O</mi> <mo>(</mo> <msqrt><mi>M</mi></msqrt> <mo>)</mo></mrow> </math> ). Phrased more informally, one can square the resolution. This result holds if the number of observations is proportional to 1/SNR<sup>3</sup>. In contrast, with fewer observations recovery is impossible even when the observations are not down-sampled (<i>L</i> = <i>M</i>). The analysis combines tools from statistical signal processing and invariant theory. We design an expectation-maximization algorithm and demonstrate that it can super-resolve the signal in challenging SNR regimes.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"11 2","pages":"533-555"},"PeriodicalIF":1.4,"publicationDate":"2022-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9374099/pdf/nihms-1776575.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"40708781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Minimax optimal clustering of bipartite graphs with a generalized power method","authors":"Guillaume Braun, Hemant Tyagi","doi":"10.1093/imaiai/iaad006","DOIUrl":"https://doi.org/10.1093/imaiai/iaad006","url":null,"abstract":"\u0000 Clustering bipartite graphs is a fundamental task in network analysis. In the high-dimensional regime where the number of rows $n_{1}$ and the number of columns $n_{2}$ of the associated adjacency matrix are of different order, the existing methods derived from the ones used for symmetric graphs can come with sub-optimal guarantees. Due to increasing number of applications for bipartite graphs in the high-dimensional regime, it is of fundamental importance to design optimal algorithms for this setting. The recent work of Ndaoud et al. (2022, IEEE Trans. Inf. Theory, 68, 1960–1975) improves the existing upper-bound for the misclustering rate in the special case where the columns (resp. rows) can be partitioned into $L = 2$ (resp. $K = 2$) communities. Unfortunately, their algorithm cannot be extended to the more general setting where $K neq L geq 2$. We overcome this limitation by introducing a new algorithm based on the power method. We derive conditions for exact recovery in the general setting where $K neq L geq 2$, and show that it recovers the result in Ndaoud et al. (2022, IEEE Trans. Inf. Theory, 68, 1960–1975). We also derive a minimax lower bound on the misclustering error when $K=L$ under a symmetric version of our model, which matches the corresponding upper bound up to a factor depending on $K$.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"57 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-05-24","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"74089679","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"An analysis of classical multidimensional scaling with applications to clustering.","authors":"Anna Little, Yuying Xie, Qiang Sun","doi":"10.1093/imaiai/iaac004","DOIUrl":"10.1093/imaiai/iaac004","url":null,"abstract":"<p><p>Classical multidimensional scaling is a widely used dimension reduction technique. Yet few theoretical results characterizing its statistical performance exist. This paper provides a theoretical framework for analyzing the quality of embedded samples produced by classical multidimensional scaling. This lays a foundation for various downstream statistical analyses, and we focus on clustering noisy data. Our results provide scaling conditions on the signal-to-noise ratio under which classical multidimensional scaling followed by a distance-based clustering algorithm can recover the cluster labels of all samples. Simulation studies confirm these scaling conditions are sharp. Applications to the cancer gene-expression data, the single-cell RNA sequencing data and the natural language data lend strong support to the methodology and theory.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"12 1","pages":"72-112"},"PeriodicalIF":1.6,"publicationDate":"2022-04-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9893760/pdf/iaac004.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9392159","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Linear convergence of the subspace constrained mean shift algorithm: from Euclidean to directional data.","authors":"Yikun Zhang, Yen-Chi Chen","doi":"10.1093/imaiai/iaac005","DOIUrl":"10.1093/imaiai/iaac005","url":null,"abstract":"<p><p>This paper studies the linear convergence of the subspace constrained mean shift (SCMS) algorithm, a well-known algorithm for identifying a density ridge defined by a kernel density estimator. By arguing that the SCMS algorithm is a special variant of a subspace constrained gradient ascent (SCGA) algorithm with an adaptive step size, we derive the linear convergence of such SCGA algorithm. While the existing research focuses mainly on density ridges in the Euclidean space, we generalize density ridges and the SCMS algorithm to directional data. In particular, we establish the stability theorem of density ridges with directional data and prove the linear convergence of our proposed directional SCMS algorithm.</p>","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"12 1","pages":"210-311"},"PeriodicalIF":1.4,"publicationDate":"2022-04-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9893762/pdf/iaac005.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9316422","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Zero-truncated Poisson regression for sparse multiway count data corrupted by false zeros","authors":"Oscar L'opez, Daniel M. Dunlavy, R. Lehoucq","doi":"10.1093/imaiai/iaad016","DOIUrl":"https://doi.org/10.1093/imaiai/iaad016","url":null,"abstract":"\u0000 We propose a novel statistical inference methodology for multiway count data that is corrupted by false zeros that are indistinguishable from true zero counts. Our approach consists of zero-truncating the Poisson distribution to neglect all zero values. This simple truncated approach dispenses with the need to distinguish between true and false zero counts and reduces the amount of data to be processed. Inference is accomplished via tensor completion that imposes low-rank tensor structure on the Poisson parameter space. Our main result shows that an $N$-way rank-$R$ parametric tensor $boldsymbol{mathscr{M}}in (0,infty )^{Itimes cdots times I}$ generating Poisson observations can be accurately estimated by zero-truncated Poisson regression from approximately $IR^2log _2^2(I)$ non-zero counts under the nonnegative canonical polyadic decomposition. Our result also quantifies the error made by zero-truncating the Poisson distribution when the parameter is uniformly bounded from below. Therefore, under a low-rank multiparameter model, we propose an implementable approach guaranteed to achieve accurate regression in under-determined scenarios with substantial corruption by false zeros. Several numerical experiments are presented to explore the theoretical results.","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"14 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-01-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"89460693","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"OUP accepted manuscript","authors":"","doi":"10.1093/imaiai/iaac007","DOIUrl":"https://doi.org/10.1093/imaiai/iaac007","url":null,"abstract":"","PeriodicalId":45437,"journal":{"name":"Information and Inference-A Journal of the Ima","volume":"140 1","pages":""},"PeriodicalIF":1.6,"publicationDate":"2022-01-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"87576118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":4,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}