Annals of Statistics: Latest Articles (Page 2)

Relaxing the i.i.d. assumption: Adaptively minimax optimal regret via root-entropic regularization

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2315

Blair Bilodeau, Jeffrey Negrea, Daniel M. Roy

Abstract: We consider prediction with expert advice when data are generated from distributions varying arbitrarily within an unknown constraint set. This semi-adversarial setting includes (at the extremes) the classical i.i.d. setting, when the unknown constraint set is restricted to be a singleton, and the unconstrained adversarial setting, when the constraint set is the set of all distributions. The Hedge algorithm, long known to be minimax (rate) optimal in the adversarial regime, was recently shown to be simultaneously minimax optimal for i.i.d. data. In this work, we propose to relax the i.i.d. assumption by seeking adaptivity at all levels of a natural ordering on constraint sets. We provide matching upper and lower bounds on the minimax regret at all levels, show that Hedge with deterministic learning rates is suboptimal outside of the extremes and prove that one can adaptively obtain minimax regret at all levels. We achieve this optimal adaptivity using the follow-the-regularized-leader (FTRL) framework, with a novel adaptive regularization scheme that implicitly scales as the square root of the entropy of the current predictive distribution, rather than the entropy of the initial predictive distribution. Finally, we provide novel technical tools to study the statistical performance of FTRL along the semi-adversarial spectrum.
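
For readers unfamiliar with the Hedge algorithm named in this abstract, a minimal sketch of the classical version with a fixed learning rate may help. This is the textbook exponential-weights update, not the root-entropic FTRL scheme the paper proposes; the function name `hedge` and its argument layout are illustrative assumptions.

```python
import math

def hedge(losses, eta):
    """Classical Hedge (exponential weights) with a fixed learning rate.

    losses: list of T rounds, each a list of N expert losses in [0, 1].
    eta:    positive learning rate (an assumption: fixed in advance).
    Returns the learner's cumulative expected loss and the weights
    played in the final round.
    """
    n_experts = len(losses[0])
    cum_loss = [0.0] * n_experts          # cumulative loss of each expert
    learner_loss = 0.0
    weights = [1.0 / n_experts] * n_experts
    for round_losses in losses:
        # Weights are proportional to exp(-eta * cumulative loss);
        # subtracting the minimum keeps the exponentials well scaled.
        m = min(cum_loss)
        unnorm = [math.exp(-eta * (c - m)) for c in cum_loss]
        total = sum(unnorm)
        weights = [u / total for u in unnorm]
        # The learner suffers the weighted average of the experts' losses.
        learner_loss += sum(w * l for w, l in zip(weights, round_losses))
        cum_loss = [c + l for c, l in zip(cum_loss, round_losses)]
    return learner_loss, weights
```

With `eta = sqrt(8 ln N / T)`, the classical analysis bounds the regret against the best expert by `sqrt((T / 2) ln N)` for losses in [0, 1], which is the adversarial-regime guarantee the abstract refers to.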

Citations: 7

Graphical models for nonstationary time series

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/22-aos2205

Sumanta Basu, Suhasini Subba Rao

Abstract: We propose NonStGM, a general nonparametric graphical modeling framework, for studying dynamic associations among the components of a nonstationary multivariate time series. It builds on the framework of Gaussian graphical models (GGM) and stationary time series graphical models (StGM) and complements existing works on parametric graphical models based on change point vector autoregressions (VAR). Analogous to StGM, the proposed framework captures conditional noncorrelations (both intertemporal and contemporaneous) in the form of an undirected graph. In addition, to describe the more nuanced nonstationary relationships among the components of the time series, we introduce the new notion of conditional nonstationarity/stationarity and incorporate it within the graph. This can be used to search for small subnetworks that serve as the “source” of nonstationarity in a large system. We explicitly connect conditional noncorrelation and stationarity between and within components of the multivariate time series to zero and Toeplitz embeddings of an infinite-dimensional inverse covariance operator. In the Fourier domain, conditional stationarity and noncorrelation relationships in the inverse covariance operator are encoded with a specific sparsity structure of its integral kernel operator. We show that these sparsity patterns can be recovered from finite-length time series by nodewise regression of discrete Fourier transforms (DFT) across different Fourier frequencies. We demonstrate the feasibility of learning NonStGM structure from data using simulation studies.

Citations: 2

Learning low-dimensional nonlinear structures from high-dimensional noisy data: An integral operator approach

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2306

Xiucai Ding, Rong Ma

Abstract: We propose a kernel-spectral embedding algorithm for learning low-dimensional nonlinear structures from noisy and high-dimensional observations, where the data sets are assumed to be sampled from a nonlinear manifold model and corrupted by high-dimensional noise. The algorithm employs an adaptive bandwidth selection procedure which does not rely on prior knowledge of the underlying manifold. The obtained low-dimensional embeddings can be further utilized for downstream purposes such as data visualization, clustering and prediction. Our method is theoretically justified and practically interpretable. Specifically, for a general class of kernel functions, we establish the convergence of the final embeddings to their noiseless counterparts when the dimension grows polynomially with the size, and characterize the effect of the signal-to-noise ratio on the rate of convergence and phase transition. We also prove the convergence of the embeddings to the eigenfunctions of an integral operator defined by the kernel map of some reproducing kernel Hilbert space capturing the underlying nonlinear structures. Our results hold even when the dimension of the manifold grows with the sample size. Numerical simulations and analysis of real data sets show the superior empirical performance of the proposed method, compared to many existing methods, on learning various nonlinear manifolds in diverse applications.

Citations: 0

Universality of regularized regression estimators in high dimensions

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2309

Qiyang Han, Yandi Shen

Abstract: The Convex Gaussian Min–Max Theorem (CGMT) has emerged as a prominent theoretical tool for analyzing the precise stochastic behavior of various statistical estimators in the so-called high-dimensional proportional regime, where the sample size and the signal dimension are of the same order. However, a well-recognized limitation of the existing CGMT machinery rests in its stringent requirement on the exact Gaussianity of the design matrix, therefore rendering the obtained precise high-dimensional asymptotics largely a specific Gaussian theory in various important statistical models. This paper provides a structural universality framework for a broad class of regularized regression estimators that is particularly compatible with the CGMT machinery. Here, universality means that if a “structure” is satisfied by the regression estimator $\hat\mu_G$ for a standard Gaussian design $G$, then it will also be satisfied by $\hat\mu_A$ for a general non-Gaussian design $A$ with independent entries. In particular, we show that with a good enough $\ell_\infty$ bound for the regression estimator $\hat\mu_A$, any “structural property” that can be detected via the CGMT for $\hat\mu_G$ also holds for $\hat\mu_A$ under a general design $A$ with independent entries. As a proof of concept, we demonstrate our new universality framework in three key examples of regularized regression estimators: the Ridge, Lasso and regularized robust regression estimators, where new universality properties of risk asymptotics and/or distributions of regression estimators and other related quantities are proved. As a major statistical implication of the Lasso universality results, we validate inference procedures using the degrees-of-freedom adjusted debiased Lasso under general design and error distributions. We also provide a counterexample, showing that universality properties for regularized regression estimators do not extend to general isotropic designs. The proof of our universality results relies on new comparison inequalities for the optimum of a broad class of cost functions and Gordon’s max–min (or min–max) costs, over arbitrary structure sets subject to $\ell_\infty$ constraints. These results may be of independent interest and broader applicability.

Citations: 0

Noisy linear inverse problems under convex constraints: Exact risk asymptotics in high dimensions

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2301

Qiyang Han

Abstract: In the standard Gaussian linear measurement model $Y = X\mu_0 + \xi \in \mathbb{R}^m$ with a fixed noise level $\sigma > 0$, we consider the problem of estimating the unknown signal $\mu_0$ under a convex constraint $\mu_0 \in K$, where $K$ is a closed convex set in $\mathbb{R}^n$. We show that the risk of the natural convex constrained least squares estimator (LSE) $\hat\mu(\sigma)$ can be characterized exactly in high-dimensional limits, by that of the convex constrained LSE $\hat\mu_K^{\mathrm{seq}}$ in the corresponding Gaussian sequence model at a different noise level. Formally, we show that $\|\hat\mu(\sigma) - \mu_0\|^2 / (n r_n^2) \to 1$ in probability, where $r_n^2 > 0$ solves the fixed-point equation $\mathbb{E}\|\hat\mu_K^{\mathrm{seq}}(\sqrt{(r_n^2 + \sigma^2)/(m/n)}) - \mu_0\|^2 = n r_n^2$. This characterization holds (uniformly) for risks $r_n^2$ in the maximal regime that ranges from constant order all the way down to essentially the parametric rate, as long as a certain necessary nondegeneracy condition is satisfied for $\hat\mu(\sigma)$. The precise risk characterization reveals a fundamental difference between noiseless (or low noise limit) and noisy linear inverse problems in terms of the sample complexity for signal recovery. A concrete example is given by the isotonic regression problem: While exact recovery of a general monotone signal requires $m \gg n^{1/3}$ samples in the noiseless setting, consistent signal recovery in the noisy setting requires as few as $m \gg \log n$ samples. Such a discrepancy occurs when the low and high noise risk behavior of $\hat\mu_K^{\mathrm{seq}}$ differ significantly. In statistical language, this occurs when $\hat\mu_K^{\mathrm{seq}}$ estimates $0$ at a faster “adaptation rate” than the slower “worst-case rate” for general signals. Several other examples, including nonnegative least squares and generalized Lasso (in constrained forms), are also worked out to demonstrate the concrete applicability of the theory in problems of different types. The proof relies on a collection of new analytic and probabilistic results concerning estimation error, log likelihood ratio test statistics and degree-of-freedom associated with $\hat\mu_K^{\mathrm{seq}}$, regarded as stochastic processes indexed by the noise level. These results are of independent interest in and of themselves.
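
To make the isotonic regression example in this abstract more concrete: when the constraint set is the cone of nondecreasing sequences, the constrained LSE in the Gaussian sequence model is the Euclidean projection of the observations onto that cone, which the standard pool-adjacent-violators algorithm (PAVA) computes. The sketch below is a generic PAVA implementation, not code from the paper; the function name `isotonic_lse` is an illustrative assumption.

```python
def isotonic_lse(y):
    """Least squares projection of y onto nondecreasing sequences (PAVA).

    y: list of numbers (observations in the sequence model).
    Returns the fitted nondecreasing sequence as a list of floats.
    """
    blocks = []  # each block is [sum of values, number of values]
    for v in y:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last block's mean.
        # Means are compared by cross-multiplication (counts are positive).
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point in a block gets the block mean
    return fit
```

For instance, `isotonic_lse([1, 3, 2, 4])` pools the violating pair `3, 2` into its mean and returns `[1.0, 2.5, 2.5, 4.0]`.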

Citations: 3

Single index Fréchet regression

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2307

Satarupa Bhattacharjee, Hans-Georg Müller

Abstract: Single index models provide an effective dimension reduction tool in regression, especially for high-dimensional data, by projecting a general multivariate predictor onto a direction vector. We propose a novel single-index model for regression models where metric space-valued random object responses are coupled with multivariate Euclidean predictors. The responses in this regression model include complex, non-Euclidean data, including covariance matrices, graph Laplacians of networks and univariate probability distribution functions, among other complex objects that lie in abstract metric spaces. While Fréchet regression has proved useful for modeling the conditional mean of such random objects given multivariate Euclidean vectors, it does not provide for regression parameters such as slopes or intercepts, since the metric space-valued responses are not amenable to linear operations. As a consequence, distributional results for Fréchet regression have been elusive. We show here that for the case of multivariate Euclidean predictors, the parameters that define a single index and projection vector can be used to substitute for the inherent absence of parameters in Fréchet regression. Specifically, we derive the asymptotic distribution of suitable estimates of these parameters, which then can be utilized to test linear hypotheses for the parameters, subject to an identifiability condition. Consistent estimation of the link function of the single index Fréchet regression model is obtained through local linear Fréchet regression. We demonstrate the finite sample performance of estimation and inference for the proposed single index Fréchet regression model through simulation studies, including the special cases where responses are probability distributions and graph adjacency matrices. The method is illustrated for resting-state functional Magnetic Resonance Imaging (fMRI) data from the ADNI study.

Citations: 6

Optimal change-point detection and localization

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2297

Nicolas Verzelen, Magalie Fromont, Matthieu Lerasle, Patricia Reynaud-Bouret

Abstract: Given a time series $Y$ in $\mathbb{R}^n$, with a piecewise constant mean and independent components, the twin problems of change-point detection and change-point localization respectively amount to detecting the existence of times where the mean varies and estimating the positions of those change-points. In this work, we tightly characterize optimal rates for both problems and uncover the phase transition phenomenon from a global testing problem to a local estimation problem. Introducing a suitable definition of the energy of a change-point, we first establish in the single change-point setting that the optimal detection threshold is 2loglog(n). When the energy is just above the detection threshold, then the problem of localizing the change-point becomes purely parametric: it only depends on the difference in means and not on the position of the change-point anymore. Interestingly, for most change-point positions, including all those away from the endpoints of the time series, it is possible to detect and localize them at a much smaller energy level. In the multiple change-point setting, we establish the energy detection threshold and show similarly that the optimal localization error of a specific change-point becomes purely parametric. Along the way, tight minimax rates for Hausdorff and $\ell_1$ estimation losses of the vector of all change-point positions are also established. Two procedures achieving these optimal rates are introduced. The first one is a least-squares estimator with a new multiscale penalty that favours well spread change-points. The second one is a two-step multiscale post-processing procedure whose computational complexity can be as low as $O(n\log(n))$. Notably, these two procedures accommodate the presence of possibly many low-energy and therefore undetectable change-points and are still able to detect and localize high-energy change-points even with the presence of those nuisance parameters.
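
As a rough illustration of the single change-point setting described in this abstract, the sketch below scans all split points with the standard CUSUM statistic, which coincides with the least-squares choice of a single split. It is a textbook baseline under the assumption of unit-variance noise, not either of the two procedures the paper introduces; the function name `cusum_scan` is an illustrative assumption.

```python
import math

def cusum_scan(y):
    """Scan for a single change in mean with the CUSUM statistic.

    y: list of numbers (the observed time series).
    Returns (best_split, max_stat), where best_split = k means the mean
    of y[:k] is compared with the mean of y[k:]. The statistic is scaled
    to have unit variance when the noise has unit variance and there is
    no change (an assumption of this sketch).
    """
    n = len(y)
    total = sum(y)
    left = 0.0
    best_k, best_stat = None, -1.0
    for k in range(1, n):
        left += y[k - 1]                  # running sum of y[:k]
        right = total - left              # sum of y[k:]
        diff = left / k - right / (n - k)
        stat = abs(diff) * math.sqrt(k * (n - k) / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k, best_stat
```

Detection then amounts to comparing `max_stat` with a threshold, and localization to reporting `best_split`; the running sums keep the scan linear in the length of the series.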

Citations: 31

Bootstrapping persistent Betti numbers and other stabilizing statistics

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2277

Benjamin Roycraft, Johannes Krebs, Wolfgang Polonik

Citations: 3

On lower bounds for the bias-variance trade-off

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2279

Alexis Derumigny, Johannes Schmidt-Hieber

Abstract: It is a common phenomenon that for high-dimensional and nonparametric statistical models, rate-optimal estimators balance squared bias and variance. Although this balancing is widely observed, little is known about whether methods exist that could avoid the trade-off between bias and variance. We propose a general strategy to obtain lower bounds on the variance of any estimator with bias smaller than a prespecified bound. This shows to what extent the bias-variance trade-off is unavoidable and allows one to quantify the loss of performance for methods that do not obey it. The approach is based on a number of abstract lower bounds for the variance involving the change of expectation with respect to different probability measures as well as information measures such as the Kullback-Leibler or chi-square divergence. Some of these inequalities rely on a new concept of information matrices. In a second part of the article, the abstract lower bounds are applied to several statistical models including the Gaussian white noise model, a boundary estimation problem, the Gaussian sequence model and the high-dimensional linear regression model. For these specific statistical applications, different types of bias-variance trade-offs occur that vary considerably in their strength. For the trade-off between integrated squared bias and integrated variance in the Gaussian white noise model, we propose to combine the general strategy for lower bounds with a reduction technique. This allows us to reduce the original problem to a lower bound on the bias-variance trade-off for estimators with additional symmetry properties in a simpler statistical model. To highlight possible extensions of the proposed framework, we moreover briefly discuss the trade-off between bias and mean absolute deviation.

Citations: 5

Off-policy evaluation in partially observed Markov decision processes under sequential ignorability

Zone 1 (Mathematics)

Annals of Statistics Pub Date : 2023-08-01 DOI: 10.1214/23-aos2287

Yuchen Hu, Stefan Wager

Citations: 11