Annals of Statistics Pub Date: 2022-04-01 Epub Date: 2022-04-07 DOI: 10.1214/21-aos2131
Kuang-Yao Lee, Lexin Li
{"title":"FUNCTIONAL SUFFICIENT DIMENSION REDUCTION THROUGH AVERAGE FRÉCHET DERIVATIVES.","authors":"Kuang-Yao Lee, Lexin Li","doi":"10.1214/21-aos2131","DOIUrl":"10.1214/21-aos2131","url":null,"abstract":"<p><p>Sufficient dimension reduction (SDR) embodies a family of methods that aim for reduction of dimensionality without loss of information in a regression setting. In this article, we propose a new method for nonparametric function-on-function SDR, where both the response and the predictor are a function. We first develop the notions of functional central mean subspace and functional central subspace, which form the population targets of our functional SDR. We then introduce an average Fréchet derivative estimator, which extends the gradient of the regression function to the operator level and enables us to develop estimators for our functional dimension reduction spaces. We show the resulting functional SDR estimators are unbiased and exhaustive, and more importantly, without imposing any distributional assumptions such as the linearity or the constant variance conditions that are commonly imposed by all existing functional SDR methods. We establish the uniform convergence of the estimators for the functional dimension reduction spaces, while allowing both the number of Karhunen-Loève expansions and the intrinsic dimension to diverge with the sample size. 
We demonstrate the efficacy of the proposed methods through both simulations and two real data examples.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10085580/pdf/nihms-1746366.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"9320340","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
John H. J. Einmahl, Ana Ferreira, Laurens de Haan, Cláudia Neves, Chen Zhou
{"title":"Spatial dependence and space–time trend in extreme events","authors":"John H. J. Einmahl,Ana Ferreira,Laurens de Haan,Cláudia Neves,Chen Zhou","doi":"10.1214/21-aos2067","DOIUrl":"https://doi.org/10.1214/21-aos2067","url":null,"abstract":"The statistical theory of extremes is extended to observations that are non-stationary and not independent. The non-stationarity over time and space is controlled via the scedasis (tail scale) in the marginal distributions. Spatial dependence stems from multivariate extreme value theory. We establish asymptotic theory for both the weighted sequential tail empirical process and the weighted tail quantile process based on all observations, taken over time and space. The results yield two statistical tests for homoscedasticity in the tail, one in space and one in time. Further, we show that the common extreme value index can be estimated via a pseudo-maximum likelihood procedure based on pooling all (non-stationary and dependent) observations. Our leading example and application is rainfall in Northern Germany.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138531661","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics Pub Date: 2022-02-01 Epub Date: 2022-02-16 DOI: 10.1214/21-aos2117
Kin Yau Wong, Donglin Zeng, D Y Lin
{"title":"SEMIPARAMETRIC LATENT-CLASS MODELS FOR MULTIVARIATE LONGITUDINAL AND SURVIVAL DATA.","authors":"Kin Yau Wong, Donglin Zeng, D Y Lin","doi":"10.1214/21-aos2117","DOIUrl":"10.1214/21-aos2117","url":null,"abstract":"<p><p>In long-term follow-up studies, data are often collected on repeated measures of multivariate response variables as well as on time to the occurrence of a certain event. To jointly analyze such longitudinal data and survival time, we propose a general class of semiparametric latent-class models that accommodates a heterogeneous study population with flexible dependence structures between the longitudinal and survival outcomes. We combine nonparametric maximum likelihood estimation with sieve estimation and devise an efficient EM algorithm to implement the proposed approach. We establish the asymptotic properties of the proposed estimators through novel use of modern empirical process theory, sieve estimation theory, and semiparametric efficiency theory. Finally, we demonstrate the advantages of the proposed methods through extensive simulation studies and provide an application to the Atherosclerosis Risk in Communities study.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9269993/pdf/nihms-1764505.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"10155118","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics Pub Date: 2022-02-01 Epub Date: 2022-02-16 DOI: 10.1214/21-aos2116
Igor Silin, Jianqing Fan
{"title":"CANONICAL THRESHOLDING FOR NON-SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION.","authors":"Igor Silin, Jianqing Fan","doi":"10.1214/21-aos2116","DOIUrl":"https://doi.org/10.1214/21-aos2116","url":null,"abstract":"<p><p>We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called the canonical thresholding estimators, which pick largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and Principal Component Regression (PCR). A theoretical analysis for both fixed design and random design settings is provided. Obtained bounds on the mean squared error and the prediction error of a specific estimator from the family allow to clearly state sufficient conditions on the decay of eigenvalues to ensure convergence. In addition, we promote the use of the relative errors, strongly linked with the out-of-sample <i>R</i> <sup>2</sup>. The study of these relative errors leads to a new concept of joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously, and describes the complexity of a linear regression problem. Some minimax lower bounds are established to showcase the optimality of our procedure. 
Numerical simulations confirm good performance of the proposed estimators compared to the previously developed methods.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2022-02-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9491498/pdf/nihms-1782574.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33478241","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Total variation regularized Fréchet regression for metric-space valued data","authors":"Zhenhua Lin,Hans-Georg Müller","doi":"10.1214/21-aos2095","DOIUrl":"https://doi.org/10.1214/21-aos2095","url":null,"abstract":"Non-Euclidean data that are indexed with a scalar predictor such as time are increasingly encountered in data applications, while statistical methodology and theory for such random objects are not well developed yet. To address the need for new methodology in this area, we develop a total variation regularization technique for nonparametric Fréchet regression, which refers to a regression setting where a response residing in a generic metric space is paired with a scalar predictor and the target is a conditional Fréchet mean. Specifically, we seek to approximate an unknown metric-space valued function by an estimator that minimizes the Fréchet version of least squares and at the same time has small total variation, appropriately defined for metric-space valued objects. We show that the resulting estimator is representable by a piece-wise constant function and establish the minimax convergence rate of the proposed estimator for metric data objects that reside in Hadamard spaces. 
We illustrate the numerical performance of the proposed method for both simulated and real data, including metric spaces of symmetric positive-definite matrices with the affine-invariant distance, of probability distributions on the real line with the Wasserstein distance, and of phylogenetic trees with the Billera–Holmes–Vogtmann metric.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2021-12-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"138508224","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics Pub Date: 2021-10-01 Epub Date: 2021-11-12 DOI: 10.1214/21-aos2066
Yuxin Chen, Jianqing Fan, Cong Ma, Yuling Yan
{"title":"BRIDGING CONVEX AND NONCONVEX OPTIMIZATION IN ROBUST PCA: NOISE, OUTLIERS, AND MISSING DATA.","authors":"Yuxin Chen, Jianqing Fan, Cong Ma, Yuling Yan","doi":"10.1214/21-aos2066","DOIUrl":"10.1214/21-aos2066","url":null,"abstract":"<p><p>This paper delivers improved theoretical guarantees for the convex programming approach in low-rank matrix estimation, in the presence of (1) random noise, (2) gross sparse outliers, and (3) missing data. This problem, often dubbed as <i>robust principal component analysis (robust PCA)</i>, finds applications in various domains. Despite the wide applicability of convex relaxation, the available statistical support (particularly the stability analysis vis-à-vis random noise) remains highly suboptimal, which we strengthen in this paper. When the unknown matrix is well-conditioned, incoherent, and of constant rank, we demonstrate that a principled convex program achieves near-optimal statistical accuracy, in terms of both the Euclidean loss and the <i>ℓ</i> <sub>∞</sub> loss. All of this happens even when nearly a constant fraction of observations are corrupted by outliers with arbitrary magnitudes. The key analysis idea lies in bridging the convex program in use and an auxiliary nonconvex optimization algorithm, and hence the title of this paper.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":3.2,"publicationDate":"2021-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9491514/pdf/nihms-1782570.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"33479290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics Pub Date: 2021-08-01 Epub Date: 2021-09-29 DOI: 10.1214/20-aos2028
Donald K K Lee, Ningyuan Chen, Hemant Ishwaran
{"title":"BOOSTED NONPARAMETRIC HAZARDS WITH TIME-DEPENDENT COVARIATES.","authors":"Donald K K Lee, Ningyuan Chen, Hemant Ishwaran","doi":"10.1214/20-aos2028","DOIUrl":"https://doi.org/10.1214/20-aos2028","url":null,"abstract":"<p><p>Given functional data from a survival process with time-dependent covariates, we derive a smooth convex representation for its nonparametric log-likelihood functional and obtain its functional gradient. From this we devise a generic gradient boosting procedure for estimating the hazard function nonparametrically. An illustrative implementation of the procedure using regression trees is described to show how to recover the unknown hazard. The generic estimator is consistent if the model is correctly specified; alternatively an oracle inequality can be demonstrated for tree-based models. To avoid overfitting, boosting employs several regularization devices. One of them is step-size restriction, but the rationale for this is somewhat mysterious from the viewpoint of consistency. Our work brings some clarity to this issue by revealing that step-size restriction is a mechanism for preventing the curvature of the risk from derailing convergence.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8691747/pdf/nihms-1683276.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39748775","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics Pub Date: 2021-08-01 Epub Date: 2021-09-29 DOI: 10.1214/20-aos2024
Lan Gao, Yingying Fan, Jinchi Lv, Qi-Man Shao
{"title":"ASYMPTOTIC DISTRIBUTIONS OF HIGH-DIMENSIONAL DISTANCE CORRELATION INFERENCE.","authors":"Lan Gao, Yingying Fan, Jinchi Lv, Qi-Man Shao","doi":"10.1214/20-aos2024","DOIUrl":"10.1214/20-aos2024","url":null,"abstract":"<p><p>Distance correlation has become an increasingly popular tool for detecting the nonlinear dependence between a pair of potentially high-dimensional random vectors. Most existing works have explored its asymptotic distributions under the null hypothesis of independence between the two random vectors when only the sample size or the dimensionality diverges. Yet its asymptotic null distribution for the more realistic setting when both sample size and dimensionality diverge in the full range remains largely underdeveloped. In this paper, we fill such a gap and develop central limit theorems and associated rates of convergence for a rescaled test statistic based on the bias-corrected distance correlation in high dimensions under some mild regularity conditions and the null hypothesis. Our new theoretical results reveal an interesting phenomenon of blessing of dimensionality for high-dimensional distance correlation inference in the sense that the accuracy of normal approximation can increase with dimensionality. Moreover, we provide a general theory on the power analysis under the alternative hypothesis of dependence, and further justify the capability of the rescaled distance correlation in capturing the pure nonlinear dependency under moderately high dimensionality for a certain type of alternative hypothesis. 
The theoretical results and finite-sample performance of the rescaled statistic are illustrated with several simulation examples and a blockchain application.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2021-08-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8491772/pdf/nihms-1684707.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39495519","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Annals of Statistics Pub Date: 2021-06-01 Epub Date: 2021-08-09 DOI: 10.1214/20-aos1980
Jianqing Fan, Weichen Wang, Ziwei Zhu
{"title":"A SHRINKAGE PRINCIPLE FOR HEAVY-TAILED DATA: HIGH-DIMENSIONAL ROBUST LOW-RANK MATRIX RECOVERY.","authors":"Jianqing Fan, Weichen Wang, Ziwei Zhu","doi":"10.1214/20-aos1980","DOIUrl":"10.1214/20-aos1980","url":null,"abstract":"<p><p>This paper introduces a simple principle for robust statistical inference via appropriate shrinkage on the data. This widens the scope of high-dimensional techniques, reducing the distributional conditions from sub-exponential or sub-Gaussian to more relaxed bounded second or fourth moment. As an illustration of this principle, we focus on robust estimation of the low-rank matrix <b>Θ</b>* from the trace regression model <i>Y</i> = Tr(<b>Θ</b>*<sup>⊤</sup> <b>X</b>) + <i>ϵ</i>. It encompasses four popular problems: sparse linear model, compressed sensing, matrix completion and multi-task learning. We propose to apply the penalized least-squares approach to the appropriately truncated or shrunk data. Under only bounded 2+<i>δ</i> moment condition on the response, the proposed robust methodology yields an estimator that possesses the same statistical error rates as previous literature with sub-Gaussian errors. For sparse linear model and multi-task regression, we further allow the design to have only bounded fourth moment and obtain the same statistical rates. As a byproduct, we give a robust covariance estimator with concentration inequality and optimal rate of convergence in terms of the spectral norm, when the samples only bear bounded fourth moment. This result is of its own interest and importance. We reveal that under high dimensions, the sample covariance matrix is not optimal whereas our proposed robust covariance can achieve optimality. 
Extensive simulations are carried out to support the theories.</p>","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2021-06-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8457508/pdf/nihms-1639579.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"39443652","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Survival analysis via hierarchically dependent mixture hazards","authors":"F. Camerlenghi, A. Lijoi, I. Pruenster","doi":"10.1214/20-AOS1982","DOIUrl":"https://doi.org/10.1214/20-AOS1982","url":null,"abstract":"Hierarchical nonparametric processes are popular tools for defining priors on collections of probability distributions, which induce dependence across multiple samples. In survival analysis problems one is typically interested in modeling the hazard rates, rather than the probability distributions themselves, and the currently available methodologies are not applicable. Here we fill this gap by introducing a novel, and analytically tractable, class of multivariate mixtures whose distribution acts as a prior for the vector of sample–specific baseline hazard rates. The dependence is induced through a hierarchical specification for the mixing random measures that ultimately corresponds to a composition of random discrete combinatorial structures. Our theoretical results allow to develop a full Bayesian analysis for this class of models, which can also account for right–censored survival data and covariates, and we also show posterior consistency. In particular, we emphasize that the posterior characterization we achieve is the key for devising both marginal and conditional algorithms for evaluating Bayesian inferences of interest. 
The effectiveness of our proposal is illustrated through some synthetic and real data examples.","PeriodicalId":8032,"journal":{"name":"Annals of Statistics","volume":null,"pages":null},"PeriodicalIF":4.5,"publicationDate":"2021-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49201957","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"数学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}