CANONICAL THRESHOLDING FOR NON-SPARSE HIGH-DIMENSIONAL LINEAR REGRESSION
Igor Silin, Jianqing Fan
Annals of Statistics, published 2022-02-01 (Epub 2022-02-16)
DOI: 10.1214/21-aos2116
Open-access PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9491498/pdf/nihms-1782574.pdf
Citations: 2
Abstract
We consider a high-dimensional linear regression problem. Unlike many papers on the topic, we do not require sparsity of the regression coefficients; instead, our main structural assumption is a decay of the eigenvalues of the covariance matrix of the data. We propose a new family of estimators, called canonical thresholding estimators, which pick the largest regression coefficients in the canonical form. The estimators admit an explicit form and can be linked to LASSO and Principal Component Regression (PCR). A theoretical analysis for both fixed design and random design settings is provided. The obtained bounds on the mean squared error and the prediction error of a specific estimator from the family allow us to state clearly sufficient conditions on the decay of eigenvalues that ensure convergence. In addition, we promote the use of relative errors, which are strongly linked with the out-of-sample R². The study of these relative errors leads to a new concept of joint effective dimension, which incorporates the covariance of the data and the regression coefficients simultaneously, and describes the complexity of a linear regression problem. Some minimax lower bounds are established to showcase the optimality of our procedure. Numerical simulations confirm the good performance of the proposed estimators compared to previously developed methods.
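The abstract's core idea can be sketched in code: rotate the design into the canonical (eigen) basis of the sample covariance, estimate coefficients component-wise there, and keep only the largest ones. This is a minimal illustrative sketch of that recipe, not the authors' exact estimator; the paper's actual thresholding rule, the function `canonical_thresholding`, and the choice of `k` below are assumptions made for illustration.

```python
import numpy as np

def canonical_thresholding(X, y, k):
    """Illustrative sketch: keep the k largest coefficients in canonical form.

    X : (n, p) design matrix, y : (n,) response, k : number of canonical
    coefficients to retain. Returns an estimate of the p regression coefficients.
    """
    n, p = X.shape
    # Eigendecomposition of the sample covariance matrix of the data.
    sigma_hat = X.T @ X / n
    eigvals, U = np.linalg.eigh(sigma_hat)  # ascending eigenvalues
    # Canonical coordinates: project the design onto the eigenvectors.
    Z = X @ U
    # Component-wise coefficient estimates in the canonical basis
    # (guard against numerically zero eigenvalues).
    denom = np.where(eigvals > 1e-12, eigvals, np.inf)
    theta = (Z.T @ y / n) / denom
    # Hard-threshold: retain only the k largest canonical coefficients.
    keep = np.argsort(np.abs(theta))[-k:]
    theta_thr = np.zeros(p)
    theta_thr[keep] = theta[keep]
    # Map back to the original coordinates.
    return U @ theta_thr
```

With k = p this reduces to ordinary least squares via the eigendecomposition; smaller k discards the canonical directions with the smallest coefficients, which is how the estimator exploits eigenvalue decay rather than sparsity of the coefficients themselves.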
Journal overview:
The Annals of Statistics aims to publish research papers of the highest quality, reflecting the many facets of contemporary statistics. Primary emphasis is placed on importance and originality, not on formalism. The journal aims to cover all areas of statistics, especially mathematical statistics and applied and interdisciplinary statistics. Of course, many of the best papers will touch on more than one of these general areas, because the discipline of statistics has deep roots in mathematics and in substantive scientific fields.